Re: [Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

2017-09-14 Thread C Anthony Risinger
On Thu, Sep 14, 2017 at 8:07 AM, Steven D'Aprano wrote:

> On Wed, Sep 13, 2017 at 12:24:31PM +0900, INADA Naoki wrote:
> > I'm quite worried about performance.
> >
> > Since Python 3.6, dict has ma_version, which is meant to be used for
> > future optimizations including global caching.
> > Adding another abstraction layer may make that difficult.
>
> Can we make it opt-in, by replacing the module __dict__ when and only if
> needed? Perhaps we could replace it on the fly with a dict subclass that
> defines __missing__? That's virtually the same as __getattr__.
>
> Then modules which haven't replaced their __dict__ would not see any
> slowdown at all.
>
> Does any of this make sense, or am I talking nonsense on stilts?
>

This is more or less what I was describing here:

https://mail.python.org/pipermail/python-ideas/2017-September/047034.html

I am also looking at Neil's approach this weekend though.

I would be happy with a `__future__` import that enacted whatever concessions
are necessary to define a module as if it were a class body, with import
statements maybe being implicitly global. This "new-style" module would
preferably avoid the need to populate `sys.modules` with something that
can't possibly exist yet (since it's being defined!). Maybe we allow module
bodies to contain a `return` or `yield`, making them a simple function or
generator? The presence of either would activate this "new-style" module
loading:

* Modules that use `return` should return the completed module. Importing
yourself indirectly would likely cause recursion or be an error (lazy
importing would really help here!). This could conceptually expand to
something like:

```
global __class__
global __self__

class __class__:
    def __new__(... namespace-dunders-and-builtins-passed-as-kwds ...):
        # ... module code ...
        # ... closures may access __self__ and __class__ ...
        return FancyModule(__name__)

__self__ = __class__(__builtins__={...}, __name__='fancy', ...)
sys.modules[__self__.__name__] = __self__
```

* Modules that use `yield` should yield modules. This could allow defining
zero modules, multiple modules, or overwriting the same module multiple
times. Module-level code may then yield an initial object so that
self-referential imports, in lieu of deferred loading, work better. It
might later upgrade the initial module's __class__ (similar to today) or
replace it outright. This could conceptually expand to something like:

```
global __class__
global __self__

def __hidden_TOS(... namespace-dunders-and-builtins-passed-as-kwds ...):
    # ... initial module code ...
    # ... closures may access __self__ and __class__ ...
    # Bind before yielding: the for loop below resumes with next(), so a
    # `module = yield ...` expression would evaluate to None.
    module = FancyModuleInitialThatMightRaiseIfUsed(__name__)
    yield module
    # ... more module code ...
    module.__class__ = FancyModule

for __self__ in __hidden_TOS(__builtins__={...}, __name__='fancy', ...):
    __class__ = __self__.__class__
    sys.modules[__self__.__name__] = __self__
```
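The `yield` variant can be roughly emulated with existing machinery. This is a sketch under assumptions, not the proposed `__future__` semantics: an ordinary generator function (hypothetically named `_module_body`) stands in for the module body, a driver loop plays the loader and registers each yielded module in `sys.modules`, and the second phase upgrades the module's `__class__` in place, which CPython allows for `ModuleType` subclasses since 3.5. `FancyModule` and `fancy_demo` are illustrative names only.

```python
import importlib
import sys
import types


class FancyModule(types.ModuleType):
    """Hypothetical final module type for this demo."""

    @property
    def answer(self):
        return self.value * 2


def _module_body(name):
    # Phase 1: create a bare module and hand it out before the body finishes.
    module = types.ModuleType(name)
    yield module
    # Phase 2: runs when the driver resumes the generator. The module is
    # already in sys.modules, so a self-import finds the partial module.
    module.self_import_ok = importlib.import_module(name) is module
    module.value = 21
    module.__class__ = FancyModule  # upgrade the class in place


# Driver: plays the role of the loader, registering every yielded module.
for _mod in _module_body("fancy_demo"):
    sys.modules[_mod.__name__] = _mod

fancy = sys.modules["fancy_demo"]
print(type(fancy).__name__, fancy.answer, fancy.self_import_ok)  # FancyModule 42 True
```

Because the module is registered before the second phase runs, the self-import inside the body returns the partial module instead of starting a fresh import.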

Otherwise I still have a few ideas around using what we've got, possibly in
a backwards compatible way:

```
global __builtins__ = {...}
global __class__
global __self__

# Loader dunders.
__name__ = 'fancy'

# Deferred loading could likely stop this from raising in most cases.
# globals is a deferred import dict using __missing__.
# Possibly sys.modules itself does deferred imports using __missing__.
sys.modules[__name__] = RaiseIfTouchedElseReplaceAllRefs(globals())

class __class__:
    [global] import current_module # ref in cells replaced with __self__
    [global] import other_module

    def bound_module_function(...):
        pass

    [global] def simple_module_function(...):
        pass

    # ... end module body ...

    # Likely still a descriptor.
    __dict__ = globals()

__self__ = __class__()
sys.modules[__self__.__name__] = __self__
```

Something to think about.

Thanks,

-- 

C Anthony
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

2017-09-14 Thread Chris Angelico
On Fri, Sep 15, 2017 at 12:08 AM, Serhiy Storchaka wrote:
> 13.09.17 23:07, Lucas Wiman wrote:
>>
>> On Wed, Sep 13, 2017 at 11:55 AM, Serhiy Storchaka wrote:
>>
>>> [...] Calling __getattr__() will slow down the access to builtins.
>>> And there is a recursion problem if module's __getattr__() uses
>>> builtins.
>>
>> The first point is totally valid, but the recursion problem doesn't seem
>> like a strong argument. There are already lots of recursion problems when
>> defining custom __getattr__ or __getattribute__ methods, but on balance
>> they're a very useful part of the language.
>
>
> In normal classes we have the recursion problem in __getattr__() only
> when accessing instance attributes. Builtins (like isinstance, getattr,
> AttributeError) can be used without problems. In a module's __getattr__(),
> all of this is a problem.
>
> Module attribute access can be implicit. For example, comparing a string
> with a bytes object in __getattr__() can trigger the lookup of
> __warningregistry__ and infinite recursion.

Crazy idea: Can we just isolate that function from its module?

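def isolate(func):
    # Rebuild func with globals that hold only the builtins, so global
    # lookups inside it never touch the module (or its __getattr__).
    return type(func)(func.__code__, {"__builtins__": __builtins__},
                      func.__name__, func.__defaults__, func.__closure__)

@isolate
def __getattr__(name):
    # the lookup of 'print' will skip this module
    print("Looking up", name)
    raise AttributeError(name)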
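The sketch below makes Serhiy's recursion problem and this workaround concrete. A real module does not let you swap out its `__dict__`, so it assumes a hypothetical `HookedGlobals` dict subclass, used as the globals of code run through `exec()`, whose `__missing__` forwards to a `__getattr__` defined in that same namespace. The hook's own lookup of the builtin `isinstance` misses the globals and re-enters the hook until `RecursionError`; rebuilding the hook with builtins-only globals, as `isolate()` does, breaks the cycle.

```python
import builtins
import types


class HookedGlobals(dict):
    """Hypothetical globals dict that forwards missing names to a
    __getattr__ function stored in the same namespace."""

    def __missing__(self, key):
        hook = dict.get(self, "__getattr__")  # plain lookup: no __missing__
        if hook is None:
            raise KeyError(key)
        try:
            return hook(key)
        except AttributeError:
            raise KeyError(key) from None  # lets the interpreter try builtins


SOURCE = """
def __getattr__(name):
    # 'isinstance' is not defined in this namespace, so looking it up
    # calls __missing__, which calls this function again, and so on.
    if isinstance(name, str) and name == "lazy_value":
        return 42
    raise AttributeError(name)

def use_lazy_value():
    return lazy_value
"""


def isolate(func):
    """Rebuild func with globals that contain only the builtins."""
    return types.FunctionType(func.__code__, {"__builtins__": builtins},
                              func.__name__, func.__defaults__,
                              func.__closure__)


ns = HookedGlobals()
exec(SOURCE, ns)

try:
    ns["use_lazy_value"]()
    recursed = False
except RecursionError:
    recursed = True  # the hook's builtin lookup re-entered the hook

ns["__getattr__"] = isolate(ns["__getattr__"])
print(recursed, ns["use_lazy_value"]())  # True 42
```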

ChrisA


Re: [Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

2017-09-14 Thread Serhiy Storchaka

13.09.17 23:07, Lucas Wiman wrote:
> On Wed, Sep 13, 2017 at 11:55 AM, Serhiy Storchaka wrote:
>
> > [...] Calling __getattr__() will slow down the access to builtins.
> > And there is a recursion problem if module's __getattr__() uses
> > builtins.
>
> The first point is totally valid, but the recursion problem doesn't
> seem like a strong argument. There are already lots of recursion
> problems when defining custom __getattr__ or __getattribute__ methods,
> but on balance they're a very useful part of the language.


In normal classes we have the recursion problem in __getattr__() only
when accessing instance attributes. Builtins (like isinstance, getattr,
AttributeError) can be used without problems. In a module's __getattr__(),
all of this is a problem.


Module attribute access can be implicit. For example, comparing a string
with a bytes object in __getattr__() (which emits a BytesWarning when
Python runs with -b) can trigger the lookup of __warningregistry__ and
infinite recursion.
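A small sketch of the contrast with ordinary classes (the class names are hypothetical): in a class's `__getattr__`, builtins such as `isinstance` resolve through the defining module's globals and never re-enter the hook, whereas a missing *instance* attribute does.

```python
class Lazy:
    def __getattr__(self, name):
        # isinstance and AttributeError come from builtins via the module's
        # globals; looking them up never passes through this method.
        if isinstance(name, str) and name == "answer":
            return 42
        raise AttributeError(name)


class Broken:
    def __getattr__(self, name):
        # self.data was never set, so this lookup calls __getattr__("data"),
        # which looks up self.data again, and so on.
        return self.data[name]


try:
    Broken().anything
    recursed = False
except RecursionError:
    recursed = True

print(Lazy().answer, recursed)  # 42 True
```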




Re: [Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

2017-09-14 Thread Steven D'Aprano
On Wed, Sep 13, 2017 at 12:24:31PM +0900, INADA Naoki wrote:
> I'm quite worried about performance.
>
> Since Python 3.6, dict has ma_version, which is meant to be used for
> future optimizations including global caching.
> Adding another abstraction layer may make that difficult.

Can we make it opt-in, by replacing the module __dict__ when and only if 
needed? Perhaps we could replace it on the fly with a dict subclass that 
defines __missing__? That's virtually the same as __getattr__.

Then modules which haven't replaced their __dict__ would not see any 
slowdown at all.

Does any of this make sense, or am I talking nonsense on stilts?
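A minimal sketch of the opt-in idea, assuming the namespace is a dict subclass. A real module's `__dict__` cannot be reassigned, so the hypothetical `LazyGlobals` class is used as the globals of code run with `exec()`. In CPython, global lookups in a dict subclass go through ordinary item access, so `__missing__` runs on every miss, including builtins such as `len` (the slowdown Serhiy points out elsewhere in this thread).

```python
class LazyGlobals(dict):
    """Hypothetical globals mapping: __missing__ plays the role of a
    module-level __getattr__ for names not found in the dict."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.missing_calls = []  # every name that fell through to the hook

    def __missing__(self, key):
        self.missing_calls.append(key)
        if key == "answer":
            value = 6 * 7        # stand-in for an expensive computation
            self[key] = value    # cache it: later lookups are plain dict hits
            return value
        raise KeyError(key)      # not ours: the interpreter tries builtins next


SOURCE = """
def get_answer():
    return answer + len([])

first = get_answer()
second = get_answer()
"""

ns = LazyGlobals()
exec(SOURCE, ns)
print(ns["first"], ns["second"])  # 42 42
print(ns.missing_calls)           # ['answer', 'len', 'len']
```

The cached `answer` stops hitting the hook after the first call, but `len` misses the dict every time.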




-- 
Steve


Re: [Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

2017-09-13 Thread Lucas Wiman
On Wed, Sep 13, 2017 at 11:55 AM, Serhiy Storchaka wrote:

> [...] Calling __getattr__() will slow down the access to builtins. And
> there is a recursion problem if module's __getattr__() uses builtins.
>

The first point is totally valid, but the recursion problem doesn't seem
like a strong argument. There are already lots of recursion problems when
defining custom __getattr__ or __getattribute__ methods, but on balance
they're a very useful part of the language.

- Lucas


Re: [Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

2017-09-13 Thread Serhiy Storchaka

12.09.17 19:17, Neil Schemenauer wrote:
> This is my idea of making module properties work.  It is necessary
> for various lazy-loading module ideas and it cleans up the language
> IMHO.  I think it may be possible to do it with minimal backwards
> compatibility problems and performance regression.
>
> To me, the main issue with module properties (or module __getattr__)
> is that you introduce another level of indirection on global
> variable access.  Anywhere the module.__dict__ is used as the
> globals for code execution, changing LOAD_NAME/LOAD_GLOBAL to have
> another level of indirection is necessary.  That seems inescapable.
>
> Introducing another special feature of modules to make this work is
> not the solution, IMHO.  We should make module namespaces be more
> like instance namespaces.  We already have a mechanism and it is
> getattr on objects.


There is a difference between module namespaces and instance namespaces. 
LOAD_NAME/LOAD_GLOBAL fall back to builtins if the name is not found in 
the globals dictionary. Since every builtin lookup first misses the 
globals, calling __getattr__() on a miss will slow down access to 
builtins. And there is a recursion problem if a module's __getattr__() 
itself uses builtins.




Re: [Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

2017-09-12 Thread Nick Coghlan
On 13 September 2017 at 02:17, Neil Schemenauer wrote:
> Introducing another special feature of modules to make this work is
> not the solution, IMHO.  We should make module namespaces be more
> like instance namespaces.  We already have a mechanism and it is
> getattr on objects.

One thing to keep in mind is that class instances *also* allow their
attribute access machinery to be bypassed by writing to the
instance.__dict__ directly - it's just that the instance dict may be
bypassed on lookup for data descriptors.

So that means we wouldn't need to change the way globals() works -
we'd just add the caveat that amendments made that way may be ignored
for things defined as properties.
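This caveat can be shown with current attribute rules (the names below are hypothetical): a write through `__dict__` succeeds, but a data descriptor such as `property` on the type still wins on lookup. The second half applies the same rule to a module whose class has been replaced with a `ModuleType` subclass, standing in for a module property.

```python
import types


class WithProperty:
    @property
    def answer(self):  # property is a data descriptor (it defines __set__)
        return 42


obj = WithProperty()
obj.__dict__["answer"] = 0                  # direct write bypasses the property
print(obj.answer, obj.__dict__["answer"])   # 42 0


class PropertyModule(types.ModuleType):
    @property
    def answer(self):
        return 42


mod = types.ModuleType("nick_demo")
mod.__class__ = PropertyModule              # allowed for modules since Python 3.5
mod.__dict__["answer"] = 0                  # like writing through globals()
print(mod.answer, mod.__dict__["answer"])   # 42 0
```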

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

2017-09-12 Thread INADA Naoki
I'm quite worried about performance.

Since Python 3.6, dict has ma_version, which is meant to be used for future
optimizations including global caching.
Adding another abstraction layer may make that difficult.


When considering lazy loading, the big problem is backward compatibility.
For example, see
https://github.com/python/cpython/blob/master/Lib/concurrent/futures/__init__.py

from concurrent.futures._base import (FIRST_COMPLETED,
  FIRST_EXCEPTION,
  ALL_COMPLETED,
  CancelledError,
  TimeoutError,
  Future,
  Executor,
  wait,
  as_completed)
from concurrent.futures.process import ProcessPoolExecutor
from concurrent.futures.thread import ThreadPoolExecutor


asyncio must import concurrent.futures.Future for compatibility between
asyncio.Future and concurrent.futures.Future.

But not all asyncio applications need ProcessPoolExecutor.
They may use only ThreadPoolExecutor.

Currently, they are forced to import concurrent.futures.process, which imports
multiprocessing.  That makes for a large import dependency tree.

To solve such a problem, hooking LOAD_GLOBAL is not necessary.

# in concurrent/futures/__init__.py

def __getattr__(name):
    if name == 'ProcessPoolExecutor':
        global ProcessPoolExecutor
        from .process import ProcessPoolExecutor
        return ProcessPoolExecutor
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

# Both of the following should call __getattr__

from concurrent.futures import ProcessPoolExecutor  # eager loading

import concurrent.futures as futures
executor = futures.ProcessPoolExecutor()  # lazy loading
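Since Python 3.7 (PEP 562), a module-level `__getattr__` like the one above is honoured by both attribute access and `from ... import`. The self-contained sketch below builds a hypothetical `lazy_demo` module at runtime instead of editing `concurrent.futures`; `json.JSONDecoder` merely stands in for a heavy attribute such as `ProcessPoolExecutor`.

```python
import sys
import types

# Body of the hypothetical module, executed into a fresh module object.
SOURCE = '''
loaded = []

def __getattr__(name):
    if name == "JSONDecoder":
        global JSONDecoder
        from json import JSONDecoder  # deferred until first access
        loaded.append(name)
        return JSONDecoder
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

mod = types.ModuleType("lazy_demo")
exec(SOURCE, mod.__dict__)
sys.modules["lazy_demo"] = mod

from lazy_demo import JSONDecoder  # from-import goes through __getattr__
import lazy_demo
lazy_demo.JSONDecoder              # now a real global: no hook call
print(lazy_demo.loaded)            # ['JSONDecoder']
```

Raising `AttributeError` for every other name matters: the import system probes attributes such as `__path__`, and a hook that silently returned `None` would answer those probes too.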


On the other hand, lazily loading a module that is only used through a
global name inside the importing module is easier than the above.
For example, linecache imports tokenize, and tokenize is relatively heavy.
https://github.com/python/cpython/blob/master/Lib/linecache.py#L11

tokenize is used from only one place (in linecache.updatecache()).
So lazily importing it just means moving `import tokenize` into the function.

# in linecache.updatecache(); the existing except clause is omitted here
try:
    import tokenize
    with tokenize.open(fullname) as fp:
        lines = fp.readlines()

I want to lazy-load only heavy and rarely used modules.
Lazy loading many modules may make execution order unpredictable.
So the manual lazy loading technique is almost enough for me.


Then, what is the real-world use case for adding an abstraction layer to
LOAD_GLOBAL?

Regards,
INADA Naoki  



Re: [Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

2017-09-12 Thread Neil Schemenauer
On 2017-09-12, Eric Snow wrote:
> Yeah, good luck! :). If I weren't otherwise occupied with my own crazy
> endeavor I'd lend a hand.

No problem.  It makes sense to have a proof of concept before
spending time on a PEP.  If the idea breaks too much old code it is
not going to happen.  So, I will work on a slow but mostly
compatible implementation for now.

Regards,

  Neil


Re: [Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

2017-09-12 Thread Eric Snow
On Sep 12, 2017 10:17 AM, "Neil Schemenauer" wrote:

> Introducing another special feature of modules to make this work is
> not the solution, IMHO.  We should make module namespaces be more
> like instance namespaces.  We already have a mechanism and it is
> getattr on objects.


+1

> - importlib needs to be fixed to pass modules to exec() and not
>   dicts.  From my initial experiments, it looks like importlib gets
>   a lot simpler.  Right now we pass around dicts in a lot of places
>   and then have to grub around in sys.modules to get the module
>   object, which is what importlib usually wants.


Without looking at the importlib code, passing around modules should mostly
be fine.  There is some semantic trickiness involving sys.modules, but it
shouldn't be too bad to work around.

> I have requested help in writing a PEP for this idea but so far no
> one is foolish enough to join my crazy endeavor. ;-)


Yeah, good luck! :). If I weren't otherwise occupied with my own crazy
endeavor I'd lend a hand.

-eric


[Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

2017-09-12 Thread Neil Schemenauer
This is my idea of making module properties work.  It is necessary
for various lazy-loading module ideas and it cleans up the language
IMHO.  I think it may be possible to do it with minimal backwards
compatibility problems and performance regression.

To me, the main issue with module properties (or module __getattr__)
is that you introduce another level of indirection on global
variable access.  Anywhere the module.__dict__ is used as the
globals for code execution, changing LOAD_NAME/LOAD_GLOBAL to have
another level of indirection is necessary.  That seems inescapable.

Introducing another special feature of modules to make this work is
not the solution, IMHO.  We should make module namespaces be more
like instance namespaces.  We already have a mechanism and it is
getattr on objects.

I have a very early prototype of this idea.  See:

https://github.com/nascheme/cpython/tree/exec_mod

Issues to be resolved:

- __namespace__ entry in the __dict__ creates a reference cycle.
  Maybe could use a weakref somehow to avoid it.  Maybe we just
  explicitly break it.

- getattr() on the module may return things that LOAD_NAME and
  LOAD_GLOBAL don't expect (e.g. things from the module type).  I
  need to investigate that.

- Need to fix STORE_* opcodes to do setattr() rather than
  __setitem__.

- Need to optimize the implementation.  Maybe the module instance
  can know if any properties or __getattr__ are defined.  If not,
  have __getattribute__ grab the variable directly from md_dict.

- Need to fix eval() to allow module as well as dict.

- Need to change logic where global dict is passed around.  Pass the
  module instead so we don't have to keep retrieving __namespace__.
  For backwards compatibility, need to keep functions that take
  'globals' as dict and use PyModule_GetDict() on public APIs that
  return globals as a dict.

- interp->builtins should be a module, not a dict.

- module shutdown procedure needs to be investigated and fixed.  I
  think it may get simpler.

- importlib needs to be fixed to pass modules to exec() and not
  dicts.  From my initial experiments, it looks like importlib gets
  a lot simpler.  Right now we pass around dicts in a lot of places
  and then have to grub around in sys.modules to get the module
  object, which is what importlib usually wants.
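A pure-Python model may help pin down the proposed lookup order. It is a sketch under stated assumptions, not the prototype's C code: `load_global()` and `PropModule` are hypothetical names, builtins are still consulted as a dict, and the fast path approximates the optimization item above by checking for a plain `ModuleType` without a module-level `__getattr__`. The last line shows the concern about things from the module type: `getattr()` returns inherited attributes that a dict-based lookup would treat as undefined names.

```python
import builtins
import types


def load_global(module, name):
    """Hypothetical model of LOAD_GLOBAL under the proposal: ask the module
    object via getattr(), then fall back to the builtins dict."""
    ns = module.__dict__
    if type(module) is types.ModuleType and "__getattr__" not in ns:
        # Fast path: a plain module with no module-level __getattr__ cannot
        # have properties or a hook, so read its dict directly.
        if name in ns:
            return ns[name]
    else:
        try:
            return getattr(module, name)  # honours properties and __getattr__
        except AttributeError:
            pass
    builtin_ns = vars(builtins)  # builtins are still consulted as a dict here
    if name in builtin_ns:
        return builtin_ns[name]
    raise NameError(f"name {name!r} is not defined")


class PropModule(types.ModuleType):
    @property
    def answer(self):
        return 42


plain = types.ModuleType("plain")
plain.x = 1
prop_mod = types.ModuleType("prop_mod")
prop_mod.__class__ = PropModule

print(load_global(plain, "x"), load_global(plain, "len") is len)  # 1 True
print(load_global(prop_mod, "answer"))                            # 42
# getattr() also finds attributes inherited from the module type, which a
# dict-based LOAD_GLOBAL would report as an undefined name:
print(callable(load_global(prop_mod, "__repr__")))                # True
```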

I have requested help in writing a PEP for this idea but so far no
one is foolish enough to join my crazy endeavor. ;-)

Regards,

  Neil