Re: [Python-Dev] Surely "nullable" is a reasonable name?

2014-08-04 Thread Nathaniel Smith
I admit I spent the first half of the email scratching my head and trying
to figure out what NULL had to do with argument clinic specs. (Maybe it
would mean that if the argument is "not given" in some appropriate way then
we set the corresponding C variable to NULL?) Finding out you were talking
about None came as a surprising twist.

-n
On 4 Aug 2014 08:13, "Larry Hastings"  wrote:

>
>
> Argument Clinic "converters" specify how to convert an individual argument
> to the function you're defining.  Although a converter could theoretically
> represent any sort of conversion, most of the time they directly represent
> types like "int" or "double" or "str".
>
> Because there's such variety in argument parsing, the converters are
> customizable with parameters.  Many of these are common enough that
> Argument Clinic suggests some standard names.  Examples: "zeroes=True" for
> strings and buffers means "permit internal \0 characters", and
> "bitwise=True" for unsigned integers means "copy the bits over, even if
> there's overflow/underflow, and even if the original is negative".
>
> A third example is "nullable=True", which means "also accept None for this
> parameter".  This was originally intended for use with strings (compare the
> "s" and "z" format units for PyArg_ParseTuple), however it looks like we'll
> have a use for "nullable ints" in the ongoing Argument Clinic conversion
> work.
>
> Several people have said they found the name "nullable" surprising,
> suggesting I use another name like "allow_none" or "noneable".  I, in turn,
> find their surprise surprising; "nullable" is a term long associated with
> exactly this concept.  It's used in C# and SQL, and the term even has its
> own Wikipedia page:
>
> http://en.wikipedia.org/wiki/Nullable_type
>
> Most amusingly, Vala *used* to have an annotation called "(allow-none)",
> but they've broken it out into two annotations, "(nullable)" and
> "(optional)".
>
>
> http://blogs.gnome.org/desrt/2014/05/27/allow-none-is-dead-long-live-nullable/
>
>
> Before you say "the term 'nullable' will confuse end users", let me remind
> you: this is not user-facing.  This is a parameter for an Argument Clinic
> converter, and will only ever be seen by CPython core developers.  A group
> which I hope is not so easily confused.
>
> It's my contention that "nullable" is the correct name.  But I've been
> asked to bring up the topic for discussion, to see if a consensus forms
> around this or around some other name.
>
> Let the bike-shedding begin,
>
>
> */arry*
>


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-09 Thread Nathaniel Smith
On Fri, Oct 10, 2014 at 1:29 AM, Victor Stinner wrote:
> Hi,
>
> Windows is not the primary target of Python developers, probably
> because most of them work on Linux. Official Python binaries are
> currently built with Microsoft Visual Studio. Even though Python
> developers get free licenses thanks to Microsoft, I would prefer to
> use an open source compiler if possible, so that *anyone* can build
> Python from scratch. I don't like the requirement of having a license
> to build Python. The free version (Visual Studio Express) only
> supports 32-bit and doesn't support PGO builds (Profile-Guided
> Optimization, which if I remember correctly is disabled because of
> compiler bugs).
>
> I know that it's hard to replace Visual Studio. I don't want to do it
> right now, but I would like to discuss that with you.
>
>
> === Open Watcom
>
> Jeffrey Armstrong is working on Python support for Open Watcom (v2); see:
> http://lightningpython.org/
> https://bitbucket.org/ArmstrongJ/lightning-python
>
> This compiler was initially written for 32-bit MS-DOS, but it now
> supports Windows and Linux as well. The 64-bit mode is new and
> experimental. The Open Watcom "v2" project is actively developed at:
>
> https://github.com/open-watcom/open-watcom-v2/
>
> On Linux, Open Watcom doesn't support dynamic linking. On Windows, it
> uses its own C library. I'm not sure that Open Watcom is the best
> choice to build Python on Windows.
>
>
> === MinGW
>
> Some people have tried to compile Python with MinGW. See for example:
> https://bitbucket.org/puqing/python-mingw
>
> We even got some patches:
> http://bugs.python.org/issue3871 (rejected)
>
> See also:
> https://stackoverflow.com/questions/15365249/build-python-with-mingw-and-gcc
>
> MinGW reuses the Microsoft C library, and it is based on GCC, which is
> very stable, actively developed, supports a lot of architectures, etc.
> I guess it should be possible to reuse third-party GCC tools like the
> famous GDB debugger?

You may want to get in touch with Carl Kleffner -- he's done a bunch
of work lately on getting a mingw-based toolchain to the point where
it can build numpy and scipy. (This is pretty urgent for us because
(a) numerical work requires a BLAS library and the main competitive
open-source one -- OpenBLAS -- cannot be built by msvc because of asm
syntax issues, (b) msvc's fortran support is even worse than its C99
support.) Getting this working is non-trivial, since by default
mingw-compiled code depends on the GCC runtime libraries, the default
ABI doesn't match msvc, etc. But apparently these issues are all
fixable.

General info:
  https://github.com/numpy/numpy/wiki/Mingw-static-toolchain

The built toolchains etc.:
  https://bitbucket.org/carlkl/mingw-w64-for-python/downloads

Readme:
  https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/readme.txt

The patch to the numpy sources -- this in particular includes the
various distutils hacks needed to enable the crucial ABI-compatibility
switches:
  https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/numpy.patch

(Unfortunately he doesn't seem to have posted the build recipe for the
toolchain itself -- I'm sure he'd be happy to if you asked though.)

AFAICT the end result is a single free compiler toolchain that can
spit out 32- and 64-bit binaries using whichever MSVC runtime you
prefer.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-11 Thread Nathaniel Smith
On 11 Oct 2014 14:42, "Antoine Pitrou"  wrote:
>
> On Sat, 11 Oct 2014 00:30:51 + (UTC)
> Sturla Molden  wrote:
> > Larry Hastings  wrote:
> >
> > > CPython doesn't require OpenBLAS.  Not that I am not receptive to the
> > > needs of the numeric community... but, on the other hand, who in the
> > > hell releases a library with Windows support that doesn't work with
> > > MSVC?!
> >
> > It uses AT&T assembly syntax instead of Intel assembly syntax.
>
> But you can compile OpenBLAS with one compiler and then link it to
> Python using another compiler, right? There is a single C ABI.

In theory, yes, but this is pretty misleading. The theory has been known
for years. In practice we've only managed to pull this off for the first
time within the last few months, and it requires one specific build of one
specific mingw fork with one specific set of build options, and only one
person understands the details. Hopefully all those things will continue to
be available and there aren't any showstopper bugs we haven't noticed yet...

(Before that we spent the last 5+ years using a carefully preserved
build of an ancient 32-bit mingw and a substandard BLAS .dll that were handed
down from our ancestors; we've had no capability to produce 64 bit official
builds at all. And a large proportion of users have been using 3rd party
proprietary builds.)

-n


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-11 Thread Nathaniel Smith
I'm not at all an expert on Fortran ABIs, but I think there are two
distinct issues being conflated here.

The first is that there is no standard way to look at some Fortran source
code and figure out the corresponding C API. When trying to call a Fortran
routine from C, then different Fortran compilers require different sorts of
name mangling, different ways of mapping Fortran concepts like "output
arguments" onto C concepts like "pointers", etc., so your C code needs to
have explicit knowledge of which Fortran compiler is in use. That's what
Sturla was referring to with the differences between g77 versus gfortran
etc. This is all very annoying, but it isn't a *deep* bad - it can be
solved by unilateral action by whatever project wants to link the Fortran
library.
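
To make the name-mangling point concrete, here's a small ctypes sketch
(the library name, symbol spelling, and calling details are illustrative
of the gfortran/g77 convention; other compilers do it differently):

import ctypes

# BLAS's DAXPY computes y := alpha*x + y.  With gfortran-style mangling
# the symbol is lowercase with a trailing underscore; another Fortran
# compiler might export "DAXPY" or "daxpy" instead, and that per-compiler
# knowledge is exactly what the C (or ctypes) caller has to hard-code.
blas = ctypes.CDLL("libblas.dll")          # library name is illustrative
daxpy = blas.daxpy_

n = ctypes.c_int(3)
alpha = ctypes.c_double(2.0)
inc = ctypes.c_int(1)
x = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
y = (ctypes.c_double * 3)(1.0, 1.0, 1.0)
# Fortran passes everything by reference, so even scalars go by pointer:
daxpy(ctypes.byref(n), ctypes.byref(alpha),
      x, ctypes.byref(inc), y, ctypes.byref(inc))
# y is now [3.0, 5.0, 7.0]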

The bigger problem is that getting a usable DLL at all is a serious
challenge. Some of the issues we deal with: (a) the classic, stable mingw
has no 64-bit support, (b) the only portable way to compile fortran (f2c)
only works for the ancient fortran 77, (c) getting even mingw-w64 to use a
msvc-compatible ABI is not trivial (I have no idea why this is the case,
but it is), (d) mingw-built dlls normally depend on the mingw runtime dlls.
Because these aren't
shipped globally with Python, they have to be either linked statically or
else a separate copy of them has to be placed into every directory that
contains any mingw-compiled extension module.

All the runtime and ABI issues do mean that it would be much easier to use
mingw(-w64) to build extension modules if Python itself were built with
mingw(-w64). Obviously this would in turn make it harder to build
extensions with MSVC, though, which would be a huge transition. I don't
know whether gcc's advantages (support for more modern C, better
cross-platform compatibility, better accessibility to non-windows-experts,
etc.) would outweigh the transition and other costs.

As an intermediate step, there are almost certainly things that could be
done to make it easier to use mingw-w64 to build python extensions, e.g.
teaching setuptools about how to handle the ABI issues. Maybe it would even
be possible to ship the mingw runtimes in some globally available location.
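
For what it's worth, the kind of distutils-level tweak I have in mind
would look roughly like this. This is only a sketch: Mingw32CCompiler
and dll_libraries are real distutils internals, but the particular
switches shown are illustrative, and whether any of this is sufficient
depends on how the toolchain itself was built (see above).

from distutils.cygwinccompiler import Mingw32CCompiler

class AbiCompatMingw32(Mingw32CCompiler):
    """Sketch: force ABI-related link options when building extensions
    with mingw-w64."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Statically link the GCC support libraries so no extra runtime
        # DLLs need to be shipped next to every extension module...
        self.linker_so = self.linker_so + ["-static-libgcc",
                                           "-static-libstdc++"]
        # ...and link against CPython's own C runtime instead of msvcrt.
        self.dll_libraries = ["msvcr100"]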

-n
On 11 Oct 2014 17:07, "Steve Dower"  wrote:

>   Is there some reason the Fortran part can't be separated out into a
> DLL? That's the C ABI Antoine was referring to, and most compilers can
> generate import libraries from binaries, even if the original compiler
> produced them in a different format.
>
> Top-posted from my Windows Phone
>  --
> From: Sturla Molden 
> Sent: 10/11/2014 7:22
> To: [email protected]
> Subject: Re: [Python-Dev] Status of C compilers for Python on Windows
>
>   Antoine Pitrou  wrote:
>
> > It sounds like whatever MSVC produces should be the de facto standard
> > under Windows.
>
> Yes, and that is what Clang does on Windows. It is not as usable as MinGW
> yet, but soon it will be. Clang also suffers from the lack of a Fortran
> compiler, though.
>
> Sturla
>


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-27 Thread Nathaniel Smith
On Mon, Oct 27, 2014 at 5:48 PM, Paul Moore  wrote:
> On 26 October 2014 23:44, Paul Moore  wrote:
>> On 26 October 2014 23:11, Ray Donnelly  wrote:
>>> I don't know where this "ABI compatible" thing came into being;
>>
>> Simple. If a mingw-built CPython doesn't work with the same extensions
>> as a MSVC-built CPython, then the community gets fragmented (because
>> you can only use the extensions built for your stack). Assuming numpy
>> needs mingw and ultimately only gets built for a mingw-compiled Python
>> (because the issues building for MSVC-built Python are too hard) and
>> assuming that nobody wants to make the effort to build pywin32 under
>> mingw, then what does someone who needs both numpy and pywin32 do?
>>
>> Avoiding that issue is what I mean by ABI-compatible. (And that's all
>> I mean by it, nothing more subtle or controversial).
>>
>> I view it as critical (because availability of binaries is *already*
>> enough of a problem in the Windows world, without making it worse)
>> that we avoid this sort of fragmentation. I'm not seeing an
>> acknowledgement from the mingw side that they agree. That's my
>> concern. If we both agree, there's nothing to argue about.
>
> I have just done some experiments with building CPython extensions
> with mingw-w64. Thanks to Ray for helping me set this up.
>
> The bad news is that the support added to the old 32-bit mingw to
> support linking to alternative C runtime libraries (specifically
> -lmsvcr100) has bitrotted, and no longer functions correctly in
> mingw-w64. As a result, not only can mingw-w64 not build extensions
> that are compatible with python.org Python, it can't build extensions
> that function at all [1]. They link incompatibly to *both* msvcrt and
> msvcr100.
>
> This is a bug in mingw-w64. I have reported it to Ray, who's passed it
> onto one of the mingw-w64 developers. But as things stand, mingw
> builds will definitely produce binary extensions that aren't
> compatible with python.org Python.

IIUC, getting mingw-w64 to link against msvcr100 instead of msvcrt
requires a custom mingw-w64 build, because by default mingw-w64's
internal runtime libraries (libgcc etc.) are linked against msvcrt. So
by the time you're choosing compiler switches etc., it's already too
late -- your switches might affect how *your* code is built, but your
code will still be linked against pre-existing runtime libraries that
are linked against msvcrt.

It's possible to hack the mingw-w64 build process to build the runtime
libraries against msvcr100 (or whatever) instead of msvcrt, but this
is still not a panacea -- the different msvcr* libraries are, of
course, incompatible with each other, and IIUC the mingw-w64
developers have never tried to make their libraries work against
anything except msvcrt. For example, mingw-w64's gfortran runtime uses
a symbol that's only available in msvcrt, not msvcr90 or msvcr100:
  http://sourceforge.net/p/mingw-w64/mailman/message/31768118/

So my impression is that these issues are all fixable, but they will
require real engagement with mingw-w64 upstream.

> [1] Note, that's if you just use --compiler=mingw32 as supported by
> distutils. Looking at how the numpy folks build, they seem to hack
> their own version of the distutils C compiler classes. I don't know
> whether that's just to work around this bug, or whether they do it for
> other reasons as well (but I suspect the latter).

numpy.distutils is a massive pile of hacks to handle all kinds of
weird things including recursive builds, fortran, runtime capability
detection (like autoconf), and every random issue anyone ran into at
some point in the last 10 years and couldn't be bothered filing a
proper upstream bug report. Basically no-one knows what it actually
does -- the source is your only hope :-).

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-29 Thread Nathaniel Smith
On 29 Oct 2014 14:47, "Antoine Pitrou"  wrote:
>
> On Wed, 29 Oct 2014 10:31:50 -0400
> "R. David Murray"  wrote:
>
> > On Wed, 29 Oct 2014 10:22:14 -0400, Tres Seaver wrote:
> > > On 10/28/2014 11:59 PM, Stephen J. Turnbull wrote:
> > >
> > > > most developers on Windows do have access to Microsoft tool
> > >
> > > I assume you mean python-dev folks who work on Windows: it is
> > > certainly not true for the vast majority of developers who use
> > > Python on Windows, who don't have the toolchain to build their own
> > > C extensions.
> >
> > I'm pretty sure he meant "most people who develop software for Windows",
> > even though that's not how he phrased it.  But this does not include, as
> > you point out, people who develop Python software that *also* works on
> > Windows.
> >
> > If you are writing code targeted for Windows, I think you are very
> > likely to have an MSDN subscription of some sort if your package
> > includes C code.  I'm sure it's not 100%, though.
>
> You can use Express editions of Visual Studio.

IIUC, the express edition compilers are 32-bit only, and what you actually
want are the "SDK compilers":
https://github.com/cython/cython/wiki/64BitCythonExtensionsOnWindows

These are freely downloadable by anyone, no msdn subscription required, but
only if you know where to find them!
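
(One small trick that helps: you can ask Python itself which MSVC it was
built with, and hence which compiler/SDK you need. A sketch, Windows-only,
with the distutils function name given from memory:

import sys
from distutils import msvc9compiler

print(sys.version)
# e.g. "... [MSC v.1600 64 bit (AMD64)]" -- MSC v.1600 is MSVC 10.0
print(msvc9compiler.get_build_version())
# e.g. 9.0 for CPython 2.7, 10.0 for 3.3/3.4
)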

AFAICT the main obstacle to using MSVC to build python extensions (assuming
it can handle your code at all) is not anything technical, but rather that
there's no clear and correct tutorial on how to do it, and lots of
confusion and misinformation circulating.

-n


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-29 Thread Nathaniel Smith
On Wed, Oct 29, 2014 at 10:46 PM, Paul Moore  wrote:
> On 29 October 2014 22:19, Ethan Furman  wrote:
>>> Yeah, I wondered about that. I'll work up a patch for that. But the
>>> more I think about it, it really is trivial:
>>
>> I am reminded of an interview question I was once asked which was prefaced
>> with: "Here's an easy one..."
>>
>> My reply was, if you know the answer, it's easy!
>
> Yeah, I know what you mean. My take on this is that I agree it's not
> easy if you don't know and can't get access to the information, but if
> you can, there's very little to it.

That's great, but yeah. In case it helps as a data point, I consider
myself a reasonably technical guy, probably more than your average
python package author -- 15 years of free software hacking, designed a
pretty popular X remote display protocol, helped invent DVCS, have
written patches to gcc and the kernel, numpy co-maintainer, etc. etc.

For the last ~12 months I've had an underlined and circled todo item
saying "make wheels for https://pypi.python.org/pypi/zs/";, and every
time I've sat down and spent a few hours trying I've ended up utterly
defeated with a headache. Distutils is spaghetti, and Windows is
spaghetti (esp. if you've never used it for more than checking email
on a friend's laptop), and the toolchain setup is spaghetti, and then
suddenly people are talking about vcvarsall.bats and I don't know what
all. I honestly would totally benefit from one of those
talk-your-grandparents-through-it tutorials ("here's a screenshot of
the dialogue box, and I've drawn an arrow pointing at the 'ok' button.
You should press the 'ok' button.")

>>> - For non-free MSVC, install the appropriate version, and everything just
>>> works.
>>> - For Python 2.7 (32 or 64 bit), install the compiler for Python 2.7
>>> package and everything just works as long as you're using setuptools.
>>> - For 32 bit Python 3.2-3.4, install Visual Studio Express and
>>> everything just works.
>>> - For 64 bit Python 3.2-3.4, install the SDK, set some environment
>>> variables, and everything just works.

I think the SDK covers both 32 and 64 bit? If so then leaving visual
studio out of things entirely is probably simpler.

I'm also unsure how you even get both 32- and 64-bit versions of
python to coexist on the same system.
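
(Related tip, in case it helps anyone reading along: it's at least easy
to check from inside Python which flavour you're actually running --

import platform, struct, sys

print(struct.calcsize("P") * 8)    # 32 or 64: pointer size in bits
print(platform.architecture()[0])  # '32bit' or '64bit'
print(sys.version)                 # on Windows, also names the MSC version
)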

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] Dinamically set __call__ method

2014-11-04 Thread Nathaniel Smith
On Tue, Nov 4, 2014 at 4:52 PM, Roberto Martínez
 wrote:
> Hi folks,
>
> I am trying to dynamically replace the __call__ method of an object using
> setattr.
> Example:
>
> $ cat testcall.py
> class A:
>     def __init__(self):
>         setattr(self, '__call__', self.newcall)
>
>     def __call__(self):
>         print("OLD")
>
>     def newcall(self):
>         print("NEW")
>
> a = A()
> a()
>
> I expect to get "NEW" instead of "OLD", but in Python 3.4 I get "OLD".
>
> $ python2.7 testcall.py
> NEW
> $ python3.4 testcall.py
> OLD
>
> I have a few questions:
>
> - Is this an expected behavior?
> - Is it possible to replace __call__ dynamically in Python 3? How?

For new-style classes, special methods like __call__ are looked up
directly on the class, not on the object itself. If you want to change
the result of doing a(), then you need to reassign A.__call__, not
a.__call__.

In python 3, all classes are new-style classes. In python 2, only
classes that inherit from 'object' are new-style classes. (If you
replace 'class A:' with 'class A(object):' then you'll see the same
behaviour on both py2 and py3.)

Easy workaround:

def __call__(self, *args, **kwargs):
    return self._my_call(*args, **kwargs)

Now you can assign a._my_call to be whatever you want.
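
In full, the workaround looks something like this (self-contained sketch;
_my_call is just an arbitrary name):

class A:
    def __call__(self, *args, **kwargs):
        # __call__ is looked up on the type, so delegate to an ordinary
        # attribute, which *can* be overridden per instance.
        return self._my_call(*args, **kwargs)

    def _my_call(self):
        print("OLD")

a = A()
a()                                # prints OLD
a._my_call = lambda: print("NEW")  # instance attribute shadows the method
a()                                # prints NEW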

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] OneGet provider for Python

2014-11-15 Thread Nathaniel Smith
On 15 Nov 2014 10:10, "Paul Moore"  wrote:
>
> > Incidentally, it would be really useful if python.org provided stable
> > url's that always redirected to the latest .msi installers, for
> > bootstrapping purposes. I'd prefer to not rely on chocolatey (or on
> > scraping the web site) for this.
>
> https://www.python.org/ftp/python/$ver/python-$ver.msi
> https://www.python.org/ftp/python/$ver/python-$ver.amd64.msi

Right, but what's the URL for "the latest 2.7.x release" or "the latest
3.x.x release"?

-n


[Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-28 Thread Nathaniel Smith
Hi all,

There was some discussion on python-ideas last month about how to make
it easier/more reliable for a module to override attribute access.
This is useful for things like autoloading submodules (accessing
'foo.bar' triggers the import of 'bar'), or for deprecating module
attributes that aren't functions. (Accessing 'foo.bar' emits a
DeprecationWarning, "the bar attribute will be removed soon".) Python
has had some basic support for this for a long time -- if a module
overwrites its entry in sys.modules[__name__], then the object that's
placed there will be returned by 'import'. This allows one to define
custom subclasses of module and use them instead of the default,
similar to how metaclasses allow one to use custom subclasses of
'type'.
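
For concreteness, the long-standing trick looks something like this (a
minimal sketch; the class and attribute names are made up):

   # mypkg/__init__.py
   import sys, types

   class _MyModule(types.ModuleType):
       @property
       def answer(self):            # e.g. a computed/deprecatable attribute
           return 42

   _new = _MyModule(__name__, __doc__)
   _new.__dict__.update(globals())  # a *copy*, not an alias -- which is
                                    # exactly the problem described next
   sys.modules[__name__] = _new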

In practice though it's very difficult to make this work safely and
correctly for a top-level package. The main problem is that when you
create a new object to stick into sys.modules, this necessarily means
creating a new namespace dict. And now you have a mess, because now
you have two dicts: new_module.__dict__ which is the namespace you
export, and old_module.__dict__, which is the globals() for the code
that's trying to define the module namespace. Keeping these in sync is
extremely error-prone -- consider what happens, e.g., when your
package __init__.py wants to import submodules which then recursively
import the top-level package -- so it's difficult to justify for the
kind of large packages that might be worried about deprecating entries
in their top-level namespace. So what we'd really like is a way to
somehow end up with an object that (a) has the same __dict__ as the
original module, but (b) is of our own custom module subclass. If we
can do this then metamodules will become safe and easy to write
correctly.

(There's a little demo of working metamodules here:
   https://github.com/njsmith/metamodule/
but it uses ctypes hacks that depend on non-stable parts of the
CPython ABI, so it's not a long-term solution.)

I've now spent some time trying to hack this capability into CPython
and I've made a list of the possible options I can think of to fix
this. I'm writing to python-dev because none of them are obviously The
Right Way so I'd like to get some opinions/ruling/whatever on which
approach to follow up on.

Option 1: Make it possible to change the type of a module object
in-place, so that we can write something like

   sys.modules[__name__].__class__ = MyModuleSubclass

Option 1 downside: The invariants required to make __class__
assignment safe are complicated, and only implemented for
heap-allocated type objects. PyModule_Type is not heap-allocated, so
making this work would require lots of delicate surgery to
typeobject.c. I'd rather not go down that rabbit-hole.
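
(To see the restriction in action -- the following fails with a TypeError
on current CPython; error message paraphrased from memory:

   import sys, types

   class MyModuleSubclass(types.ModuleType):
       pass

   try:
       sys.modules[__name__].__class__ = MyModuleSubclass
   except TypeError as e:
       print(e)   # roughly: "__class__ assignment: only for heap types"
)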



Option 2: Make PyModule_Type into a heap type allocated at interpreter
startup, so that the above just works.

Option 2 downside: PyModule_Type is exposed as a statically-allocated
global symbol, so doing this would involve breaking the stable ABI.



Option 3: Make it legal to assign to the __dict__ attribute of a
module object, so that we can write something like

   new_module = MyModuleSubclass(...)
   new_module.__dict__ = sys.modules[__name__].__dict__
   sys.modules[__name__].__dict__ = {} # ***
   sys.modules[__name__] = new_module

The line marked *** is necessary because the way modules are designed,
they expect to control the lifecycle of their __dict__. When the
module object is initialized, it fills in a bunch of stuff in the
__dict__. When the module object (not the dict object!) is
deallocated, it deletes everything from the __dict__. This latter
feature in particular means that having two module objects sharing the
same __dict__ is bad news.

Option 3 downside: The paragraph above. Also, there's stuff inside the
module struct besides just the __dict__, and more stuff has appeared
there over time.



Option 4: Add a new function sys.swap_module_internals, which takes
two module objects and swaps their __dict__ and other attributes. By
making the operation a swap instead of an assignment, we avoid the
lifecycle pitfalls from Option 3. By making it a builtin, we can make
sure it always handles all the module fields that matter, not just
__dict__. Usage:

   new_module = MyModuleSubclass(...)
   sys.swap_module_internals(new_module, sys.modules[__name__])
   sys.modules[__name__] = new_module

Option 4 downside: Obviously a hack.



Option 3 or 4 both seem workable, it just depends on which way we
prefer to hold our nose. Option 4 is slightly more correct in that it
works for *all* modules, but OTOH at the moment the only time Option 3
*really* fails is for compiled modules with PEP 3121 metadata, and
compiled modules can already use a module subclass via other means
(since they instantiate their own module objects).

Thoughts? Suggestions on other options I've missed? Should I go ahead
and write a patch for one of these?

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-29 Thread Nathaniel Smith
On Sat, Nov 29, 2014 at 4:21 AM, Guido van Rossum  wrote:
> Are these really all our options? All of them sound like hacks, none of them
> sound like anything the language (or even the CPython implementation) should
> sanction. Have I missed the discussion where the use cases and constraints
> were analyzed and all other approaches were rejected? (I might have some
> half-baked ideas, but I feel I should read up on the past discussion first,
> and they are probably more fit for python-ideas than for python-dev. Plus
> I'm just writing this email because I'm procrastinating on the type hinting
> PEP. :-)

The previous discussions I was referring to are here:
  http://thread.gmane.org/gmane.comp.python.ideas/29487/focus=29555
  http://thread.gmane.org/gmane.comp.python.ideas/29788

There might well be other options; these are just the best ones I
could think of :-). The constraints are pretty tight, though:
- The "new module" object (whatever it is) should have a __dict__ that
aliases the original module globals(). I can elaborate on this if my
original email wasn't enough, but hopefully it's obvious that making
two copies of the same namespace and then trying to keep them in sync
at the very least smells bad :-).
- The "new module" object has to be a subtype of ModuleType, b/c there
are lots of places that do isinstance(x, ModuleType) checks (notably
-- but not only -- reload()). Since a major goal here is to make it
possible to do cleaner deprecations, it would be really unfortunate if
switching an existing package to use the metamodule support itself
broke things :-).
- Lookups in the normal case should have no additional performance
overhead, because module lookups are extremely extremely common. (So
this rules out dict proxies and tricks like that -- we really need
'new_module.__dict__ is globals()' to be true.)

AFAICT there are three logically possible strategies for satisfying
that first constraint:
(a) convert the original module object into the type we want, in-place
(b) create a new module object that acts like the original module object
(c) somehow arrange for our special type to be used from the start

My options 1 and 2 are means of accomplishing (a), and my options 3
and 4 are means of accomplishing (b) while working around the
behavioural quirks of module objects (as required by the second
constraint).

The python-ideas thread did also consider several methods of
implementing strategy (c), but they're messy enough that I left them
out here. The problem is that somehow we have to execute code to
create the new subtype *before* we have an entry in sys.modules for
the package that contains the code for the subtype. So one option
would be to add a new rule, that if a file pkgname/__new__.py exists,
then this is executed first and is required to set up
sys.modules["pkgname"] before we exec pkgname/__init__.py. So
pkgname/__new__.py might look like:

import sys
from pkgname._metamodule import MyModuleSubtype
sys.modules[__name__] = MyModuleSubtype(__name__, docstring)

This runs into a lot of problems though. To start with, the 'from
pkgname._metamodule ...' line is an infinite loop, b/c this is the
code used to create sys.modules["pkgname"]. It's not clear where the
globals dict for executing __new__.py comes from (who defines
__name__? Currently that's done by ModuleType.__init__). It only works
for packages, not modules. The need to provide the docstring here,
before __init__.py is even read, is weird. It adds extra stat() calls
to every package lookup. And, the biggest showstopper IMHO: AFAICT
it's impossible to write a polyfill to support this code on old python
versions, so it's useless to any package which needs to keep
compatibility with 2.7 (or even 3.4). Sure, you can backport the whole
import system like importlib2, but telling everyone that they need to
replace every 'import numpy' with 'import importlib2; import numpy' is
a total non-starter.

So, yeah, those 4 options are really the only plausible ones I know of.

Option 1 and option 3 are pretty nice at the language level! Most
Python objects allow assignment to __class__ and __dict__, and both
PyPy and Jython at least do support __class__ assignment. Really the
only downside with Option 1 is that actually implementing it requires
attention from someone with deep knowledge of typeobject.c.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-29 Thread Nathaniel Smith
On Sat, Nov 29, 2014 at 11:32 AM, Antoine Pitrou  wrote:
> On Sat, 29 Nov 2014 01:59:06 +
> Nathaniel Smith  wrote:
>>
>> Option 1: Make it possible to change the type of a module object
>> in-place, so that we can write something like
>>
>>sys.modules[__name__].__class__ = MyModuleSubclass
>>
>> Option 1 downside: The invariants required to make __class__
>> assignment safe are complicated, and only implemented for
>> heap-allocated type objects. PyModule_Type is not heap-allocated, so
>> making this work would require lots of delicate surgery to
>> typeobject.c. I'd rather not go down that rabbit-hole.
>
> Option 1b: have __class__ assignment delegate to a tp_classassign slot
> on the old class, so that typeobject.c doesn't have to be cluttered with
> many special cases.

I'm intrigued -- how would this help?

I have a vague impression that one could add another branch to
object_set_class that went something like

if at least one of the types is a subtype of the other type, and the
subtype is a heap type with tp_dealloc == subtype_dealloc, and the
subtype doesn't add any important slots, and ... then the __class__
assignment is legal.

(This is taking advantage of the fact that if you don't have any extra
slots added, then subtype_dealloc just basically defers to the base
type's tp_dealloc, so it doesn't really matter which one you end up
calling.)

And my vague impression is that there isn't really anything special
about the module type that would allow a tp_classassign function to
simplify this logic.

But these are just vague impressions :-)

>> Option 3: Make it legal to assign to the __dict__ attribute of a
>> module object, so that we can write something like
>>
>>new_module = MyModuleSubclass(...)
>>new_module.__dict__ = sys.modules[__name__].__dict__
>>sys.modules[__name__].__dict__ = {} # ***
>>sys.modules[__name__] = new_module
>>
> [...]
>>
>> Option 4: Add a new function sys.swap_module_internals, which takes
>> two module objects and swaps their __dict__ and other attributes. By
>> making the operation a swap instead of an assignment, we avoid the
>> lifecycle pitfalls from Option 3. By making it a builtin, we can make
>> sure it always handles all the module fields that matter, not just
>> __dict__. Usage:
>
> How do these two options interact with the fact that module functions
> store their globals dict, not the module itself?

I think that's totally fine? The whole point of all these proposals is
to make sure that the final module object does in fact have the
correct globals dict.

~$ git clone git@github.com:njsmith/metamodule.git
~$ cd metamodule
~/metamodule$ python3.4
>>> import examplepkg
>>> examplepkg

>>> examplepkg.f.__globals__ is examplepkg.__dict__
True

If anything this is another argument for why we NEED something like this :-).

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Sun, Nov 30, 2014 at 2:54 AM, Guido van Rossum  wrote:
> All the use cases seem to be about adding some kind of getattr hook to
> modules. They all seem to involve modifying the CPython C code anyway. So
> why not tackle that problem head-on and modify module_getattro() to look for
> a global named __getattr__ and if it exists, call that instead of raising
> AttributeError?

You need to allow overriding __dir__ as well for tab-completion, and
some people wanted to use the properties API instead of raw
__getattr__, etc. Maybe someone will want __getattribute__ semantics,
I dunno. So since we're *so close* to being able to just use the
subclassing machinery, it seemed cleaner to try and get that working
instead of reimplementing bits of it piecewise.

That said, __getattr__ + __dir__ would be enough for my immediate use cases.
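
For example, the deprecation use case would look something like this with
a module-level __getattr__ hook (hypothetical, since no such hook exists
yet; the names and values are made up):

# numpy/__init__.py (sketch)
import warnings

_deprecated = {"oldconst": 42}

def __getattr__(name):
    if name in _deprecated:
        warnings.warn("numpy.%s is deprecated" % name, DeprecationWarning)
        return _deprecated[name]
    raise AttributeError(name)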

-n

> On Sat, Nov 29, 2014 at 11:37 AM, Nathaniel Smith  wrote:
>>
>> On Sat, Nov 29, 2014 at 4:21 AM, Guido van Rossum 
>> wrote:
>> > Are these really all our options? All of them sound like hacks, none of
>> > them
>> > sound like anything the language (or even the CPython implementation)
>> > should
>> > sanction. Have I missed the discussion where the use cases and
>> > constraints
>> > were analyzed and all other approaches were rejected? (I might have some
>> > half-baked ideas, but I feel I should read up on the past discussion
>> > first,
>> > and they are probably more fit for python-ideas than for python-dev.
>> > Plus
>> > I'm just writing this email because I'm procrastinating on the type
>> > hinting
>> > PEP. :-)
>>
>> The previous discussions I was referring to are here:
>>   http://thread.gmane.org/gmane.comp.python.ideas/29487/focus=29555
>>   http://thread.gmane.org/gmane.comp.python.ideas/29788
>>
>> There might well be other options; these are just the best ones I
>> could think of :-). The constraints are pretty tight, though:
>> - The "new module" object (whatever it is) should have a __dict__ that
>> aliases the original module globals(). I can elaborate on this if my
>> original email wasn't enough, but hopefully it's obvious that making
>> two copies of the same namespace and then trying to keep them in sync
>> at the very least smells bad :-).
>> - The "new module" object has to be a subtype of ModuleType, b/c there
>> are lots of places that do isinstance(x, ModuleType) checks (notably
>> -- but not only -- reload()). Since a major goal here is to make it
>> possible to do cleaner deprecations, it would be really unfortunate if
>> switching an existing package to use the metamodule support itself
>> broke things :-).
>> - Lookups in the normal case should have no additional performance
>> overhead, because module lookups are extremely extremely common. (So
>> this rules out dict proxies and tricks like that -- we really need
>> 'new_module.__dict__ is globals()' to be true.)
>>
>> AFAICT there are three logically possible strategies for satisfying
>> that first constraint:
>> (a) convert the original module object into the type we want, in-place
>> (b) create a new module object that acts like the original module object
>> (c) somehow arrange for our special type to be used from the start
>>
>> My options 1 and 2 are means of accomplishing (a), and my options 3
>> and 4 are means of accomplishing (b) while working around the
>> behavioural quirks of module objects (as required by the second
>> constraint).
>>
>> The python-ideas thread did also consider several methods of
>> implementing strategy (c), but they're messy enough that I left them
>> out here. The problem is that somehow we have to execute code to
>> create the new subtype *before* we have an entry in sys.modules for
>> the package that contains the code for the subtype. So one option
>> would be to add a new rule, that if a file pkgname/__new__.py exists,
>> then this is executed first and is required to set up
>> sys.modules["pkgname"] before we exec pkgname/__init__.py. So
>> pkgname/__new__.py might look like:
>>
>> import sys
>> from pkgname._metamodule import MyModuleSubtype
>> sys.modules[__name__] = MyModuleSubtype(__name__, docstring)
>>
>> This runs into a lot of problems though. To start with, the 'from
>> pkgname._metamodule ...' line is an infinite loop, b/c this is the
>> code used to create sys.modules["pkgname"]. It's not clear where the
>> globals dict for executing __new__.py comes from (who defines

Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Sun, Nov 30, 2014 at 7:27 PM, Ethan Furman  wrote:
> On 11/30/2014 11:15 AM, Guido van Rossum wrote:
>> On Sun, Nov 30, 2014 at 6:15 AM, Brett Cannon wrote:
>>> On Sat, Nov 29, 2014, 21:55 Guido van Rossum wrote:

 All the use cases seem to be about adding some kind of getattr hook
 to modules. They all seem to involve modifying the CPython C code
 anyway. So why not tackle that problem head-on and modify module_getattro()
 to look for a global named __getattr__ and if it exists, call that instead
 of raising AttributeError?
>>>
>>> Not sure if anyone thought of it. :) Seems like a reasonable solution to me.
>>> Be curious to know what the benchmark suite said the impact was.
>>
>> Why would there be any impact? The __getattr__ hook would be similar to the
>> one on classes -- it's only invoked at the point where otherwise 
>> AttributeError
>> would be raised.
>
> I think the bigger question is how do we support it back on 2.7?

I think that's doable -- assuming I'm remembering correctly the
slightly weird class vs. instance lookup rules for special methods,
you can write a module subclass like

class GetAttrModule(types.ModuleType):
    def __getattr__(self, name):
        return self.__dict__["__getattr__"](name)

and then use ctypes hacks to get it into sys.modules[__name__].

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Sun, Nov 30, 2014 at 10:14 PM, Mark Shannon  wrote:
> Hi,
>
> This discussion has been going on for a while, but no one has questioned the
> basic premise. Does this needs any change to the language or interpreter?
>
> I believe it does not. I'm modified your original metamodule.py to not use
> ctypes and support reloading:
> https://gist.github.com/markshannon/1868e7e6115d70ce6e76

Interesting approach!

As written, your code will blow up on any python < 3.4, because when
old_module gets deallocated it'll wipe the module dict clean. And I
guess even on >=3.4, this might still happen if old_module somehow
manages to get itself into a reference loop before getting
deallocated. (Hopefully not, but what a nightmare to debug if it did.)
However, both of these issues can be fixed by stashing a reference to
old_module somewhere in new_module.

The __class__ = ModuleType trick is super-clever but makes me
irrationally uncomfortable. I know that this is documented as a valid
method of fooling isinstance(), but I didn't know that until
yesterday, and the idea of objects where type(foo) is not
foo.__class__ strikes me as somewhat blasphemous. Maybe this is all
fine though.
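
(For anyone else who didn't know the trick either, here's the documented
isinstance()-fooling behaviour in isolation -- not Mark's actual code:

import types

class FakeModule:
    @property
    def __class__(self):
        return types.ModuleType

f = FakeModule()
print(isinstance(f, types.ModuleType))  # True
print(type(f) is types.ModuleType)      # False -- the "blasphemous" part
)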

The pseudo-module objects generated this way still won't pass
PyModule_Check, so in theory this could produce behavioural
differences. I can't name any specific places where this will break
things, though. From a quick skim of the CPython source, a few
observations: It means the PyModule_* API functions won't work (e.g.
PyModule_GetDict); maybe these aren't used enough to matter. It looks
like the __reduce__ methods on "method objects"
(Objects/methodobject.c) have a special check for ->m_self being a
module object, and won't pickle correctly if ->m_self ends up pointing
to one of these pseudo-modules. I have no idea how one ends up with a
method whose ->m_self points to a module object, though -- maybe it
never actually happens. PyImport_Cleanup treats module objects
differently from non-module objects during shutdown.

I guess it also has the mild limitation that it doesn't work with
extension modules, but eh. Mostly I'd be nervous about the two points
above.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Mon, Dec 1, 2014 at 12:59 AM, Nathaniel Smith  wrote:
> On Sun, Nov 30, 2014 at 10:14 PM, Mark Shannon  wrote:
>> Hi,
>>
>> This discussion has been going on for a while, but no one has questioned the
>> basic premise. Does this need any change to the language or interpreter?
>>
>> I believe it does not. I've modified your original metamodule.py to not use
>> ctypes and support reloading:
>> https://gist.github.com/markshannon/1868e7e6115d70ce6e76
>
> Interesting approach!
>
> As written, your code will blow up on any python < 3.4, because when
> old_module gets deallocated it'll wipe the module dict clean. And I
> guess even on >=3.4, this might still happen if old_module somehow
> manages to get itself into a reference loop before getting
> deallocated. (Hopefully not, but what a nightmare to debug if it did.)
> However, both of these issues can be fixed by stashing a reference to
> old_module somewhere in new_module.
>
> The __class__ = ModuleType trick is super-clever but makes me
> irrationally uncomfortable. I know that this is documented as a valid
> method of fooling isinstance(), but I didn't know that until
> yesterday, and the idea of objects where type(foo) is not
> foo.__class__ strikes me as somewhat blasphemous. Maybe this is all
> fine though.
>
> The pseudo-module objects generated this way still won't pass
> PyModule_Check, so in theory this could produce behavioural
> differences. I can't name any specific places where this will break
> things, though. From a quick skim of the CPython source, a few
> observations: It means the PyModule_* API functions won't work (e.g.
> PyModule_GetDict); maybe these aren't used enough to matter. It looks
> like the __reduce__ methods on "method objects"
> (Objects/methodobject.c) have a special check for ->m_self being a
> module object, and won't pickle correctly if ->m_self ends up pointing
> to one of these pseudo-modules. I have no idea how one ends up with a
> method whose ->m_self points to a module object, though -- maybe it
> never actually happens. PyImport_Cleanup treats module objects
> differently from non-module objects during shutdown.

Actually, there is one showstopper here -- the first version where
reload() uses isinstance() is actually 3.4. Before that you need a
real module subtype for reload to work. But this is in principle
workaroundable by using subclassing + ctypes on old versions of python
and the __class__ = hack on new versions.

> I guess it also has the mild limitation that it doesn't work with
> extension modules, but eh. Mostly I'd be nervous about the two points
> above.
>
> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org



-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Sun, Nov 30, 2014 at 8:54 PM, Guido van Rossum  wrote:
> On Sun, Nov 30, 2014 at 11:29 AM, Nathaniel Smith  wrote:
>>
>> On Sun, Nov 30, 2014 at 2:54 AM, Guido van Rossum 
>> wrote:
>> > All the use cases seem to be about adding some kind of getattr hook to
>> > modules. They all seem to involve modifying the CPython C code anyway.
>> > So
>> > why not tackle that problem head-on and modify module_getattro() to look
>> > for
>> > a global named __getattr__ and if it exists, call that instead of
>> > raising
>> > AttributeError?
>>
>> You need to allow overriding __dir__ as well for tab-completion, and
>> some people wanted to use the properties API instead of raw
>> __getattr__, etc. Maybe someone will want __getattribute__ semantics,
>> I dunno.
>
> Hm... I agree about __dir__ but the other things feel too speculative.
>
>> So since we're *so close* to being able to just use the
>> subclassing machinery, it seemed cleaner to try and get that working
>> instead of reimplementing bits of it piecewise.
>
> That would really be option 1, right? It's the one that looks cleanest from
> the user's POV (or at least from the POV of a developer who wants to build a
> framework using this feature -- for a simple one-off use case, __getattr__
> sounds pretty attractive). I think that if we really want option 1, the
> issue of PyModuleType not being a heap type can be dealt with.

Options 1-4 all have the effect of making it fairly simple to slot an
arbitrary user-defined module subclass into sys.modules. Option 1 is
the cleanest API though :-).

>>
>> That said, __getattr__ + __dir__ would be enough for my immediate use
>> cases.
>
>
>  Perhaps it would be a good exercise to try and write the "lazy submodule
> import"(*) use case three ways: (a) using only CPython 3.4; (b) using
> __class__ assignment; (c) using customizable __getattr__ and __dir__. I
> think we can learn a lot about the alternatives from this exercise. I
> presume there's already a version of (a) floating around, but if it's been
> used in practice at all, it's probably too gnarly to serve as a useful
> comparison (though its essence may be extracted to serve as such).

(b) and (c) are very straightforward and trivial. Probably I could do
a better job of faking dir()'s default behaviour on modules, but
basically:

# __class__ assignment #

import sys, types, importlib

class MyModule(types.ModuleType):
    def __getattr__(self, name):
        if name in _lazy_submodules:
            # importing also implicitly assigns the submodule to
            # self.__dict__[name]
            return importlib.import_module("." + name,
                                           package=self.__package__)
        raise AttributeError(name)

    def __dir__(self):
        entries = set(self.__dict__)
        entries.update(_lazy_submodules)
        return sorted(entries)

sys.modules[__name__].__class__ = MyModule
_lazy_submodules = {"foo", "bar"}

# customizable __getattr__ and __dir__ #

import importlib

def __getattr__(name):
    if name in _lazy_submodules:
        # importing also implicitly assigns the submodule to globals()[name]
        return importlib.import_module("." + name, package=__package__)
    raise AttributeError(name)

def __dir__():
    entries = set(globals())
    entries.update(_lazy_submodules)
    return sorted(entries)

_lazy_submodules = {"foo", "bar"}

> FWIW I believe all proposals here have a big limitation: the module *itself*
> cannot benefit much from all these shenanigans, because references to
> globals from within the module's own code are just dictionary accesses, and
> we don't want to change that.

I think that's fine -- IMHO the main uses cases here are about
controlling the public API. And a module that really wants to can
always import itself if it wants to pull more shenanigans :-) (i.e.,
foo/__init__.py can do "import foo; foo.blahblah" instead of just
"blahblah".)

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Mon, Dec 1, 2014 at 1:27 AM, Guido van Rossum  wrote:
> Nathaniel, did you look at Brett's LazyLoader? It overcomes the subclass
> issue by using a module loader that makes all modules instances of a
> (trivial) Module subclass. I'm sure this approach can be backported as far
> as you need to go.

The problem is that by the time your package's code starts running,
it's too late to install such a loader. Brett's strategy works well
for lazy-loading submodules (e.g., making it so 'import numpy' makes
'numpy.testing' available, but without the speed hit of importing it
immediately), but it doesn't help if you want to actually hook
attribute access on your top-level package (e.g., making 'numpy.foo'
trigger a DeprecationWarning -- we have a lot of stupid exported
constants that we can never get rid of because our rules say that we
have to deprecate things before removing them).

Or maybe you're suggesting that we define a trivial heap-allocated
subclass of PyModule_Type and use that everywhere, as a
quick-and-dirty way to enable __class__ assignment? (E.g., return it
from PyModule_New?) I considered this before but hesitated b/c it
could potentially break backwards compatibility -- e.g. if code A
creates a PyModule_Type object directly without going through
PyModule_New, and then code B checks whether the resulting object is a
module by doing isinstance(x, type(sys)), this will break. (type(sys)
is a pretty common way to get a handle to ModuleType -- in fact both
types.py and importlib use it.) So in my mind I sorta lumped it in
with my Option 2, "minor compatibility break". OTOH maybe anyone who
creates a module object without going through PyModule_New deserves
whatever they get.
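
(The pattern I'm worried about breaking is literally just this -- it's how
types.py itself gets hold of ModuleType:

import sys

ModuleType = type(sys)            # "code B" from the example above

def looks_like_module(x):
    return isinstance(x, ModuleType)
)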

-n

> On Sun, Nov 30, 2014 at 5:02 PM, Nathaniel Smith  wrote:
>>
>> On Mon, Dec 1, 2014 at 12:59 AM, Nathaniel Smith  wrote:
>> > On Sun, Nov 30, 2014 at 10:14 PM, Mark Shannon  wrote:
>> >> Hi,
>> >>
>> >> This discussion has been going on for a while, but no one has
>> >> questioned the
>> >> basic premise. Does this needs any change to the language or
>> >> interpreter?
>> >>
>> >> I believe it does not. I've modified your original metamodule.py to not
>> >> use
>> >> ctypes and support reloading:
>> >> https://gist.github.com/markshannon/1868e7e6115d70ce6e76
>> >
>> > Interesting approach!
>> >
>> > As written, your code will blow up on any python < 3.4, because when
>> > old_module gets deallocated it'll wipe the module dict clean. And I
>> > guess even on >=3.4, this might still happen if old_module somehow
>> > manages to get itself into a reference loop before getting
>> > deallocated. (Hopefully not, but what a nightmare to debug if it did.)
>> > However, both of these issues can be fixed by stashing a reference to
>> > old_module somewhere in new_module.
>> >
>> > The __class__ = ModuleType trick is super-clever but makes me
>> > irrationally uncomfortable. I know that this is documented as a valid
>> > method of fooling isinstance(), but I didn't know that until
>> > yesterday, and the idea of objects where type(foo) is not
>> > foo.__class__ strikes me as somewhat blasphemous. Maybe this is all
>> > fine though.
>> >
>> > The pseudo-module objects generated this way still won't pass
>> > PyModule_Check, so in theory this could produce behavioural
>> > differences. I can't name any specific places where this will break
>> > things, though. From a quick skim of the CPython source, a few
>> > observations: It means the PyModule_* API functions won't work (e.g.
>> > PyModule_GetDict); maybe these aren't used enough to matter. It looks
>> > like the __reduce__ methods on "method objects"
>> > (Objects/methodobject.c) have a special check for ->m_self being a
>> > module object, and won't pickle correctly if ->m_self ends up pointing
>> > to one of these pseudo-modules. I have no idea how one ends up with a
>> > method whose ->m_self points to a module object, though -- maybe it
>> > never actually happens. PyImport_Cleanup treats module objects
>> > differently from non-module objects during shutdown.
>>
>> Actually, there is one showstopper here -- the first version where
>> reload() uses isinstance() is actually 3.4. Before that you need a
>> real module subtype for reload to work. But this is in principle
>> workaroundable by using subclassing + ctypes on old versions of python
>> and 

Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-12-01 Thread Nathaniel Smith
On Mon, Dec 1, 2014 at 4:06 AM, Guido van Rossum  wrote:
> On Sun, Nov 30, 2014 at 5:42 PM, Nathaniel Smith  wrote:
>>
>> On Mon, Dec 1, 2014 at 1:27 AM, Guido van Rossum  wrote:
>> > Nathaniel, did you look at Brett's LazyLoader? It overcomes the subclass
>> > issue by using a module loader that makes all modules instances of a
>> > (trivial) Module subclass. I'm sure this approach can be backported as
>> > far
>> > as you need to go.
>>
>> The problem is that by the time your package's code starts running,
>> it's too late to install such a loader. Brett's strategy works well
>> for lazy-loading submodules (e.g., making it so 'import numpy' makes
>> 'numpy.testing' available, but without the speed hit of importing it
>> immediately), but it doesn't help if you want to actually hook
>> attribute access on your top-level package (e.g., making 'numpy.foo'
>> trigger a DeprecationWarning -- we have a lot of stupid exported
>> constants that we can never get rid of because our rules say that we
>> have to deprecate things before removing them).
>>
>> Or maybe you're suggesting that we define a trivial heap-allocated
>> subclass of PyModule_Type and use that everywhere, as a
>> quick-and-dirty way to enable __class__ assignment? (E.g., return it
>> from PyModule_New?) I considered this before but hesitated b/c it
>> could potentially break backwards compatibility -- e.g. if code A
>> creates a PyModule_Type object directly without going through
>> PyModule_New, and then code B checks whether the resulting object is a
>> module by doing isinstance(x, type(sys)), this will break. (type(sys)
>> is a pretty common way to get a handle to ModuleType -- in fact both
>> types.py and importlib use it.) So in my mind I sorta lumped it in
>> with my Option 2, "minor compatibility break". OTOH maybe anyone who
>> creates a module object without going through PyModule_New deserves
>> whatever they get.
>
>
> Couldn't you install a package loader using some install-time hook?
>
> Anyway, I still think that the issues with heap types can be overcome. Hm,
> didn't you bring that up before here? Was the conclusion that it's
> impossible?

I've brought it up several times but no-one's really discussed it :-).
I finally attempted a deep dive into typeobject.c today myself. I'm
not at all sure I understand the intricacies correctly here, but I
*think* __class__ assignment could be relatively easily extended to
handle non-heap types, and in fact the current restriction to heap
types is actually buggy (IIUC).

object_set_class is responsible for checking whether it's okay to take
an object of class "oldto" and convert it to an object of class
"newto". Basically its goal is just to avoid crashing the interpreter
(as would quickly happen if you e.g. allowed "[].__class__ = dict").
Currently the rules (spread across object_set_class and
compatible_for_assignment) are:

(1) both oldto and newto have to be heap types
(2) they have to have the same tp_dealloc
(3) they have to have the same tp_free
(4) if you walk up the ->tp_base chain for both types until you find
the most-ancestral type that has a compatible struct layout (as
checked by equiv_structs), then either
   (4a) these ancestral types have to be the same, OR
   (4b) these ancestral types have to have the same tp_base, AND they
have to have added the same slots on top of that tp_base (e.g. if you
have class A(object): pass and class B(object): pass then they'll both
have added a __dict__ slot at the same point in the instance struct,
so that's fine; this is checked in same_slots_added).

The only place the code assumes that it is dealing with heap types is
in (4b) -- same_slots_added unconditionally casts the ancestral types
to (PyHeapTypeObject*). AFAICT that's why step (1) is there, to
protect this code. But I don't think the check actually works -- step
(1) checks that the types we're trying to assign are heap types, but
this is no guarantee that the *ancestral* types will be heap types.
[Also, the code for __bases__ assignment appears to also call into
this code with no heap type checks at all.] E.g., I think if you do

class MyList(list):
__slots__ = ()

class MyDict(dict):
__slots__ = ()

MyList().__class__ = MyDict

then you'll end up in same_slots_added casting PyDict_Type and
PyList_Type to PyHeapTypeObjects and then following invalid pointers
into la-la land. (The __slots__ = () is to maintain layout
compatibility with the base types; if you find builtin types that
already have __dict__ and weaklist and HAVE_GC then this example
should still work even with perfectly empty subclasses.)

Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-12-02 Thread Nathaniel Smith
On Tue, Dec 2, 2014 at 9:19 AM, Antoine Pitrou  wrote:
> On Mon, 1 Dec 2014 21:38:45 +
> Nathaniel Smith  wrote:
>>
>> object_set_class is responsible for checking whether it's okay to take
>> an object of class "oldto" and convert it to an object of class
>> "newto". Basically its goal is just to avoid crashing the interpreter
>> (as would quickly happen if you e.g. allowed "[].__class__ = dict").
>> Currently the rules (spread across object_set_class and
>> compatible_for_assignment) are:
>>
>> (1) both oldto and newto have to be heap types
>> (2) they have to have the same tp_dealloc
>> (3) they have to have the same tp_free
>> (4) if you walk up the ->tp_base chain for both types until you find
>> the most-ancestral type that has a compatible struct layout (as
>> checked by equiv_structs), then either
>>(4a) these ancestral types have to be the same, OR
>>(4b) these ancestral types have to have the same tp_base, AND they
>> have to have added the same slots on top of that tp_base (e.g. if you
>> have class A(object): pass and class B(object): pass then they'll both
>> have added a __dict__ slot at the same point in the instance struct,
>> so that's fine; this is checked in same_slots_added).
>>
>> The only place the code assumes that it is dealing with heap types is
>> in (4b)
>
> I'm not sure. Many operations are standardized on heap types that can
> have arbitrary definitions on static types (I'm talking about the tp_
> methods). You'd have to review them to double check.

Reading through the list of tp_ methods I can't see any others that
look problematic. The finalizers are kinda intimate, but I think
people would expect that if you swap an instance's type to something
that has a different __del__ method then it's the new __del__ method
that'll be called. If we wanted to be really careful we should perhaps
do something cleverer with tp_is_gc, but so long as type objects are
the only objects that have a non-trivial tp_is_gc, and the tp_is_gc
call depends only on their tp_flags (which are unmodified by __class__
assignment), then we should still be safe (and anyway this is
orthogonal to the current issues).

> For example, a heap type's tp_new increments the type's refcount, so
> you have to adjust the instance refcount if you cast it from a non-heap
> type to a heap type, and vice-versa (see slot_tp_new()).

Right, fortunately this is easy :-).

> (this raises the interesting question "what happens if you assign to
> __class__ from a __del__ method?")

subtype_dealloc actually attempts to take this possibility into
account -- see the comment "Extract the type again; tp_del may have
changed it". I'm not at all sure that its handling is *correct* --
there's a bunch of code that references 'type' between the call to
tp_del and this comment, and there's code after the comment that
references 'base' without recalculating it. But it is there :-)

>> -- same_slots_added unconditionally casts the ancestral types
>> to (PyHeapTypeObject*). AFAICT that's why step (1) is there, to
>> protect this code. But I don't think the check actually works -- step
>> (1) checks that the types we're trying to assign are heap types, but
>> this is no guarantee that the *ancestral* types will be heap types.
>> [Also, the code for __bases__ assignment appears to also call into
>> this code with no heap type checks at all.] E.g., I think if you do
>>
>> class MyList(list):
>> __slots__ = ()
>>
>> class MyDict(dict):
>> __slots__ = ()
>>
>> MyList().__class__ = MyDict
>>
>> then you'll end up in same_slots_added casting PyDict_Type and
>> PyList_Type to PyHeapTypeObjects and then following invalid pointers
>> into la-la land. (The __slots__ = () is to maintain layout
>> compatibility with the base types; if you find builtin types that
>> already have __dict__ and weaklist and HAVE_GC then this example
>> should still work even with perfectly empty subclasses.)
>>
>> Okay, so suppose we move the heap type check (step 1) down into
>> same_slots_added (step 4b), since AFAICT this is actually more correct
>> anyway. This is almost enough to enable __class__ assignment on
>> modules, because the cases we care about will go through the (4a)
>> branch rather than (4b), so the heap type thing is irrelevant.
>>
>> The remaining problem is the requirement that both types have the same
>> tp_dealloc (step 2). ModuleType itself has tp_dealloc ==
>> module_dealloc, while all(?) 

Re: [Python-Dev] Python 2.x and 3.x use survey, 2014 edition

2014-12-10 Thread Nathaniel Smith
On 10 Dec 2014 17:16, "Ian Cordasco"  wrote:
>
> On Wed, Dec 10, 2014 at 11:10 AM, Donald Stufft  wrote:
> >
> > On Dec 10, 2014, at 11:59 AM, Bruno Cauet  wrote:
> >
> > Hi all,
> > Last year a survey was conducted on python 2 and 3 usage.
> > Here is the 2014 edition, slightly updated (from 9 to 11 questions).
> > It should not take you more than 1 minute to fill. I would be pleased
> > if you took that time.
> >
> > Here's the url: http://goo.gl/forms/tDTcm8UzB3
> > I'll publish the results around the end of the year.
> >
> > Last year results: https://wiki.python.org/moin/2.x-vs-3.x-survey
> >
> >
> > Just going to say http://d.stufft.io/image/0z1841112o0C is a hard
> > question to answer, since most code I write is both.
> >
>
> The same holds for me.

That question appears to have just grown a "compatible with both" option.

It might make sense to add a similar option to the following question about
what you use for personal projects.

-n


Re: [Python-Dev] PEP 471 (scandir): Poll to choose the implementation (full C or C+Python)

2015-02-13 Thread Nathaniel Smith
On 13 Feb 2015 02:09, "Victor Stinner"  wrote:
>
> A alternative is to add a new _scandir.c module to host the new C
> code, and share some code with posixmodule.c: remove "static" keyword
> from required C functions (functions to convert Windows attributes to
> a os.stat_result object).

Hopefully not too annoying question from an outsider: has cpython's build
system added the necessary bits to do this in a safe, portable,
non-symbol-namespace-polluting way? E.g. using -fvisibility=hidden on Linux?

(I'm partially wondering because until very recently numpy was built by
concatenating all the different c files together and compiling that,
because that was the only portable way to let different files share access
to symbols without also exporting those symbols publicly from the resulting
module shared objects. And numpy supports a lot fewer platforms than
cpython proper...)

-n


Re: [Python-Dev] boxing and unboxing data types

2015-03-08 Thread Nathaniel Smith
On Mar 8, 2015 9:13 PM, "Steven D'Aprano"  wrote:
>
> There's no built-in way of calling __index__ that I know of (no
> equivalent to int(obj)),

There's operator.index(obj), at least.
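
For example (a quick sketch; the Two class is just made up to show both
entry points):

import operator

class Two:
    def __index__(self):
        return 2

print(operator.index(Two()))   # 2 -- explicit __index__ call
print([10, 20, 30][Two():])    # [30] -- slicing calls __index__ too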

> but slicing at the very least will call it,
> e.g. seq[a:] will call type(a).__index__.

-n


Re: [Python-Dev] Use python -s as default shebang for system python executables/daemons

2015-03-23 Thread Nathaniel Smith
On Mar 23, 2015 8:15 AM, "Antoine Pitrou"  wrote:
>
> On Mon, 23 Mar 2015 08:06:13 -0700
> Toshio Kuratomi  wrote:
> > >
> > > I really think Donald has a good point when he suggests a specific
> > > virtualenv for system programs using Python.
> > >
> > The isolation is what we're seeking but I think the amount of work
> > required and the added complexity for the distributions will make that
> > hard to get distributions to sign up for.
> >
> > If someone had the time to write a front end to install packages into
> > a single "system-wide isolation unit" whose backend was a virtualenv we
> > might be able to get distributions on-board with using that.
>
> I don't think we're asking distributions anything. We're suggesting a
> possible path, but it's not python-dev's job to dictate distributions
> how they should package Python.
>
> The virtualenv solution has the virtue that any improvement we might
> put in it to help system packagers would automatically benefit everyone.
> A specific "system Python" would not.
>
> > The front end would need to install software so that you can still
> > invoke /usr/bin/system-application and "system-application" would take
> > care of activating the virtualenv.  It would need to be about as simple
> > to build as the present python2 setup.py build/install with the
> > flexibility in options that the distros need to install into FHS
> > approved paths.  Some things like man pages, locale files, config
> > files, and possibly other data files might need to be installed outside
> > of the virtualenv directory.
>
> Well, I don't understand what difference a virtualenv would make.
> Using a virtualenv amounts to invoking a different interpreter path.
> The rest of the filesystem (man pages locations, etc.) is still
> accessible in the same way. But I may miss something :-)

The main issue that jumps to my mind is that 'yum/apt-get install
some-python-package' should install it into both the base python
interpreter and the system virtualenv, but that 'sudo pip install
some-python-package' should install into only the base interpreter but not
the system virtualenv. (Even if those two commands are run in sequence with
different versions of some-python-package.) This seems potentially complex.

-n


Re: [Python-Dev] [python-committers] Do we need to sign Windows files with GnuPG?

2015-04-03 Thread Nathaniel Smith
On Apr 3, 2015 5:50 PM, "Donald Stufft"  wrote:
>
>
> > On Apr 3, 2015, at 6:38 PM, M.-A. Lemburg  wrote:
> >
> > On 04.04.2015 00:14, Steve Dower wrote:
> >> The thing is, that's exactly the same goodness as Authenticode gives,
> >> except everyone gets that for free and meanwhile you're the only one
> >> who has admitted to using GPG on Windows :)
> >>
> >> Basically, what I want to hear is that GPG sigs provide significantly
> >> better protection than hashes (and I can provide better than MD5 for
> >> all files if it's useful), taking into consideration that (I assume)
> >> I'd have to obtain a signing key for GPG and unless there's a CA
> >> involved like there is for Authenticode, there's no existing trust in
> >> that key.
> >
> > Hashes only provide checks against file corruption (and then
> > only if you can trust the hash values). GPG provides all the
> > benefits of public key encryption on arbitrary files (not just
> > code).
> >
> > The main benefit in case of downloadable installers is to
> > be able to make sure that the files are authentic, meaning that
> > they were created and signed by the people listed as packagers.
> >
> > There is no CA infrastructure involved as for SSL certificates
> > or Authenticode, but it's easy to get the keys from key servers
> > given the key signatures available from python.org's download
> > pages.
>
> FTR if we’re relying on people to get the GPG keys from the download
> pages then there’s no additional benefit over just using a hash
> published on the same page.
>
> In order to get additional benefit we’d need to get Steve’s key
> signed by enough people to get him into the strong set.

I don't think that's true -- e.g. people who download the key for checking
3.5.0 will still have it when 3.5.1 is released, and notice if something
silently changes. In general distributing a key id widely on webpages /
mailing lists / using it consistently over multiple releases all increase
security, even if they fall short of perfect. Even the web of trust isn't
particularly trustworthy, it's just useful because it's harder to attack
two targets (the webserver and the WoT) than it is to attack one.

In any case, getting his key into the strong set ought to be trivial given
that pycon is next week.

-n


Re: [Python-Dev] [python-committers] Do we need to sign Windows files with GnuPG?

2015-04-04 Thread Nathaniel Smith
On Sat, Apr 4, 2015 at 6:07 PM, Steve Dower  wrote:
> There's no problem, per se, but initially it was less trouble to use the
> trusted PSF certificate and native support than to add an extra step using a
> program I don't already use and trust, am restricted in use by my employer
> (because of the license and the fact there are alternatives), and developing
> the trust in a brand new certificate.
>
> Eventually the people saying "do it" will win through sheer persistence,
> since I'll get sick of trying to get a more detailed response and just
> concede. Not sure if that's how we want to be running the project though...

I don't get the impression that there's any particularly detailed
rationale that people aren't giving you; it's just that to the average
python-dev denizen, gpg-signing seems to provide some mild benefits
with no downside. The certificate trust issue isn't a downside,
just a mild dilution of the upside. And I suspect python-dev generally
doesn't put much weight on the extra effort required (release managers
have all been using gpg for decades, it's pretty trivial), or see any
reason why Microsoft's internal GPL-hate should have any effect on the
PSF's behaviour. Though it's kinda inconvenient for you, obviously. (I
guess you could call Larry or someone, read them a hash over the
phone, and then have them create the actual gpg signatures.)

-n


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-29 Thread Nathaniel Smith
On Apr 29, 2015 11:49 AM, "Yury Selivanov"  wrote:
>
> Hi Ethan,
>
>
> On 2015-04-29 2:32 PM, Ethan Furman wrote:
>>
>> On 04/29, Yury Selivanov wrote:
>>>
>>> On 2015-04-29 1:25 PM, Ethan Furman wrote:
>>>>
>>>> cannot also just work and be the same as the parenthesized
>>>> version.
>>>
>>> Because it does not make any sense.
>>
>> I obviously don't understand your position that "it does not make
>> any sense" -- perhaps you could explain a bit?
>>
>> What I see is a suspension point that is waiting for the results of
>> coro(), which will be negated (and returned/assigned/whatever).
>> What part of that doesn't make sense?
>>
>
> Because you want operators to be resolved in the
> order you see them, generally.
>
> You want '(await -fut)' to:
>
> 1. Suspend on fut;
> 2. Get the result;
> 3. Negate it.
>
> This is a non-obvious thing. I would myself interpret it
> as:
>
> 1. Get fut.__neg__();
> 2. await on it.
>
> So I want to make this syntactically incorrect:

As a bystander, I don't really care either way about whether await -fut is
syntactically valid (since like you say it's semantically nonsense
regardless and no one will ever write it). But I would rather like to
actually know what the syntax actually is, not just have a list of examples
(which kinda gives me perl flashbacks). Is there any simple way to state
what the rules for parsing await are? Or do I just have to read the parser
code if I want to know that?

(I suspect this may also be the impetus behind Greg's request that it just
be treated the same as unary minus. IMHO it matters much more that the
rules be predictable and teachable than that they allow or disallow every
weird edge case in exactly the right way.)

-n


Re: [Python-Dev] PEP 492: What is the real goal?

2015-04-29 Thread Nathaniel Smith
On Wed, Apr 29, 2015 at 1:14 PM, Skip Montanaro
 wrote:
>
> On Wed, Apr 29, 2015 at 2:42 PM, Yury Selivanov 
> wrote:
>>
>> Anyways, I'd be OK to start using a new term, if "coroutine" is
>> confusing.
>
>
> According to Wikipedia, term "coroutine" was first coined in 1958, so
> several generations of computer science graduates will be familiar with the
> textbook definition. If your use of "coroutine" matches the textbook
> definition of the term, I think you should continue to use it instead of
> inventing new names which will just confuse people new to Python.

IIUC the problem is that Python has or will have a number of different
things that count as coroutines by that classic CS definition,
including generators, "async def" functions, and in general any object
that implements the same set of methods as one or both of these
objects, or possibly inherits from a certain abstract base class. It
would be useful to have some terms to refer specifically to async def
functions and the await protocol as opposed to generators and the
iterator protocol, and "coroutine" does not make this distinction.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-29 Thread Nathaniel Smith
On Wed, Apr 29, 2015 at 3:46 PM, Greg Ewing  wrote:
> Yury Selivanov wrote:
>>
>> I'm not sure
>> why Greg is pushing his Grammar idea so aggressively.
>
>
> Because I believe that any extra complexity in the grammar
> needs a very strong justification. It's complexity in the
> core language, like a new keyword, so it puts a burden on
> everyone's brain.
>
> Saying "I don't think anyone would ever need to write this,
> therefore we should disallow it" is not enough, given that
> there is a substantial cost to disallowing it.
>
> If you don't think there's a cost, consider that we *both*
> seem to be having trouble predicting the consequences of
> your proposed syntax, and you're the one who invented it.
> That's not a good sign!

FWIW, now that I've seen the precedence table in the updated PEP, it
seems really natural to me:
   https://www.python.org/dev/peps/pep-0492/#updated-operator-precedence-table
According to that, "await" is just a prefix operator that binds more
tightly than any arithmetic operation, but less tightly than
indexing/funcall/attribute lookup, which seems about right.

However, if what I just wrote were true, then that would mean that
"await -foo" and "await await foo" would be syntactically legal
(though useless). The fact that they apparently are *not* legal means
that in fact there is still some weird thing going on in the syntax
that I don't understand. And the PEP gives no further details, it just
suggests I go read the parser generator source.

My preference would be that the PEP be updated so that my one-sentence
summary above became correct. But like Guido, I don't necessarily care
about the exact details all that much. What I do feel strongly about
is that whatever syntax we end up with, there should be *some*
accurate human-readable description of *what it is*. AFAICT the PEP
currently doesn't have that.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-29 Thread Nathaniel Smith
On Wed, Apr 29, 2015 at 4:48 PM, Yury Selivanov  wrote:
> Nathaniel,
>
> On 2015-04-29 7:35 PM, Nathaniel Smith wrote:
>>
>> What I do feel strongly about
>> is that whatever syntax we end up with, there should be *some*
>> accurate human-readable description of *what it is*. AFAICT the PEP
>> currently doesn't have that.
>
> How to define human-readable description of how unary
> minus operator works?

Hah, good question :-). Of course we all learned how to parse
arithmetic in school, so perhaps it's a bit cheating to refer to that
knowledge. Except of course basically all our users *do* have that
knowledge (or else are forced to figure it out anyway). So I would be
happy with a description of "await" that just says "it's like unary
minus but higher precedence".

Even if we put aside our trained intuitions about arithmetic, I think
it's correct to say that the way unary minus is parsed is: everything
to the right of it that has a tighter precedence gets collected up and
parsed as an expression, and then it takes that expression as its
argument. Still pretty simple.

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-29 Thread Nathaniel Smith
On Wed, Apr 29, 2015 at 5:05 PM, Yury Selivanov  wrote:
> Nathaniel,
>
> On 2015-04-29 7:58 PM, Nathaniel Smith wrote:
>>
>> On Wed, Apr 29, 2015 at 4:48 PM, Yury Selivanov 
>> wrote:
>>>
>>> Nathaniel,
>>>
>>> On 2015-04-29 7:35 PM, Nathaniel Smith wrote:
>>>>
>>>> What I do feel strongly about
>>>> is that whatever syntax we end up with, there should be *some*
>>>> accurate human-readable description of *what it is*. AFAICT the PEP
>>>> currently doesn't have that.
>>>
>>> How to define human-readable description of how unary
>>> minus operator works?
>>
>> Hah, good question :-). Of course we all learned how to parse
>> arithmetic in school, so perhaps it's a bit cheating to refer to that
>> knowledge. Except of course basically all our users *do* have that
>> knowledge (or else are forced to figure it out anyway). So I would be
>> happy with a description of "await" that just says "it's like unary
>> minus but higher precedence".
>>
>> Even if we put aside our trained intuitions about arithmetic, I think
>> it's correct to say that the way unary minus is parsed is: everything
>> to the right of it that has a tighter precedence gets collected up and
>> parsed as an expression, and then it takes that expression as its
>> argument. Still pretty simple.
>
>
> Well, await is defined exactly like that ;)

So you're saying that "await -fut" and "await await fut" are actually
legal syntax after all, contra what the PEP says? Because "- -fut" is
totally legal syntax, so if await and unary minus work the same...
(Again I don't care about those examples in their own right, I just
find it frustrating that I can't answer these questions without asking
you each time.)

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-30 Thread Nathaniel Smith
On Apr 30, 2015 1:57 AM, "Greg Ewing"  wrote:
>
> Nathaniel Smith wrote:
>>
>> Even if we put aside our trained intuitions about arithmetic, I think
>> it's correct to say that the way unary minus is parsed is: everything
>> to the right of it that has a tighter precedence gets collected up and
>> parsed as an expression, and then it takes that expression as its
>> argument.
>
>
> Tighter or equal, actually: '--a' is allowed.
>
> This explains why Yury's syntax disallows 'await -f'.
> The 'await' operator requires something after it, but
> there's *nothing* between it and the following '-',
> which binds less tightly.
>
> So it's understandable, but you have to think a bit
> harder.
>
> Why do we have to think harder? I suspect it's because
> the notion of precedence is normally introduced to resolve
> ambiguities. Knowing that infix '*' has higher precedence
> than infix '+' tells us that 'a + b * c' is parsed as
> 'a + (b * c)' and not '(a + b) * c'.
>
> Similarly, knowing that infix '.' has higher precedence
> than prefix '-' tells us that '-a.b' is parsed as
> '-(a.b)' rather than '(-a).b'.
>
> However, giving prefix 'await' higher precedence than
> prefix '-' doesn't serve to resolve any ambiguity.
> '- await f' is parsed as '-(await f)' either way, and
> 'await f + g' is parsed as '(await f) + g' either way.
>
> So when we see 'await -f', we think we already know
> what it means. There is only one possible order for
> the operations, so it doesn't look as though precedence
> comes into it at all, and we don't consider it when
> judging whether it's a valid expression.

The other reason this threw me is that I've recently been spending time
with a shunting yard parser, and in shunting yard parsers unary prefix
operators just work in the expected way (their precedence only affects
their interaction with later binary operators; a chain of unaries is always
allowed). It's just a limitation of the parser generator tech that python
uses that it can't handle unary operators in the natural fashion. (OTOH it
can handle lots of cases that shunting yard parsers can't -- I'm not
criticizing python's choice of parser.) Once I read the new "documentation
grammar" this became much clearer.

> What's the conclusion from all this? I think it's
> that using precedence purely to disallow certain
> constructs, rather than to resolve ambiguities, leads
> to a grammar with less-than-intuitive characteristics.

The actual effect of making "await" a different precedence is to resolve
the ambiguity in
  await x ** 2

If await acted like -, then this would be
  await (x ** 2)
But with the proposed grammar, it's instead
  (await x) ** 2
Which is probably correct, and produces the IMHO rather nice invariant that
"await" binds more tightly than arithmetic in general (instead of having to
say that it binds more tightly than arithmetic *except* in this one corner
case...).

But then given the limitations of Python's parser plus the desire to
disambiguate the expression above in the given way, it becomes an arguably
regrettable, yet inevitable, consequence that
  await -fut
  await +fut
  await ~fut
become parse errors.

AFAICT these and the ** case are the only expressions where there's any
difference between Yury's proposed grammar and your proposal of treating
await like unary minus.

-n


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-30 Thread Nathaniel Smith
On Apr 30, 2015 8:40 PM, "Guido van Rossum"  wrote:
>
> On Thu, Apr 30, 2015 at 8:30 PM, Nathaniel Smith  wrote:
>>
>> The actual effect of making "await" a different precedence is to
>> resolve the ambiguity in
>>
>>   await x ** 2
>>
>> If await acted like -, then this would be
>>   await (x ** 2)
>> But with the proposed grammar, it's instead
>>   (await x) ** 2
>> Which is probably correct, and produces the IMHO rather nice invariant
>> that "await" binds more tightly than arithmetic in general (instead of
>> having to say that it binds more tightly than arithmetic *except* in
>> this one corner case...)
>
> Correct.
>>
>> AFAICT these and the ** case are the only expressions where there's any
>> difference between Yury's proposed grammar and your proposal of treating
>> await like unary minus. But then given the limitations of Python's
>> parser plus the desire to disambiguate the expression above in the given
>> way, it becomes an arguably regrettable, yet inevitable, consequence
>> that
>>
>>   await -fut
>>   await +fut
>>   await ~fut
>> become parse errors.
>
>  Why is that regrettable? Do you have a plan for overloading one of
> those on Futures? I personally consider it a feature that you can't do
> that. :-)

I didn't say it was regrettable, I said it was arguably regrettable. For
proof, see the last week of python-dev ;-).

(I guess all else being equal it would be nice if unary operators could
stack arbitrarily, since that really is the more natural parse rule IMO and
also if things had worked that way then I would have spent this thread less
confused. But this is a pure argument from elegance. In practice there's
obviously no good reason to write "await -fut" or "-not x", so meh,
whatever.)

-n


Re: [Python-Dev] PEP 492: async/await in Python; version 5

2015-05-05 Thread Nathaniel Smith
On May 5, 2015 12:40 PM, "Jim J. Jewett"  wrote:
>
>
> On Tue May 5 18:29:44 CEST 2015, Yury Selivanov posted an updated PEP492.
>
> Where are the following over-simplifications wrong?
>
[...snip...]
>
> [Note that the actual PEP uses iteration over the results of a new
> __await__ magic method, rather than .result on the object itself.
> I couldn't tell whether this was for explicit marking, or just for
> efficiency in avoiding future creation.]
>
> (4)  "await EXPR" is just syntactic sugar for EXPR.result
>
> except that, by being syntax, it better marks locations where
> unrelated tasks might have a chance to change shared data.
>
> [And that, as currently planned, the result of an await isn't
> actually the result; it is an iterator of results.]

This is where you're missing a key idea. (And I agree that more high-level
docs are very much needed!) Remember that this is just regular single
threaded python code, so just writing EXPR.result cannot possibly cause the
current task to pause and another one to start running, and then magically
switch back somehow when the result does become available. Imagine trying
to implement a .result attribute that does that -- it's impossible.

Writing 'x = await socket1.read(1)' is actually equivalent to writing a
little loop like:

while True:
    # figure out what we need to happen to make progress
    needed = "data from socket 1"
    # suspend this function,
    # and send the main loop a message telling it what we need
    reply = (yield needed)
    # okay, the main loop woke us up again
    # let's see if they've sent us back what we asked for
    if reply.type == "data from socket 1":
        # got it!
        x = reply.payload
        break
    else:
        # if at first you don't succeed...
        continue

(Now stare at the formal definition of 'yield from' until you see how it
maps onto the above... And if you're wondering why we need a loop, think
about the case where instead of calling socket.read we're calling http.get
or something that requires multiple steps to complete.)

So there actually is semantically no iterator here -- the thing that looks
like an iterator is actually the chatter back and forth between the
lower-level code and the main loop that is orchestrating everything. Then
when that's done, it returns the single result.

-n


Re: [Python-Dev] PEP 492: async/await in Python; version 4

2015-05-05 Thread Nathaniel Smith
On May 5, 2015 2:14 PM, "Guido van Rossum"  wrote:
>
> In the PEP 492 world, these concepts map as follows:
>
> - Future translates to "something with an __await__ method" (and asyncio
> Futures are trivially made compliant by defining Future.__await__ as an
> alias for Future.__iter__);
>
> - "asyncio coroutine" maps to "PEP 492 coroutine object" (either defined
> with `async def` or a generator decorated with @types.coroutine -- note
> that @asyncio.coroutine incorporates the latter);
>
> - "either of the above" maps to "awaitable".

Err, aren't the first and third definitions above identical?

Surely we want to say: an async def function is a convenient shorthand for
creating a custom awaitable (exactly like how generators are a convenient
shorthand for creating custom iterators), and a Future is-an awaitable that
also adds some extra methods.
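
For what "custom awaitable" means concretely, a minimal hand-rolled sketch
(driven by hand here instead of by an event loop):

class Ready:
    # __await__ just has to return an iterator; an empty one means
    # "no need to suspend, the result is None"
    def __await__(self):
        return iter(())

async def demo():
    await Ready()          # awaiting the custom object
    return "done"

try:
    demo().send(None)      # drive the coroutine by hand
except StopIteration as exc:
    print(exc.value)       # "done"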

-n


[Python-Dev] Python-versus-CPython question for __mul__ dispatch

2015-05-14 Thread Nathaniel Smith
Hi all,

While attempting to clean up some of the more squamous aspects of
numpy's operator dispatch code [1][2], I've encountered a situation
where the semantics we want and are using are possible according to
CPython-the-interpreter, but AFAICT ought not to be possible according
to Python-the-language, i.e., it's not clear to me whether it's
possible even in principle to implement an object that works the way
numpy.ndarray does in any other interpreter. Which makes me a bit
nervous, so I wanted to check if there was any ruling on this.

Specifically, the quirk we are relying on is this: in CPython, if you do

  [1, 2] * my_object

then my_object's __rmul__ gets called *before* list.__mul__,
*regardless* of the inheritance relationship between list and
type(my_object). This occurs as a side-effect of the weirdness
involved in having both tp_as_number->nb_multiply and
tp_as_sequence->sq_repeat in the C API -- when evaluating "a * b",
CPython tries a's nb_multiply, then b's nb_multiply, then a's
sq_repeat, then b's sq_repeat. Since list has an sq_repeat but not an
nb_multiply, this means that my_object's nb_multiply gets called
before any list method.

Here's an example demonstrating how weird this is. list.__mul__ wants
an integer, and by "integer" it means "any object with an __index__
method". So here's a class that list is happy to be multiplied by --
according to the ordinary rules for operator dispatch, in the example
below Indexable.__mul__ and __rmul__ shouldn't even get a look-in:

In [3]: class Indexable(object):
   ...:     def __index__(self):
   ...:         return 2
   ...:

In [4]: [1, 2] * Indexable()
Out[4]: [1, 2, 1, 2]

But, if I add an __rmul__ method, then this actually wins:

In [6]: class IndexableWithMul(object):
   ...:     def __index__(self):
   ...:         return 2
   ...:     def __mul__(self, other):
   ...:         return "indexable forward mul"
   ...:     def __rmul__(self, other):
   ...:         return "indexable reverse mul"

In [7]: [1, 2] * IndexableWithMul()
Out[7]: 'indexable reverse mul'

In [8]: IndexableWithMul() * [1, 2]
Out[8]: 'indexable forward mul'

NumPy arrays, of course, correctly define both an __index__ method (which
raises an error on general arrays but coerces to int for arrays that
contain exactly 1 integer), and also an nb_multiply slot which
accepts lists and performs elementwise multiplication:

In [9]: [1, 2] * np.array(2)
Out[9]: array([2, 4])

And that's all great! Just what we want. But the only reason this is
possible, AFAICT, is that CPython 'list' is a weird type with
undocumented behaviour that you can't actually define using pure
Python code.

Should I be worried?

-n

[1] https://github.com/numpy/numpy/pull/5864
[2] https://github.com/numpy/numpy/issues/5844

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] Python-versus-CPython question for __mul__ dispatch

2015-05-14 Thread Nathaniel Smith
On Thu, May 14, 2015 at 9:29 PM, Guido van Rossum  wrote:
> I expect you can make something that behaves like list by defining __mul__
> and __rmul__ and returning NotImplemented.

Hmm, it's fairly tricky, and part of the trick is that you can never
return NotImplemented (because you have to pretty much take over and
entirely replace the normal dispatch rules inside __mul__ and
__rmul__), but see attached for something I think should work.
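
Roughly, the attached class does something like this (a simplified sketch
with a made-up ListLike name, not the exact attached code):

import operator

class ListLike:
    # emulate how CPython's list dispatches '*': try the other operand's
    # __rmul__ *before* falling back to sequence repetition, and never
    # return NotImplemented ourselves
    def __init__(self, data):
        self.data = list(data)

    def __mul__(self, other):
        rmul = getattr(type(other), "__rmul__", None)
        if rmul is not None:
            result = rmul(other, self)
            if result is not NotImplemented:
                return result
        # list-style repetition via __index__
        return ListLike(self.data * operator.index(other))

    __rmul__ = __mul__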

So I guess this is just how Python's list, tuple, etc. work, and PyPy
and friends need to match...

-n

> On Thursday, May 14, 2015, Stefan Richthofer 
> wrote:
>>
>> >>Should I be worried?
>>
>> You mean should *I* be worried ;)
>>
>> Stuff like this is highly relevant for JyNI, so thanks very much for
>> clarifying this
>> subtle behavior. It went onto my todo-list right now to ensure that JyNI
>> will emulate
>> this behavior as soon as I am done with gc-support. (Hopefully it will be
>> feasible,
>> but I can only tell in half a year or so since there are currently other
>> priorities.)
>> Still, this "essay" potentially will save me a lot of time.
>>
>> So, everybody please feel encouraged to post things like this as they come
>> up. Maybe
>> there could be kind of a pitfalls-page somewhere in the docs collecting
>> these things.
>>
>> Best
>>
>> Stefan
>>
>>
>> > Sent: Friday, 15 May 2015 at 02:45
>> > From: "Nathaniel Smith" 
>> > To: "Python Dev" 
>> > Subject: [Python-Dev] Python-versus-CPython question for __mul__
>> > dispatch
>> >
>> > Hi all,
>> >
>> > While attempting to clean up some of the more squamous aspects of
>> > numpy's operator dispatch code [1][2], I've encountered a situation
>> > where the semantics we want and are using are possible according to
>> > CPython-the-interpreter, but AFAICT ought not to be possible according
>> > to Python-the-language, i.e., it's not clear to me whether it's
>> > possible even in principle to implement an object that works the way
>> > numpy.ndarray does in any other interpreter. Which makes me a bit
>> > nervous, so I wanted to check if there was any ruling on this.
>> >
>> > Specifically, the quirk we are relying on is this: in CPython, if you do
>> >
>> >   [1, 2] * my_object
>> >
>> > then my_object's __rmul__ gets called *before* list.__mul__,
>> > *regardless* of the inheritance relationship between list and
>> > type(my_object). This occurs as a side-effect of the weirdness
>> > involved in having both tp_as_number->nb_multiply and
>> > tp_as_sequence->sq_repeat in the C API -- when evaluating "a * b",
>> > CPython tries a's nb_multiply, then b's nb_multiply, then a's
>> > sq_repeat, then b's sq_repeat. Since list has an sq_repeat but not an
>> > nb_multiply, this means that my_object's nb_multiply gets called
>> > before any list method.
>> >
>> > Here's an example demonstrating how weird this is. list.__mul__ wants
>> > an integer, and by "integer" it means "any object with an __index__
>> > method". So here's a class that list is happy to be multiplied by --
>> > according to the ordinary rules for operator dispatch, in the example
>> > below Indexable.__mul__ and __rmul__ shouldn't even get a look-in:
>> >
>> > In [3]: class Indexable(object):
>> >    ...:     def __index__(self):
>> >    ...:         return 2
>> >    ...:
>> >
>> > In [4]: [1, 2] * Indexable()
>> > Out[4]: [1, 2, 1, 2]
>> >
>> > But, if I add an __rmul__ method, then this actually wins:
>> >
>> > In [6]: class IndexableWithMul(object):
>> >    ...:     def __index__(self):
>> >    ...:         return 2
>> >    ...:     def __mul__(self, other):
>> >    ...:         return "indexable forward mul"
>> >    ...:     def __rmul__(self, other):
>> >    ...:         return "indexable reverse mul"
>> >
>> > In [7]: [1, 2] * IndexableWithMul()
>> > Out[7]: 'indexable reverse mul'
>> >
>> > In [8]: IndexableWithMul() * [1, 2]
>> > Out[8]: 'indexable forward mul'
>> >
>> > NumPy arrays, of course, correctly define both an __index__ method (which
>> > raises an error on general arrays but coerces to int for arrays 

Re: [Python-Dev] Python-versus-CPython question for __mul__ dispatch

2015-05-15 Thread Nathaniel Smith
On Thu, May 14, 2015 at 11:53 PM, Nathaniel Smith  wrote:
> On Thu, May 14, 2015 at 9:29 PM, Guido van Rossum  wrote:
>> I expect you can make something that behaves like list by defining __mul__
>> and __rmul__ and returning NotImplemented.
>
> Hmm, it's fairly tricky, and part of the trick is that you can never
> return NotImplemented (because you have to pretty much take over and
> entirely replace the normal dispatch rules inside __mul__ and
> __rmul__), but see attached for something I think should work.
>
> So I guess this is just how Python's list, tuple, etc. work, and PyPy
> and friends need to match...

For the record, it looks like PyPy does already have a hack to
implement this -- they do it by having a hidden flag on the built-in
sequence types which the implementations of '*' and '+' check for, and
if it's found it triggers a different rule for dispatching to the
__op__ methods:

https://bitbucket.org/pypy/pypy/src/a1a494787f4112e42f50c6583e0fea18db3fb4fa/pypy/objspace/descroperation.py?at=default#cl-692

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] Python-versus-CPython question for __mul__ dispatch

2015-05-17 Thread Nathaniel Smith
On Sat, May 16, 2015 at 1:31 AM, Nick Coghlan  wrote:
> On 16 May 2015 at 07:35, Nathaniel Smith  wrote:
>> On Thu, May 14, 2015 at 11:53 PM, Nathaniel Smith  wrote:
>>> On Thu, May 14, 2015 at 9:29 PM, Guido van Rossum  wrote:
>>>> I expect you can make something that behaves like list by defining __mul__
>>>> and __rmul__ and returning NotImplemented.
>>>
>>> Hmm, it's fairly tricky, and part of the trick is that you can never
>>> return NotImplemented (because you have to pretty much take over and
>>> entirely replace the normal dispatch rules inside __mul__ and
>>> __rmul__), but see attached for something I think should work.
>>>
>>> So I guess this is just how Python's list, tuple, etc. work, and PyPy
>>> and friends need to match...
>>
>> For the record, it looks like PyPy does already have a hack to
>> implement this -- they do it by having a hidden flag on the built-in
>> sequence types which the implementations of '*' and '+' check for, and
>> if it's found it triggers a different rule for dispatching to the
>> __op__ methods:
>> 
>> https://bitbucket.org/pypy/pypy/src/a1a494787f4112e42f50c6583e0fea18db3fb4fa/pypy/objspace/descroperation.py?at=default#cl-692
>
> Oh, that's rather annoying that the PyPy team implemented bug-for-bug
> compatibility there, and didn't follow up on the operand precedence
> bug report to say that they had done so. We also hadn't previously
> been made aware that NumPy is relying on this operand precedence bug
> to implement publicly documented API behaviour, so fixing it *would*
> break end user code :(

I don't think any of us were aware of it either :-).

It is a fairly obscure case -- it only comes up specifically if you
have a single-element integer array that you are trying to multiply by
a list that you expect to be auto-coerced to an array. If Python
semantics were such that this became impossible to handle correctly
then we would survive. (We've certainly survived worse, e.g.
arr[array_of_indices] += 1 silently gives the wrong/unexpected result
when array_of_indices has duplicate entries, and this bites people
constantly. Unfortunately I can't see any reasonable way to fix this
within Python's semantics, so... oh well.)
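
(For the curious, the classic demonstration is a quick numpy sketch like:

import numpy as np

arr = np.zeros(3, dtype=int)
idx = np.array([0, 0, 1])    # note the duplicate index
arr[idx] += 1
print(arr)                   # [1 1 0] -- index 0 only got incremented once

since the += expands to a gather, an add, and a scatter, rather than an
in-place accumulation.)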

But yeah, given that we're at a point where list dispatch actually has
worked this way forever and across multiple interpreter
implementations, I think it's de facto going to end up part of the
language specification unless someone does something pretty quick...

> I guess that means someone in the numeric community will need to write
> a PEP to make this "try the other operand first" "feature" part of the
> language specification, so that other interpreters can implement it up
> front, rather than all having to come up with their own independent
> custom hacks just to make NumPy work.

I'll make a note...

> P.S. It would also be nice if someone could take on the PEP for a
> Python level buffer API for 3.6: http://bugs.python.org/issue13797

At a guess, if you want to find people who have this itch strong
enough to try scratching it, then probably numpy users are actually
not your best bet, b/c if you have numpy then you already have
workarounds. In particular, numpy still supports a legacy Python level
buffer export API:

   http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#python-side

So if all you want is to hand a buffer to numpy (rather than to an
arbitrary PEP 3118 consumer) then this works fine, and if you do need
an arbitrary PEP 3118 consumer then you can use numpy as an adaptor
(use __array_interface__ to convert your object to ndarray -> ndarray
supports the PEP 3118 API).
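
A rough sketch of that adaptor route (made-up class; real code would need
to think about lifetimes and writability):

import numpy as np

class LegacyBuffer:
    # exports data only via numpy's legacy Python-level protocol
    def __init__(self):
        self._backing = np.arange(6, dtype=np.int32)

    @property
    def __array_interface__(self):
        return self._backing.__array_interface__

arr = np.asarray(LegacyBuffer())   # numpy consumes __array_interface__
view = memoryview(arr)             # ndarray then re-exports via PEP 3118
print(view.format, view.shape)     # e.g. 'i' (6,)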

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] PEP 556: Threaded garbage collection

2017-09-08 Thread Nathaniel Smith
On Fri, Sep 8, 2017 at 12:13 PM, Antoine Pitrou  wrote:
> On Fri, 08 Sep 2017 12:04:10 -0700
> Benjamin Peterson  wrote:
>> I like it overall.
>>
>> - I was wondering what happens during interpreter shutdown. I see you
>> have that listed as a open issue. How about simply shutting down the
>> finalization thread and not guaranteeing that finalizers are actually
>> ever run à la Java?
>
> I don't know.  People generally have expectations towards stuff being
> finalized properly (especially when talking about files etc.).
> Once the first implementation is devised, we will know more about
> what's workable (perhaps we'll have to move _PyGC_Fini earlier in the
> shutdown sequence?  perhaps we'll want to switch back to serial mode
> when shutting down?).

PyPy just abandons everything when shutting down, instead of running
finalizers. See the last paragraph of:
http://doc.pypy.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies

So that might be a useful source of experience.

On another note, I'm going to be that annoying person who suggests
massively extending the scope of your proposal. Feel free to throw
things at me or whatever.

Would it make sense to also move signal handlers to run in this
thread? Those are the other major source of nasty re-entrancy
problems.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 549 v2: now titled Instance Descriptors

2017-09-08 Thread Nathaniel Smith
On Fri, Sep 8, 2017 at 1:45 PM, Ethan Furman  wrote:
> On 09/08/2017 12:44 PM, Larry Hastings wrote:
>
>> I've updated PEP 549 with a new title--"Instance Descriptors" is a better
>> name than "Instance Properties"--and to
>> clarify my rationale for the PEP.  I've also updated the prototype with
>> code cleanups and a new type:
>> "collections.abc.InstanceDescriptor", a base class that allows user
>> classes to be instance descriptors.
>
>
> I like the new title, I'm +0 on the PEP itself, and I have one correction
> for the PEP:  we've had the ability to simulate module properties for ages:
>
> Python 2.7.6 (default, Oct 26 2016, 20:32:47)
> [GCC 4.8.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> --> import module_class
> --> module_class.hello
> 'hello'
> --> module_class.hello = 'hola'
> --> module_class.hello
> 'hola'
>
> And the code:
>
> class ModuleClass(object):
>     @property
>     def hello(self):
>         try:
>             return self._greeting
>         except AttributeError:
>             return 'hello'
>     @hello.setter
>     def hello(self, value):
>         self._greeting = value
>
> import sys
> sys.modules[__name__] = ModuleClass()
>
> I will admit I don't see what reassigning the __class__ attribute on a
> module did for us.

If you have an existing package that doesn't replace itself in
sys.modules, then it's difficult and risky to switch to that form --
don't think of toy examples, think of django/__init__.py or
numpy/__init__.py. You have to rewrite the whole export logic, and you
have to figure out what to do with things like submodules that import
from the parent module before the swaparoo happens, you can get skew
issues between the original module namespace and the replacement class
namespace, etc. The advantage of the __class__ assignment trick (as
compared to what we had before) is that it lets you easily and safely
retrofit this kind of magic onto existing packages.
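
For concreteness, the retrofit can be as small as this (hypothetical
package and attribute names; relies on the 3.5+ __class__ assignment
support):

# at the bottom of mypackage/__init__.py
import sys
import types
import warnings

class _DeprecationModule(types.ModuleType):
    def __getattr__(self, name):
        if name == "old_constant":   # hypothetical deprecated export
            warnings.warn("mypackage.old_constant is deprecated",
                          DeprecationWarning, stacklevel=2)
            return 42
        raise AttributeError(name)

sys.modules[__name__].__class__ = _DeprecationModule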

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 554 v2 (new "interpreters" module)

2017-09-08 Thread Nathaniel Smith
On Fri, Sep 8, 2017 at 8:54 PM, Michel Desmoulin
 wrote:
>
> Le 09/09/2017 à 01:28, Stefan Krah a écrit :
>> Still, the argument "who uses subinterpreters?" of course still remains.
>
> For now, nobody. But if we expose it and web frameworks manage to create
> workers as fast as multiprocessing and as cheap as threading, you will
> find a lot of people starting to want to use it.

To temper expectations a bit here, it sounds like the first version
might be more like: as slow as threading (no multicore), as expensive
as multiprocessing (no shared memory), and -- on Unix -- slower to
start than either of them (no fork).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 554 v2 (new "interpreters" module)

2017-09-09 Thread Nathaniel Smith
On Sep 9, 2017 9:07 AM, "Nick Coghlan"  wrote:


To immediately realise some level of efficiency benefits from the
shared memory space between the main interpreter and subinterpreters,
I also think these low level FIFOs should be defined as accepting any
object that supports the PEP 3118 buffer protocol, and emitting
memoryview() objects on the receiving end, rather than being bytes-in,
bytes-out.


Is your idea that this memoryview would refer directly to the sending
interpreter's memory (as opposed to a copy into some receiver-owned
buffer)? If so, then how do the two subinterpreters coordinate the buffer
release when the memoryview is closed?
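
(The same coordination problem is already visible inside a single
interpreter -- a quick sketch:

import array

buf = array.array('i', [1, 2, 3])
view = memoryview(buf)     # buf now has an exported buffer
try:
    buf.append(4)          # resizing is refused while exports exist
except BufferError as exc:
    print(exc)
view.release()
buf.append(4)              # fine once the export is released

-- except that there the exporter and consumer at least share a refcount
and a GIL.)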

-n


Re: [Python-Dev] PEP 554 v2 (new "interpreters" module)

2017-09-09 Thread Nathaniel Smith
On Sep 8, 2017 4:06 PM, "Eric Snow"  wrote:


   run(code):

  Run the provided Python code in the interpreter, in the current
  OS thread.  If the interpreter is already running then raise
  RuntimeError in the interpreter that called ``run()``.

  The current interpreter (which called ``run()``) will block until
  the subinterpreter finishes running the requested code.  Any
  uncaught exception in that code will bubble up to the current
  interpreter.


This phrase "bubble up" here is doing a lot of work :-). Can you elaborate
on what you mean? The text now makes it seem like the exception will just
pass from one interpreter into another, but that seems impossible – it'd
mean sharing not just arbitrary user defined exception classes but full
frame objects...

-n


Re: [Python-Dev] breakpoint() and $PYTHONBREAKPOINT

2017-09-10 Thread Nathaniel Smith
On Sun, Sep 10, 2017 at 12:06 PM, Barry Warsaw  wrote:
> For PEP 553, I think it’s a good idea to support the environment variable 
> $PYTHONBREAKPOINT[*] but I’m stuck on a design question, so I’d like to get 
> some feedback.
>
> Should $PYTHONBREAKPOINT be consulted in breakpoint() or in 
> sys.breakpointhook()?

Wouldn't the usual pattern be to check $PYTHONBREAKPOINT once at
startup, and if it's set use it to initialize sys.breakpointhook()?
Compare to, say, $PYTHONPATH.
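
Roughly this kind of thing, done once during startup (the parsing details
here are my guess at how the envvar would be interpreted, not something
taken from the PEP):

    import importlib
    import os
    import sys

    def _install_breakpointhook_from_env():
        # Called once at interpreter startup, like the other PYTHON* vars.
        target = os.environ.get("PYTHONBREAKPOINT")
        if not target:
            return                                    # keep the default hook
        if target == "0":
            sys.breakpointhook = lambda *a, **kw: None  # disable breakpoint()
            return
        modname, _, funcname = target.rpartition(".")
        module = importlib.import_module(modname or "builtins")
        sys.breakpointhook = getattr(module, funcname)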

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 557: Data Classes

2017-09-10 Thread Nathaniel Smith
Hi Eric,

A few quick comments:

Why do you even have a hash= argument on individual fields? For the whole
class, I can imagine you might want to explicitly mark a whole class as
unhashable, but it seems like the only thing you can do with the
field-level hash= argument is to create a class where the __hash__ and
__eq__ take different fields into account, and why would you ever want that?

Though honestly I can see a reasonable argument for removing the
class-level hash= option too. And even if you keep it you might want to
error on some truly nonsensical options like defining __hash__ without
__eq__. (Also watch out that Python's usual rule about defining __eq__
blocking the inheritance of __hash__ does not kick in if __eq__ is added
after the class is created.)
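
Concretely, the corner case I mean:

    class Demo:
        pass

    Demo.__eq__ = lambda self, other: True   # __eq__ added after class creation

    hash(Demo())   # still works: __hash__ is only implicitly set to None when
                   # __eq__ is defined in the class body at creation time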

I've sometimes wished that attrs let me control whether it generated
equality methods (eq/ne/hash) separately from ordering methods (lt/gt/...).
Maybe the cmp= argument should take an enum with options
none/equality-only/full?

The "why not attrs" section kind of reads like "because it's too popular
and useful"?

-n

On Sep 8, 2017 08:44, "Eric V. Smith"  wrote:

Oops, I forgot the link. It should show up shortly at
https://www.python.org/dev/peps/pep-0557/.

Eric.


On 9/8/17 7:57 AM, Eric V. Smith wrote:

> I've written a PEP for what might be thought of as "mutable namedtuples
> with defaults, but not inheriting tuple's behavior" (a mouthful, but it
> sounded simpler when I first thought of it). It's heavily influenced by
> the attrs project. It uses PEP 526 type annotations to define fields.
> From the overview section:
>
> @dataclass
> class InventoryItem:
> name: str
> unit_price: float
> quantity_on_hand: int = 0
>
> def total_cost(self) -> float:
> return self.unit_price * self.quantity_on_hand
>
> Will automatically add these methods:
>
>   def __init__(self, name: str, unit_price: float, quantity_on_hand: int
> = 0) -> None:
>   self.name = name
>   self.unit_price = unit_price
>   self.quantity_on_hand = quantity_on_hand
>   def __repr__(self):
>   return
> f'InventoryItem(name={self.name!r},unit_price={self.unit_price!r},quantity_on_hand={self.quantity_on_hand!r})'
>
>   def __eq__(self, other):
>   if other.__class__ is self.__class__:
>   return (self.name, self.unit_price, self.quantity_on_hand) ==
> (other.name, other.unit_price, other.quantity_on_hand)
>   return NotImplemented
>   def __ne__(self, other):
>   if other.__class__ is self.__class__:
>   return (self.name, self.unit_price, self.quantity_on_hand) !=
> (other.name, other.unit_price, other.quantity_on_hand)
>   return NotImplemented
>   def __lt__(self, other):
>   if other.__class__ is self.__class__:
>   return (self.name, self.unit_price, self.quantity_on_hand) <
> (other.name, other.unit_price, other.quantity_on_hand)
>   return NotImplemented
>   def __le__(self, other):
>   if other.__class__ is self.__class__:
>   return (self.name, self.unit_price, self.quantity_on_hand) <=
> (other.name, other.unit_price, other.quantity_on_hand)
>   return NotImplemented
>   def __gt__(self, other):
>   if other.__class__ is self.__class__:
>   return (self.name, self.unit_price, self.quantity_on_hand) >
> (other.name, other.unit_price, other.quantity_on_hand)
>   return NotImplemented
>   def __ge__(self, other):
>   if other.__class__ is self.__class__:
>   return (self.name, self.unit_price, self.quantity_on_hand) >=
> (other.name, other.unit_price, other.quantity_on_hand)
>   return NotImplemented
>
> Data Classes saves you from writing and maintaining these functions.
>
> The PEP is largely complete, but could use some filling out in places.
> Comments welcome!
>
> Eric.
>
> P.S. I wrote this PEP when I was in my happy place.
>


Re: [Python-Dev] PEP 557: Data Classes

2017-09-11 Thread Nathaniel Smith
On Mon, Sep 11, 2017 at 5:32 AM, Eric V. Smith  wrote:
> On 9/10/17 11:08 PM, Nathaniel Smith wrote:
>>
>> Hi Eric,
>>
>> A few quick comments:
>>
>> Why do you even have a hash= argument on individual fields? For the
>> whole class, I can imagine you might want to explicitly mark a whole
>> class as unhashable, but it seems like the only thing you can do with
>> the field-level hash= argument is to create a class where the __hash__
>> and __eq__ take different fields into account, and why would you ever
>> want that?
>
>
> The use case is that you have a cache, or something similar, that doesn't
> affect the object identity.

But wouldn't this just be field(cmp=False), no need to fiddle with hash=?
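
I.e. something like this (spelling follows the draft's field() arguments;
the cache field itself is just an illustration):

    from dataclasses import dataclass, field   # module name per the PEP draft

    @dataclass
    class Node:
        name: str
        # derived/cached data: excluded from __eq__ (and hence __hash__) entirely
        _render_cache: str = field(default=None, cmp=False)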

>> Though honestly I can see a reasonable argument for removing the
>> class-level hash= option too. And even if you keep it you might want to
>> error on some truly nonsensical options like defining __hash__ without
>> __eq__. (Also watch out that Python's usual rule about defining __eq__
>> blocking the inheritance of __hash__ does not kick in if __eq__ is added
>> after the class is created.)
>>
>> I've sometimes wished that attrs let me control whether it generated
>> equality methods (eq/ne/hash) separately from ordering methods
>> (lt/gt/...). Maybe the cmp= argument should take an enum with options
>> none/equality-only/full?
>
>
> Yeah, I've thought about this, too. But I don't have any use case in mind,
> and if it hasn't come up with attrs, then I'm reluctant to break new ground
> here.

https://github.com/python-attrs/attrs/issues/170

From that thread, it feels like part of the problem is that it's
awkward to encode this using only boolean arguments, but they're sort
of stuck with that for backcompat with how they originally defined
cmp=, hence my suggestion to consider making it an enum from the start
:-).

>> The "why not attrs" section kind of reads like "because it's too popular
>> and useful"?
>
>
> I'll add some words to that section, probably focused on typing
> compatibility. My general feeling is that attrs has some great design
> decisions, but goes a little too far (e.g., conversions, validations). As
> with most things we add, I'm trying to be as minimalist as possible, while
> still being widely useful and allowing 3rd party extensions and future
> features.

If the question is "given that we're going to add something to the
stdlib, why shouldn't that thing be attrs?" then I guess it's
sufficient to say "because the attrs developers didn't want it". But I
think the PEP should also address the question "why are we adding
something to the stdlib, instead of just recommending people install
attrs".

-n


Re: [Python-Dev] breakpoint() and $PYTHONBREAKPOINT

2017-09-11 Thread Nathaniel Smith
On Mon, Sep 11, 2017 at 5:27 PM, Barry Warsaw  wrote:
> On Sep 10, 2017, at 13:46, Nathaniel Smith  wrote:
>>
>> On Sun, Sep 10, 2017 at 12:06 PM, Barry Warsaw  wrote:
>>> For PEP 553, I think it’s a good idea to support the environment variable 
>>> $PYTHONBREAKPOINT[*] but I’m stuck on a design question, so I’d like to get 
>>> some feedback.
>>>
>>> Should $PYTHONBREAKPOINT be consulted in breakpoint() or in 
>>> sys.breakpointhook()?
>>
>> Wouldn't the usual pattern be to check $PYTHONBREAKPOINT once at
>> startup, and if it's set use it to initialize sys.breakpointhook()?
>> Compare to, say, $PYTHONPATH.
>
> Perhaps, but what would be the visible effects of that?  I.e. what would that 
> buy you?

Why is this case special enough to break the rules?

Compared to checking it on each call to sys.breakpointhook(), I guess
the two user-visible differences in behavior would be:

- whether mutating os.environ["PYTHONBREAKPOINT"] inside the process
affects future calls. I would find it quite surprising if it did;
generally when we mutate envvars like os.environ["PYTHONPATH"] it's a
way to set things up for future child processes and doesn't affect our
process.

- whether the import happens immediately at startup, or is delayed
until the first call to breakpoint(). If it's imported once at
startup, then this adds overhead to programs that set PYTHONBREAKPOINT
but don't use it, and if the envvar is set to some nonsense then you
get an error immediately instead of at the first call to breakpoint().

These both seem like fairly minor differences to me, but maybe saving
that 30 ms or whatever of startup time is an important enough
optimization to justify the special case?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] breakpoint() and $PYTHONBREAKPOINT

2017-09-11 Thread Nathaniel Smith
On Mon, Sep 11, 2017 at 6:45 PM, Barry Warsaw  wrote:
> On Sep 11, 2017, at 18:15, Nathaniel Smith  wrote:
>
>> Compared to checking it on each call to sys.breakpointhook(), I guess
>> the two user-visible differences in behavior would be:
>>
>> - whether mutating os.environ["PYTHONBREAKPOINT"] inside the process
>> affects future calls. I would find it quite surprising if it did;
>
> Maybe, but the effect would be essentially the same as setting 
> sys.breakpointhook during the execution of your program.
>
>> - whether the import happens immediately at startup, or is delayed
>> until the first call to breakpoint().
>
> I definitely think it should be delayed until first use.  You might never hit 
> the breakpoint() in which case you wouldn’t want to pay any penalty for even 
> checking the environment variable.  And once you *do* hit the breakpoint(), 
> you aren’t caring about performance then anyway.
>
> Both points could be addressed by caching the import after the first lookup, 
> but even there I don’t think it’s worth it.  I’d invoke KISS and just look it 
> up anew each time.  That way there’s no global/static state to worry about, 
> and the feature is as flexible as it can be.
>
>> If it's imported once at
>> startup, then this adds overhead to programs that set PYTHONBREAKPOINT
>> but don't use it, and if the envvar is set to some nonsense then you
>> get an error immediately instead of at the first call to breakpoint().
>
> That’s one case I can see being useful; report an error immediately upon 
> startup rather that when you hit breakpoint().  But even there, I’m not sure. 
>  What if you’ve got some kind of dynamic debugger discovery thing going on, 
> such that the callable isn’t available until its first use?

I don't think that's a big deal? Just do the discovery logic from
inside the callable itself, instead of using some kind of magical
attribute lookup hook.

>> These both seem like fairly minor differences to me, but maybe saving
>> that 30 ms or whatever of startup time is an important enough
>> optimization to justify the special case?
>
> I’m probably too steeped in the implementation, but it feels to me like not 
> just loading the envar callable on demand makes reasoning about and using 
> this more complicated.  I think for most uses though, it won’t really matter.

I don't mind breaking from convention if you have a good reason, I
just like it for PEPs to actually write down that reasoning :-)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 549: Instance Properties (aka: module properties)

2017-09-13 Thread Nathaniel Smith
On Wed, Sep 13, 2017 at 11:49 AM, Guido van Rossum  wrote:
>  > Why not adding both? Properties do have their uses as does __getattr__.
>
> In that case I would just add __getattr__ to module.c, and add a recipe or
> perhaps a utility module that implements a __getattr__ you can put into your
> module if you want @property support. That way you can have both but you
> only need a little bit of code in module.c to check for __getattr__ and call
> it when you'd otherwise raise AttributeError.

Unfortunately I don't think this works. If there's a @property object
present in the module's instance dict, then __getattribute__ will
return it directly instead of calling __getattr__.

(I guess for full property emulation you'd also need to override
__setattr__ and __dir__, but I don't know how important that is.)

We could consider letting modules overload __getattribute__ instead of
__getattr__, but I don't think this is viable either -- a key feature
of __getattr__ is that it doesn't add overhead to normal lookups. If
you implement deprecation warnings by overloading __getattribute__,
then it makes all your users slower, even the ones who never touch the
deprecated attributes. __getattr__ is much better than
__getattribute__ for this purpose.

Alternatively we can have a recipe that implements @property support
using __class__ assignment and overriding
__getattribute__/__setattr__/__dir__, so instead of 'from
module_helper.property_emulation import __getattr__' it'd be 'from
module_helper import enable_property_emulation;
enable_property_emulation(__name__)'. Still has the slowdown problem
but it would work.
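
A bare-bones sketch of that recipe (module_helper and
enable_property_emulation are hypothetical names, and this simplified
version just hoists property objects onto a module subclass instead of
overriding __getattribute__/__setattr__/__dir__, so it only covers
properties that already exist when it's called):

    import sys
    import types

    def enable_property_emulation(module_name):
        module = sys.modules[module_name]

        class _PropertyModule(types.ModuleType):
            pass

        # Move property objects from the module's instance dict onto the new
        # class, where the descriptor protocol will actually honor them.
        for name, value in list(vars(module).items()):
            if isinstance(value, property):
                setattr(_PropertyModule, name, value)
                delattr(module, name)

        module.__class__ = _PropertyModule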

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-13 Thread Nathaniel Smith
On Sep 13, 2017 9:01 PM, "Nick Coghlan"  wrote:

On 14 September 2017 at 11:44, Eric Snow 
wrote:
>send(obj):
>
>Send the object to the receiving end of the channel.  Wait until
>the object is received.  If the channel does not support the
>object then TypeError is raised.  Currently only bytes are
>supported.  If the channel has been closed then EOFError is
>raised.

I still expect any form of object sharing to hinder your
per-interpreter GIL efforts, so restricting the initial implementation
to memoryview-only seems more future-proof to me.


I don't get it. With bytes, you can either share objects or copy them and
the user can't tell the difference, so you can change your mind later if
you want. But memoryviews require some kind of cross-interpreter strong
reference to keep the underlying buffer object alive. So if you want to
minimize object sharing, surely bytes are more future-proof.


> Handling an exception
> -
>
> ::
>
>interp = interpreters.create()
>try:
>interp.run("""if True:
>raise KeyError
>""")
>except KeyError:
>print("got the error from the subinterpreter")

As with the message passing through channels, I think you'll really
want to minimise any kind of implicit object sharing that may
interfere with future efforts to make the GIL truly an *interpreter*
lock, rather than the global process lock that it is currently.

One possible way to approach that would be to make the low level run()
API a more Go-style API rather than a Python-style one, and have it
return a (result, err) 2-tuple. "err.raise()" would then translate the
foreign interpreter's exception into a local interpreter exception,
but the *traceback* for that exception would be entirely within the
current interpreter.


It would also be reasonable to simply not return any value/exception from
run() at all, or maybe just a bool for whether there was an unhandled
exception. Any high level API is going to be injecting code on both sides
of the interpreter boundary anyway, so it can do whatever exception and
traceback translation it wants to.


> Reseting __main__
> -
>
> As proposed, every call to ``Interpreter.run()`` will execute in the
> namespace of the interpreter's existing ``__main__`` module.  This means
> that data persists there between ``run()`` calls.  Sometimes this isn't
> desireable and you want to execute in a fresh ``__main__``.  Also,
> you don't necessarily want to leak objects there that you aren't using
> any more.
>
> Solutions include:
>
> * a ``create()`` arg to indicate resetting ``__main__`` after each
>   ``run`` call
> * an ``Interpreter.reset_main`` flag to support opting in or out
>   after the fact
> * an ``Interpreter.reset_main()`` method to opt in when desired
>
> This isn't a critical feature initially.  It can wait until later
> if desirable.

I was going to note that you can already do this:

interp.run("globals().clear()")

However, that turns out to clear *too* much, since it also clobbers
all the __dunder__ attributes that the interpreter needs in a code
execution environment.

Either way, if you added this, I think it would make more sense as an
"importlib.util.reset_globals()" operation, rather than have it be
something specific to subinterpreters.


This is another point where the API could reasonably say that if you want
clean namespaces then you should do that yourself (e.g. by setting up your
own globals dict and using it to execute any post-bootstrap code).
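
E.g., assuming the draft interpreters API from the PEP, something like
(purely illustrative):

    interp = interpreters.create()
    interp.run("""if True:
        _scratch = {}                          # our own namespace, not __main__
        exec("x = 1; y = x + 1", _scratch)     # post-bootstrap code runs in here
        """)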

-n


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-14 Thread Nathaniel Smith
On Thu, Sep 14, 2017 at 5:44 PM, Nick Coghlan  wrote:
> On 14 September 2017 at 15:27, Nathaniel Smith  wrote:
>> I don't get it. With bytes, you can either share objects or copy them and
>> the user can't tell the difference, so you can change your mind later if you
>> want.
>> But memoryviews require some kind of cross-interpreter strong
>> reference to keep the underlying buffer object alive. So if you want to
>> minimize object sharing, surely bytes are more future-proof.
>
> Not really, because the only way to ensure object separation (i.e no
> refcounted objects accessible from multiple interpreters at once) with
> a bytes-based API would be to either:
>
> 1. Always copy (eliminating most of the low overhead communications
> benefits that subinterpreters may offer over multiple processes)
> 2. Make the bytes implementation more complicated by allowing multiple
> bytes objects to share the same underlying storage while presenting as
> distinct objects in different interpreters
> 3. Make the output on the receiving side not actually a bytes object,
> but instead a view onto memory owned by another object in a different
> interpreter (a "memory view", one might say)
>
> And yes, using memory views for this does mean defining either a
> subclass or a mediating object that not only keeps the originating
> object alive until the receiving memoryview is closed, but also
> retains a reference to the originating interpreter so that it can
> switch to it when it needs to manipulate the source object's refcount
> or call one of the buffer methods.
>
> Yury and I are fine with that, since it means that either the sender
> *or* the receiver can decide to copy the data (e.g. by calling
> bytes(obj) before sending, or bytes(view) after receiving), and in the
> meantime, the object holding the cross-interpreter view knows that it
> needs to switch interpreters (and hence acquire the sending
> interpreter's GIL) before doing anything with the source object.
>
> The reason we're OK with this is that it means that only reading a new
> message from a channel (i.e creating a cross-interpreter view) or
> discarding a previously read message (i.e. closing a cross-interpreter
> view) will be synchronisation points where the receiving interpreter
> necessarily needs to acquire the sending interpreter's GIL.
>
> By contrast, if we allow an actual bytes object to be shared, then
> either every INCREF or DECREF on that bytes object becomes a
> synchronisation point, or else we end up needing some kind of
> secondary per-interpreter refcount where the interpreter doesn't drop
> its shared reference to the original object in its source interpreter
> until the internal refcount in the borrowing interpreter drops to
> zero.

Ah, that makes more sense.

I am nervous that allowing arbitrary memoryviews gives a *little* more
power than we need or want. I like that the current API can reasonably
be emulated using subprocesses -- it opens up the door for backports,
compatibility support on language implementations that don't support
subinterpreters, direct benchmark comparisons between the two
implementation strategies, etc. But if we allow arbitrary memoryviews,
then this requires that you can take (a) an arbitrary object, not
specified ahead of time, and (b) provide two read-write views on it in
separate interpreters such that modifications made in one are
immediately visible in the other. Subprocesses can do one or the other
-- they can copy arbitrary data, and if you warn them ahead of time
when you allocate the buffer, they can do real zero-copy shared
memory. But the combination is really difficult.

It'd be one thing if this were like a key feature that gave
subinterpreters an advantage over subprocesses, but it seems really
unlikely to me that a library won't know ahead of time when it's
filling in a buffer to be transferred, and if anything it seems like
we'd rather not expose read-write shared mappings in any case. It's
extremely non-trivial to do right [1].

tl;dr: let's not rule out a useful implementation strategy based on a
feature we don't actually need.

One alternative would be your option (3) -- you can put bytes in and
get memoryviews out, and since bytes objects are immutable it's OK.

[1] https://en.wikipedia.org/wiki/Memory_model_(programming)

>>> Handling an exception
>>> -
>> It would also be reasonable to simply not return any value/exception from
>> run() at all, or maybe just a bool for whether there was an unhandled
>> exception. Any high level API is going to be injecting code on both sides of
>> the interpreter boundary anyway, so it can do whatever exception and
>> traceback translation it 

Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Sep 18, 2017 07:58, "Antoine Pitrou"  wrote:


On 18/09/2017 at 16:52, Guido van Rossum wrote:
>
> In Python 2 the traceback was not part of the exception object because
> there was (originally) no cycle GC. In Python GC we changed the awkward
> interface to something more useful, because we could depend on GC. Why
> are we now trying to roll back this feature? We should just improve GC.
> (Or perhaps you shouldn't be raising so many exceptions. :-)

Improving the GC is obviously a good thing, but what heuristic would you
have in mind that may solve the issue at hand?


I read the whole thread and I'm not sure what the issue at hand is :-).
Obviously it's nice when the refcount system is able to implicitly clean
things up in a prompt and deterministic way, but there are already tools to
handle the cases where it doesn't (ResourceWarning, context managers, ...),
and the more we encourage people to implicitly rely on refcounting, the
harder it is to optimize the interpreter or use alternative language
implementations. Why are reference cycles a problem that needs solving?

Actually the bit that I find most confusing about Victor's story is, how
can a traceback frame keep a thread alive? Is the problem that "dangling
thread" warnings are being triggered by threads that are finished and dead
but their Thread objects are still allocated? Because if so then that seems
like a bug in the warnings mechanism; there's no harm in a dead Thread
hanging around until collected, and Victor may have wasted a day debugging
an issue that wasn't a problem in the first place...

-n


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Mon, Sep 18, 2017 at 9:50 AM, Antoine Pitrou  wrote:
> On Mon, 18 Sep 2017 09:42:45 -0700
> Nathaniel Smith  wrote:
>>
>> Obviously it's nice when the refcount system is able to implicitly clean
>> things up in a prompt and deterministic way, but there are already tools to
>> handle the cases where it doesn't (ResourceWarning, context managers, ...),
>> and the more we encourage people to implicitly rely on refcounting, [...]
>
> The thing is, we don't need to encourage them.  Having objects disposed
> of when the last visible reference vanishes is a pretty common
> expectation people have when using CPython.
>
>> Why are reference cycles a problem that needs solving?
>
> Because sometimes they are holding up costly resources in memory when
> people don't expect them to.  Such as large Numpy arrays :-)

Do we have any reason to believe that this is actually happening on a
regular basis though?

If it is then it might make sense to look at the cycle collection
heuristics; IIRC they're based on a fairly naive count of how many
allocations have been made, without regard to their size.

> And, no, there are no obvious ways to fix for users.  gc.collect()
> is much too costly to be invoked on a regular basis.
>
>> Because if so then that seems
>> like a bug in the warnings mechanism; there's no harm in a dead Thread
>> hanging around until collected, and Victor may have wasted a day debugging
>> an issue that wasn't a problem in the first place...
>
> Yes, I think Victor is getting a bit overboard with the so-called
> "dangling thread" issue.  But the underlying issue (that heavyweight
> resources can be inadvertently held up in memory up just because some
> unrelated exception was caught and silenced along the way) is a real
> one.

Simply catching and silencing exceptions doesn't create any loops -- if you do

try:
raise ValueError
except ValueError as exc:
raise exc

then there's no loop, because the 'exc' local gets cleared as soon as
you exit the except: block. The issue that Victor ran into with
socket.create_connection is a special case where that function saves
off the caught exception to use later.
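
Schematically the pattern looks like this (simplified; do_connect is a
stand-in for the real socket code):

    def do_connect(addr):
        raise OSError("connection failed")   # stand-in for the real attempt

    def create_connection_ish(addresses):
        err = None
        for addr in addresses:
            try:
                return do_connect(addr)
            except OSError as exc:
                err = exc                    # saved off for later
        # err.__traceback__ -> this frame -> the 'err' local -> err again,
        # so the exception (and whatever its traceback keeps alive) has to
        # wait for the cyclic GC instead of dying via refcounting.
        raise err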

If someone wanted to replace socket.create_connection's 'raise err' with

try:
raise err
finally:
del err

then I guess that would be pretty harmless...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Mon, Sep 18, 2017 at 10:59 AM, Antoine Pitrou  wrote:
> On 18/09/2017 at 19:53, Nathaniel Smith wrote:
>>>
>>>> Why are reference cycles a problem that needs solving?
>>>
>>> Because sometimes they are holding up costly resources in memory when
>>> people don't expect them to.  Such as large Numpy arrays :-)
>>
>> Do we have any reason to believe that this is actually happening on a
>> regular basis though?
>
> Define "regular" :-)  We did get some reports on dask/distributed about it.

Caused by uncollected cycles involving tracebacks? I looked here:

  
https://github.com/dask/distributed/issues?utf8=%E2%9C%93&q=is%3Aissue%20memory%20leak

and saw some issues with cycles causing delayed collection (e.g. #956)
or the classic memory leak problem of explicitly holding onto data you
don't need any more (e.g. #1209, bpo-29861), but nothing involving
traceback cycles. It was just a quick skim though.

>> If it is then it might make sense to look at the cycle collection
>> heuristics; IIRC they're based on a fairly naive count of how many
>> allocations have been made, without regard to their size.
>
> Yes... But just because a lot of memory has been allocated isn't a good
> enough heuristic to launch a GC collection.

I'm not an expert on GC at all, but intuitively it sure seems like
allocation size might be a useful piece of information to feed into a
heuristic. Our current heuristic is just, run a small collection after
every 700 allocations, run a larger collection after 10 smaller
collections.
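
(Those are exactly the numbers gc reports:)

    >>> import gc
    >>> gc.get_threshold()
    (700, 10, 10)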

> What if that memory is
> gonna stay allocated for a long time?  Then you're frequently launching
> GC runs for no tangible result except more CPU consumption and frequent
> pauses.

Every heuristic has problematic cases, that's why we call it a
heuristic :-). But somehow every other GC language manages to do
well enough without refcounting... I think they mostly have more
sophisticated heuristics than CPython, though. Off the top of my head,
I know PyPy's heuristic involves the ratio of the size of nursery
objects versus the size of the heap, and JVMs do much cleverer things
like auto-tuning nursery size to make empirical pause times match some
target.

> Perhaps we could special-case tracebacks somehow, flag when a traceback
> remains alive after the implicit "del" clause at the end of an "except"
> block, then maintain some kind of linked list of the flagged tracebacks
> and launch specialized GC runs to find cycles accross that collection.
> That sounds quite involved, though.

We already keep a list of recently allocated objects and have a
specialized GC that runs across just that collection. That's what
generational GC is :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-25 Thread Nathaniel Smith
On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou  wrote:
>> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
>> what the benefit would be.  You can compose either list manually with
>> a simple comprehension:
>>
>> [interp for interp in interpreters.list_all() if interp.is_running()]
>> [interp for interp in interpreters.list_all() if not interp.is_running()]
>
> There is a inherit race condition in doing that, at least if
> interpreters are running in multiple threads (which I assume is going
> to be the overly dominant usage model).  That is why I'm proposing all
> three variants.

There's a race condition no matter what the API looks like -- having a
dedicated running_interpreters() lets you guarantee that the returned
list describes the set of interpreters that were running at some
moment in time, but you don't know when that moment was and by the
time you get the list, it's already out-of-date. So this doesn't seem
very useful. OTOH if we think that invariants like this are useful, we
might also want to guarantee that calling running_interpreters() and
idle_interpreters() gives two lists such that each interpreter appears
in exactly one of them, but that's impossible with this API; it'd
require a single function that returns both lists.

What problem are you trying to solve?

>> Likewise,
>> queue.Queue.send() supports blocking, in addition to providing a
>> put_nowait() method.
>
> queue.Queue.put() never blocks in the usual case (*), which is of an
> unbounded queue.  Only bounded queues (created with an explicit
> non-zero max_size parameter) can block in Queue.put().
>
> (*) and therefore also never deadlocks :-)

Unbounded queues also introduce unbounded latency and memory usage in
realistic situations. (E.g. a producer/consumer setup where the
producer runs faster than the consumer.) There's a reason why sockets
always have bounded buffers -- it's sometimes painful, but the pain is
intrinsic to building distributed systems, and unbounded buffers just
paper over it.

>> > send() blocking until someone else calls recv() is not only bad for
>> > performance,
>>
>> What is the performance problem?
>
> Intuitively, there must be some kind of context switch (interpreter
> switch?) at each send() call to let the other end receive the data,
> since you don't have any internal buffering.

Technically you just need the other end to wake up at some time in
between any two calls to send(), and if there's no GIL then this
doesn't necessarily require a context switch.

> Also, suddenly an interpreter's ability to exploit CPU time is
> dependent on another interpreter's ability to consume data in a timely
> manner (what if the other interpreter is e.g. stuck on some disk I/O?).
> IMHO it would be better not to have such coupling.

A small buffer probably is useful in some cases, yeah -- basically
enough to smooth out scheduler jitter.

>> > it also increases the likelihood of deadlocks.
>>
>> How much of a problem will deadlocks be in practice?
>
> I expect more often than expected, in complex systems :-)  For example,
> you could have a recv() loop that also from time to time send()s some
> data on another queue, depending on what is received.  But if that
> send()'s recipient also has the same structure (a recv() loop which
> send()s from time to time), then it's easy to imagine to two getting in
> a deadlock.

You kind of want to be able to create deadlocks, since the alternative
is processes that can't coordinate and end up stuck in livelocks or
with unbounded memory use etc.

>> I'm not sure I understand your concern here.  Perhaps I used the word
>> "sharing" too ambiguously?  By "sharing" I mean that the two actors
>> have read access to something that at least one of them can modify.
>> If they both only have read-only access then it's effectively the same
>> as if they are not sharing.
>
> Right.  What I mean is that you *can* share very simple "data" under
> the form of synchronization primitives.  You may want to synchronize
> your interpreters even they don't share user-visible memory areas.  The
> point of synchronization is not only to avoid memory corruption but
> also to regulate and orchestrate processing amongst multiple workers
> (for example processes or interpreters).  For example, a semaphore is
> an easy way to implement "I want no more than N workers to do this
> thing at the same time" ("this thing" can be something such as disk
> I/O).

It's fairly reasonable to implement a mutex using a CSP-style
unbuffered channel (send = acquire, receive = release). And the same
trick turns a channel with a fixed-size buffer into a bounded
semaphore. It won't be as efficient as a modern specialized mutex
implementation, of course, but it's workable.
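
Rough illustration of the buffered variant, with queue.Queue standing in
for a channel that has a fixed-size buffer (maxsize=1 gives you the mutex
case):

    import queue

    class ChannelSemaphore:
        def __init__(self, n):
            self._chan = queue.Queue(maxsize=n)  # "channel" with an n-slot buffer

        def acquire(self):
            self._chan.put(b"token")   # blocks once n tokens are outstanding

        def release(self):
            self._chan.get()           # frees a slot, waking one blocked acquire()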

Unfortunately while technically you can construct a buffered channel
out of an unbuffered channel, the construction's pretty unreasonable
(it needs two dedicated threads per channel).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

Re: [Python-Dev] Investigating time for `import requests`

2017-10-01 Thread Nathaniel Smith
On Sun, Oct 1, 2017 at 7:04 PM, INADA Naoki  wrote:
> 4. http.client
>
> import time:  1376 |   2448 |   email.header
> ...
> import time:  1469 |   7791 |   email.utils
> import time:   408 |  10646 | email._policybase
> import time:   939 |  12210 |   email.feedparser
> import time:   322 |  12720 | email.parser
> ...
> import time:   599 |   1361 | email.message
> import time:  1162 |  16694 |   http.client
>
> email.parser has very large import tree.
> But I don't know how to break the tree.

There is some work to get urllib3/requests to stop using http.client,
though it's not clear if/when it will actually happen:
https://github.com/shazow/urllib3/pull/1068

> Another major slowness comes from compiling regular expression.
> I think we can increase cache size of `re.compile` and use ondemand cached
> compiling (e.g. `re.match()`),
> instead of "compile at import time" in many modules.

In principle re.compile() itself could be made lazy -- return a
regular expression object that just holds the string, and then compiles
and caches it the first time it's used. Might be tricky to do in a
backwards-compatible way if it moves detection of invalid regexes
from compile time to use time, but it could be an opt-in flag.
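
Something along these lines, as a sketch of the idea (not a proposal for
the actual re API):

    import re

    class LazyPattern:
        def __init__(self, pattern, flags=0):
            self._args = (pattern, flags)
            self._compiled = None

        def _get(self):
            if self._compiled is None:
                # an invalid regex now raises here, at first use,
                # rather than at import time
                self._compiled = re.compile(*self._args)
            return self._compiled

        def __getattr__(self, name):
            # delegate match/search/sub/... to the real compiled pattern
            return getattr(self._get(), name)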

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-15 Thread Nathaniel Smith
On Sun, Oct 15, 2017 at 6:33 PM, Yury Selivanov  wrote:
> Hi,
>
> It looks like the discussion about the execution context became
> extremely hard to follow.  There are many opinions on how the spec for
> generators should look like.  What seems to be "natural"
> behaviour/example to one, seems to be completely unreasonable to other
> people.  Recent emails from Guido indicate that he doesn't want to
> implement execution contexts for generators (at least in 3.7).
>
> In another thread Guido said this: "... Because coroutines and
> generators are similar under the covers, Yury demonstrated the issue
> with generators instead of coroutines (which are unfamiliar to many
> people). And then somehow we got hung up about fixing the problem in
> the example."
>
> And Guido is right.  My initial motivation to write PEP 550 was to
> solve my own pain point, have a solution for async code.
> 'threading.local' is completely unusable there, but complex code bases
> demand a working solution.  I thought that because coroutines and
> generators are so similar under the hood, I can design a simple
> solution that will cover all edge cases.  Turns out it is not possible
> to do it in one pass.
>
> Therefore, in order to make some progress, I propose to split the
> problem in half:
>
> Stage 1. A new execution context PEP to solve the problem *just for
> async code*.  The PEP will target Python 3.7 and completely ignore
> synchronous generators and asynchronous generators.  It will be based
> on PEP 550 v1 (no chained lookups, immutable mapping or CoW as an
> optimization) and borrow some good API decisions from PEP 550 v3+
> (contextvars module, ContextVar class).  The API (and C-API) will be
> designed to be future proof and ultimately allow transition to the
> stage 2.

If you want to ignore generators/async generators, then I think you
don't even want PEP 550 v1, you just want something like a
{set,get}_context_state API that lets you access the ThreadState's
context dict (or rather, an opaque ContextState object that holds the
context dict), and then task schedulers can call them at appropriate
moments.
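
Roughly this shape, say (every name here is invented, and a toy dict
stands in for the ThreadState's context dict):

    _context_state = {}                  # toy stand-in for the ThreadState dict

    def get_context_state():
        return _context_state

    def set_context_state(state):
        global _context_state
        _context_state = state

    # ...and then a task scheduler brackets each task step like:
    def run_task_step(task):
        saved = get_context_state()
        set_context_state(task.context_state)
        try:
            task.step()
        finally:
            task.context_state = get_context_state()
            set_context_state(saved)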

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-15 Thread Nathaniel Smith
On Sun, Oct 15, 2017 at 10:10 PM, Guido van Rossum  wrote:
> On Sun, Oct 15, 2017 at 8:17 PM, Nathaniel Smith  wrote:
>>
>> On Sun, Oct 15, 2017 at 6:33 PM, Yury Selivanov 
>> wrote:
>> > Stage 1. A new execution context PEP to solve the problem *just for
>> > async code*.  The PEP will target Python 3.7 and completely ignore
>> > synchronous generators and asynchronous generators.  It will be based
>> > on PEP 550 v1 (no chained lookups, immutable mapping or CoW as an
>> > optimization) and borrow some good API decisions from PEP 550 v3+
>> > (contextvars module, ContextVar class).  The API (and C-API) will be
>> > designed to be future proof and ultimately allow transition to the
>> > stage 2.
>>
>> If you want to ignore generators/async generators, then I think you
>> don't even want PEP 550 v1, you just want something like a
>> {set,get}_context_state API that lets you access the ThreadState's
>> context dict (or rather, an opaque ContextState object that holds the
>> context dict), and then task schedulers can call them at appropriate
>> moments.
>
>
> Yes, that's what I meant by "ignoring generators". And I'd like there to be
> a "current context" that's a per-thread MutableMapping with ContextVar keys.
> Maybe there's not much more to it apart from naming the APIs for getting and
> setting it? To be clear, I am fine with this being a specific subtype of
> MutableMapping. But I don't see much benefit in making it more abstract than
> that.

We don't need it to be abstract (it's fine to have a single concrete
mapping type that we always use internally), but I think we do want it
to be opaque (instead of exposing the MutableMapping interface, the
only way to get/set specific values should be through the ContextVar
interface). The advantages are:

- This allows C level caching of values in ContextVar objects (in
particular, funneling mutations through a limited API makes cache
invalidation *much* easier)

- It gives us flexibility to change the underlying data structure
without breaking API, or for different implementations to make
different choices -- in particular, it's not clear whether a dict or
HAMT is better, and it's not clear whether a regular dict or
WeakKeyDict is better.

The first point (caching) I think is the really compelling one: in
practice decimal and numpy are already using tricky caching code to
reduce the overhead of accessing the ThreadState dict, and this gets
even trickier with context-local state which has more cache
invalidation points, so if we don't do this in the interpreter then it
could actually become a blocker for adoption. OTOH it's easy for the
interpreter itself to do this caching, and it makes everyone faster.
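
To make the caching point concrete, here's a toy version -- invented
names, nothing like what the real implementation would look like, but it
shows why funneling every mutation through a narrow API makes invalidation
trivial:

    class _Context:
        def __init__(self):
            self._data = {}
            self.version = 0             # bumped on every mutation

        def set(self, var, value):
            self._data[var] = value
            self.version += 1

        def get(self, var, default):
            return self._data.get(var, default)

    _current = _Context()                # stand-in for the per-thread state

    class ContextVar:
        def __init__(self, name, *, default=None):
            self.name, self.default = name, default
            self._cache = (None, -1)     # (value, context version when cached)

        def set(self, value):
            _current.set(self, value)

        def get(self):
            value, version = self._cache
            if version == _current.version:
                return value             # fast path: no dict lookup at all
            value = _current.get(self, self.default)
            self._cache = (value, _current.version)
            return value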

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-16 Thread Nathaniel Smith
On Mon, Oct 16, 2017 at 11:12 AM, Ethan Furman  wrote:
> What would be really nice is to have attribute access like thread locals.
> Instead of working with individual ContextVars you grab the LocalContext and
> access the vars as attributes.  I don't recall reading in the PEP why this
> is a bad idea.

You're mixing up levels -- the way threading.local objects work is
that there's one big dict that's hidden inside the interpreter (in the
ThreadState), and it holds a separate little dict for each
threading.local. The dict holding ContextVars is similar to the big
dict; a threading.local itself is like a ContextVar that holds a dict.
(And the reason it's this way is that it's easy to build either
version on top of the other, and we did some survey of threading.local
usage and the ContextVar style usage was simpler in the majority of
cases.)

For threading.local there's no way to get at the big dict at all from
Python; it's hidden inside the C APIs and threading internals. I'm
guessing you've never missed this :-). For ContextVars we can't hide
it that much, because async frameworks need to be able to swap the
current dict when switching tasks and clone it when starting a new
task, but those are the only absolutely necessary operations.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-16 Thread Nathaniel Smith
On Mon, Oct 16, 2017 at 8:49 AM, Guido van Rossum  wrote:
> On Sun, Oct 15, 2017 at 10:26 PM, Nathaniel Smith  wrote:
>>
>> On Sun, Oct 15, 2017 at 10:10 PM, Guido van Rossum 
>> wrote:
>> > Yes, that's what I meant by "ignoring generators". And I'd like there to
>> > be
>> > a "current context" that's a per-thread MutableMapping with ContextVar
>> > keys.
>> > Maybe there's not much more to it apart from naming the APIs for getting
>> > and
>> > setting it? To be clear, I am fine with this being a specific subtype of
>> > MutableMapping. But I don't see much benefit in making it more abstract
>> > than
>> > that.
>>
>> We don't need it to be abstract (it's fine to have a single concrete
>> mapping type that we always use internally), but I think we do want it
>> to be opaque (instead of exposing the MutableMapping interface, the
>> only way to get/set specific values should be through the ContextVar
>> interface). The advantages are:
>>
>> - This allows C level caching of values in ContextVar objects (in
>> particular, funneling mutations through a limited API makes cache
>> invalidation *much* easier)
>
>
> Well the MutableMapping could still be a proxy or something that invalidates
> the cache when mutated. That's why I said it should be a single concrete
> mapping type. (It also doesn't have to derive from MutableMapping -- it's
> sufficient for it to be a duck type for one, or perhaps some Python-level
> code could `register()` it.

MutableMapping is just a really complicated interface -- you have to
deal with iterator invalidation and popitem and implementing view
classes and all that. It seems like a lot of code for a feature that
no-one seems to worry about missing right now. (In fact, I suspect the
extra code required to implement the full MutableMapping interface on
top of a basic HAMT type is larger than the extra code to implement
the current PEP 550 draft's chaining semantics on top of this proposal
for a minimal PEP 550.)

What do you think of something like:

class Context:
def __init__(self, /, init: MutableMapping[ContextVar,object] = {}):
...

def as_dict(self) -> Dict[ContextVar, object]:
"Returns a snapshot of the internal state."

def copy(self) -> Context:
"Equivalent to (but maybe faster than) Context(self.as_dict())."

I like the idea of making it possible to set up arbitrary Contexts and
introspect them, because sometimes you do need to debug weird issues
or do some wacky stuff deep in the guts of a coroutine scheduler, but
this would give us that without implementing MutableMapping's 17
methods and 7 helper classes.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-17 Thread Nathaniel Smith
On Oct 17, 2017 11:25 AM, "Guido van Rossum"  wrote:


In short, I really don't think there's a need for context variables to be
faster than instance variables.


There really is: currently the cost of looking up a thread local through
the C API is a dict lookup, which is faster than instance variable lookup,
and decimal and numpy have both found that that's already too expensive.

Or maybe you're just talking about the speed when the cache misses, in
which case never mind :-).

-n


Re: [Python-Dev] PEP 561: Distributing and Packaging Type Information

2017-10-27 Thread Nathaniel Smith
On Thu, Oct 26, 2017 at 3:42 PM, Ethan Smith  wrote:
> However, the stubs may be put in a sub-folder
> of the Python sources, with the same name the ``*.py`` files are in. For
> example, the ``flyingcircus`` package would have its stubs in the folder
> ``flyingcircus/flyingcircus/``. This path is chosen so that if stubs are
> not found in ``flyingcircus/`` the type checker may treat the subdirectory as
> a normal package.

I admit that I find this aesthetically unpleasant. Wouldn't something
like __typestubs__/ be a more Pythonic name? (And also avoid potential
name clashes, e.g. my async_generator package has a top-level export
called async_generator; normally you do 'from async_generator import
async_generator'. I think that might cause problems if I created an
async_generator/async_generator/ directory, especially post-PEP 420.)

I also don't understand the given rationale -- it sounds like you want
to be able say well, if ${SOME_DIR_ON_PYTHONPATH}/flyingcircus/
doesn't contain stubs, then just stick the
${SOME_DIR_ON_PYTHONPATH}/flyingcircus/ directory *itself* onto
PYTHONPATH, and then try again. But that's clearly the wrong thing,
because then you'll also be adding a bunch of other random junk into
that directory into the top-level namespace. For example, suddenly the
flyingcircus.summarise_proust module has become a top-level
summarise_proust package. I must be misunderstanding something?

> Type Checker Module Resolution Order
> 
>
> The following is the order that type checkers supporting this PEP should
> resolve modules containing type information:
>
> 1. User code - the files the type checker is running on.
>
> 2. Stubs or Python source manually put in the beginning of the path. Type
>checkers should provide this to allow the user complete control of which
>stubs to use, and patch broken stubs/inline types from packages.
>
> 3. Third party stub packages - these packages can supersede the installed
>untyped packages. They can be found at ``pkg-stubs`` for package ``pkg``,
>however it is encouraged to check the package's metadata using packaging
>query APIs such as ``pkg_resources`` to assure that the package is meant
>for type checking, and is compatible with the installed version.

Am I right that this means you need to be able to map from import
names to distribution names? I.e., if you see 'import foo', you need
to figure out which *.dist-info directory contains metadata for the
'foo' package? How do you plan to do this?

The problem is that technically, import names and distribution names
are totally unrelated namespaces -- for example, the '_pytest' package
comes from the 'pytest' distribution, the 'pylab' package comes from
'matplotlib', and 'pip install scikit-learn' gives you a package
imported as 'sklearn'. Namespace packages are also challenging,
because a single top-level package might actually be spread across
multiple distributions.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-11-05 Thread Nathaniel Smith
On Nov 5, 2017 2:41 PM, "Paul Ganssle"  wrote:

I think the question of whether any specific implementation of dict could
be made faster for a given architecture or even that the trade-offs made by
CPython are generally the right ones is kinda beside the point. It's
certainly feasible that an implementation that does not preserve ordering
could be better for some implementation of Python, and the question is
really how much is gained by changing the language semantics in such a way
as to cut off that possibility.


The language definition is not nothing, but I think it's easy to
overestimate its importance. CPython does in practice provide ordering
guarantees for dicts, and this solves a whole bunch of pain points: it
makes json roundtripping work better, it gives ordered kwargs, it makes it
possible for metaclasses to see the order class items were defined, etc.
And we got all these goodies for better-than-free: the new dict is faster
and uses less memory. So it seems very unlikely that CPython is going to
revert this change in the foreseeable future, and that means people will
write code that depends on this, and that means in practice reverting it
will become impossible due to backcompat and it will be important for other
interpreters to implement, regardless of what the language definition says.

That said, there are real benefits to putting this in the spec. Given that
we're not going to get rid of it, we might as well reward the minority of
programmers who are conscientious about following the spec by letting them
use it too. And there were multiple PEPs that went away when this was
merged; no one wants to resurrect them just for hypothetical future
implementations that may never exist. And putting it in the spec will mean
that we can stop having this argument over and over with the same points
rehashed for those who missed the last one. (This isn't aimed at you or
anything; it's not your fault you don't know all these arguments off the
top of your head, because how would you? But it is a reality of mailing
list dynamics that rehashing this kind of thing sucks up energy without
producing much.)

MicroPython deviates from the language spec in lots of ways. Hopefully this
won't need to be another one, but it won't be the end of the world if it is.

-n


Re: [Python-Dev] Proposal: go back to enabling DeprecationWarning by default

2017-11-06 Thread Nathaniel Smith
On Sun, Nov 5, 2017 at 9:38 PM, Nick Coghlan  wrote:
> We've been running the current experiment for 7 years, and the main
> observable outcome has been folks getting surprised by breaking
> changes in CPython releases, especially folks that primarily use
> Python interactively (e.g. for data analysis), or as a scripting
> engine (e.g. for systems administration).

It's also caused lots of projects to switch to using their own ad hoc
warning types for deprecations, e.g. off the top of my head:

https://github.com/matplotlib/matplotlib/blob/6c51037864f9a4ca816b68ede78207f7ecec656c/lib/matplotlib/cbook/deprecation.py#L5
https://github.com/numpy/numpy/blob/d75b86c0c49f7eb3ec60564c2e23b3ff237082a2/numpy/_globals.py#L45
https://github.com/python-trio/trio/blob/f50aa8e00c29c7f2953b7bad38afc620772dca74/trio/_deprecate.py#L16

So in some ways the change has actually made it *harder* for end-user
applications/scripts to hide all deprecation warnings, because for
each package you use you have to somehow figure out which
idiosyncratic type it uses, and filter them each separately.

(In any changes though please do keep in mind that Python itself is
not the only one issuing deprecation warnings. I'm thinking in
particular of the filter-based-on-Python-version idea. Maybe you could
have subclasses like Py35DeprecationWarning and filter on those?)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Proposal: go back to enabling DeprecationWarning by default

2017-11-07 Thread Nathaniel Smith
On Nov 7, 2017 06:24, "Nick Coghlan"  wrote:

On 7 November 2017 at 19:30, Paul Moore  wrote:
> On 7 November 2017 at 04:09, Nick Coghlan  wrote:
>> Given the status quo, how do educators learn that the examples they're
>> teaching to their students are using deprecated APIs?
>
> By reading the documentation on what they are teaching, and by testing
> their examples with new versions with deprecation warnings turned on?
> Better than having warnings appear the first time they run a course
> with a new version of Python, surely?
>
> I understand the "but no-one actually does this" argument. And I
> understand that breakage as a result is worse than a few warnings. But
> enabling deprecation warnings by default feels to me like favouring
> the developer over the end user. I remember before the current
> behaviour was enabled and it was *immensely* frustrating to try to use
> 3rd party code and get a load of warnings. The only options were:
>
> 1. Report the bug - usually not much help, as I want to run the
> program *now*, not when a new release is made.
> 2. Fix the code (and ideally submit a PR upstream) - I want to *use*
> the program, not debug it.
> 3. Find the right setting/environment variable, and tweak how I call
> the program to apply it - which doesn't fix the root cause, it's just
> a workaround.

Yes, this is why I've come around to the view that we need to come up
with a viable definition of "third party code" and leave deprecation
warnings triggered by that code disabled by default.

My suggestion for that definition is to have the *default* meaning of
"third party code" be "everything that isn't __main__".


That way, if you get a deprecation warning at the REPL, it's
necessarily because of something *you* did, not because of something a
library you called did. Ditto for single file scripts.


IPython actually made this change a few years ago; since 2015 I think it
has shown DeprecationWarnings by default if they're triggered by __main__.

It's helpful but I haven't noticed it eliminating this problem. One
limitation in particular is that it requires that the warnings are
correctly attributed to the code that triggered them, which means that
whoever is issuing the warning has to set the stacklevel= correctly, and
most people don't. (The default of stacklevel=1 is always wrong for
DeprecationWarning.) Also, IIRC it's actually impossible to set the
stacklevel= correctly when you're deprecating a whole module and issue the
warning at import time, because you need to know how many stack frames the
import system uses.
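
To make the stacklevel= point concrete, a minimal sketch of the difference
(hypothetical library code):

    import warnings

    def old_api():
        # stacklevel=1 (the default) attributes the warning to this line,
        # inside the library, so an "only show warnings triggered by
        # __main__" filter can never match it.
        warnings.warn("old_api() is deprecated", DeprecationWarning)

    def old_api_better():
        # stacklevel=2 attributes the warning to the caller's frame, which
        # is what __main__-based filtering needs.
        warnings.warn("old_api() is deprecated", DeprecationWarning,
                      stacklevel=2)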

-n


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-11-07 Thread Nathaniel Smith
On Nov 7, 2017 12:02 PM, "Barry Warsaw"  wrote:

On Nov 7, 2017, at 09:39, Paul Sokolovsky  wrote:

> So, the problem is that there's no "Python language spec”.

There is a language specification: https://docs.python.org/3/refe
rence/index.html

But there are still corners that are undocumented, or topics that are
deliberately left as implementation details.


Also, specs don't mean that much unless there are multiple implementations
in widespread use. In JS the spec matters because it describes the common
subset of the language you can expect to see across browsers, and lets the
browser vendors coordinate on future changes. Since users actually target
and test against multiple implementations, this is useful. In python,
CPython's dominance means that most libraries are written against CPython's
behavior instead of the spec, and alternative implementations generally
don't care about the spec, they care about whether they can run the code
their users want to run. So PyPy has found that for their purposes, the
python spec includes all kinds of obscure internal implementation details
like CPython's static type/heap type distinction, the exact tricks CPython
uses to optimize local variable access, the CPython C API, etc. The Pyston
devs found that for their purposes, refcounting actually was a mandatory
part of the python language. Jython, MicroPython, etc make a different set
of compatibility tradeoffs again.

I'm not saying the spec is useless, but it's not magic either. It only
matters to the extent that it solves some problem for people.

-n


Re: [Python-Dev] The current dict is not an "OrderedDict"

2017-11-09 Thread Nathaniel Smith
On Thu, Nov 9, 2017 at 1:46 PM, Cameron Simpson  wrote:
> On 08Nov2017 10:28, Antoine Pitrou  wrote:
>>
>> On Wed, 8 Nov 2017 13:07:12 +1000
>> Nick Coghlan  wrote:
>>>
>>> On 8 November 2017 at 07:19, Evpok Padding 
>>> wrote:
>>> > On 7 November 2017 at 21:47, Chris Barker 
>>> > wrote:
>>> >> if dict order is preserved in cPython , people WILL count on it!
>>> >
>>> > I won't, and if people do and their code break, they'll have only
>>> > themselves
>>> > to blame.
>>> > Also, what proof do you have of that besides anecdotal evidence ?
>>>
>>> ~27 calendar years of anecdotal evidence across a multitude of CPython
>>> API behaviours (as well as API usage in other projects).
>>>
>>> Other implementation developers don't say "CPython's runtime behaviour
>>> is the real Python specification" for the fun of it - they say it
>>> because "my code works on CPython, but it does the wrong thing on your
>>> interpreter, so I'm going to stick with CPython" is a real barrier to
>>> end user adoption, no matter what the language specification says.
>>
>>
>> Yet, PyPy has no reference counting, and it doesn't seem to be a cause
>> of concern.  Broken code is fixed along the way, when people notice.
>
>
> I'd expect that this may be because that would merely to cause temporary
> memory leakage or differently timed running of __del__ actions.  Neither of
> which normally affects semantics critical to the end result of most
> programs.

It's actually a major problem when porting apps to PyPy. The common
case is servers that crash because they rely on the GC to close file
descriptors, and then run out of file descriptors. IIRC this is the
major obstacle to supporting OpenStack-on-PyPy. NumPy is currently
going through the process to deprecate and replace a core bit of API
[1] because it turns out to assume a refcounting GC.

-n

[1] See:
  https://github.com/numpy/numpy/pull/9639
  https://mail.python.org/pipermail/numpy-discussion/2017-November/077367.html

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] [python-committers] Enabling deprecation warnings feature code cutoff

2017-11-09 Thread Nathaniel Smith
On Nov 8, 2017 16:12, "Nick Coghlan"  wrote:

On 9 November 2017 at 07:46, Antoine Pitrou  wrote:
>
> On 08/11/2017 at 22:43, Nick Coghlan wrote:
>>
>> However, between them, the following two guidelines should provide
>> pretty good deprecation warning coverage for the world's Python code:
>>
>> 1. If it's in __main__, it will emit deprecation warnings at runtime
>> 2. If it's not in __main__, it should have a test suite
>
> Nick, have you actually read the discussion and the complaints people
> had with the current situation?  Most of them *don't* specifically talk
> about __main__ scripts.

I have, and I've also re-read the discussions regarding why the
default got changed in the first place.

Behaviour up until 2.6 & 3.1:

once::DeprecationWarning

Behaviour since 2.7 & 3.2:

ignore::DeprecationWarning

With test runners overriding the default filters to set it back to
"once::DeprecationWarning".


Is this intended to be a description of the current state of affairs?
Because I've never encountered a test runner that does this... Which
runners are you thinking of?


The rationale for that change was so that end users of applications
that merely happened to be written in Python wouldn't see deprecation
warnings when Linux distros (or the end user) updated to a new Python
version. It had the downside that you had to remember to opt-in to
deprecation warnings in order to see them, which is a problem if you
mostly use Python for ad hoc personal scripting.

Proposed behaviour for Python 3.7+:

once::DeprecationWarning:__main__
ignore::DeprecationWarning

With test runners still overriding the default filters to set them
back to "once::DeprecationWarning".

This is a partial reversion back to the pre-2.7 behaviour, focused
specifically on interactive use and ad hoc personal scripting. For ad
hoc *distributed* scripting, the changed default encourages upgrading
from single-file scripts to the zipapp model, and then minimising the
amount of code that runs directly in __main__.py.

I expect this will be a sufficient change to solve the specific
problem I'm personally concerned by, so I'm no longer inclined to
argue for anything more complicated. Other folks may have other
concerns that this tweak to the default filters doesn't address - they
can continue to build their case for more complex options using this
as the new baseline behaviour.


I think most people's concern is that we've gotten into a state where
DeprecationWarning's are largely useless in practice, because no one sees
them. Effectively the norm now is that developers (both the Python core
team and downstream libraries) think they're following some sensible
deprecation cycle, but often they're actually making changes without any
warning, just they wait a year to do it. It's not clear why we're bothering
through multiple releases -- which adds major overhead -- if in practice we
aren't going to actually warn most people. Enabling them for another 1% of
code doesn't really address this.

As I mentioned above, it's also having the paradoxical effect of making it
so that end-users are *more* likely to see deprecation warnings, since
major libraries are giving up on using DeprecationWarning. Most recently it
looks like pyca/cryptography is going to switch, partly as a result of this
thread:
  https://github.com/pyca/cryptography/pull/4014

Some more ideas to throw out there:

- if an envvar CI=true is set, then by default make deprecation warnings
into errors. (This is an informal standard that lots of CI systems use.
Error instead of "once" because most people don't look at CI output at all
unless there's an error.)

- provide some mechanism that makes it easy to have a deprecation warning
that starts out as invisible, but then becomes visible as you get closer to
the switchover point. (E.g. CPython might make the deprecation warnings
that it issues be invisible in 3.x.0 and 3.x.1 but become visible in
3.x.2+.) Maybe:

# in warnings.py
def deprecation_warning(library_version, visible_in_version,
                        change_in_version, msg, stacklevel):
    ...

Then a call like:

  deprecation_warning(my_library.__version__, "1.3", "1.4",
                      "This function is deprecated", 2)

issues an InvisibleDeprecationWarning if my_library.__version__ < 1.3, and
a VisibleDeprecationWarning otherwise.

(The stacklevel argument is mandatory because the usual default of 1 is
always wrong for deprecation warnings.)
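
To flesh that out, a rough sketch of what the helper could look like (the
version parsing here is deliberately naive, and the category names are just
the ones used above -- none of this is an actual API):

    import warnings

    class InvisibleDeprecationWarning(DeprecationWarning):
        "Hidden by the default filters (DeprecationWarning subclass)."

    class VisibleDeprecationWarning(UserWarning):
        "Shown by the default filters (UserWarning subclass)."

    def _parse(version):
        # Naive: assumes purely numeric dotted versions like "1.3"
        return tuple(int(part) for part in version.split("."))

    def deprecation_warning(library_version, visible_in_version,
                            change_in_version, msg, stacklevel):
        if _parse(library_version) < _parse(visible_in_version):
            category = InvisibleDeprecationWarning
        else:
            category = VisibleDeprecationWarning
        # +1 because this helper adds one extra frame between the caller
        # and warnings.warn()
        warnings.warn("%s (behaviour will change in %s)"
                      % (msg, change_in_version),
                      category, stacklevel=stacklevel + 1)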

-n


Re: [Python-Dev] Proposal: go back to enabling DeprecationWarning by default

2017-11-10 Thread Nathaniel Smith
On Tue, Nov 7, 2017 at 8:45 AM, Nathaniel Smith  wrote:
> Also, IIRC it's actually impossible to set the stacklevel= correctly when
> you're deprecating a whole module and issue the warning at import time,
> because you need to know how many stack frames the import system uses.

Doh, I didn't remember correctly. Actually Brett fixed this in 3.5:
https://bugs.python.org/issue24305

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] [python-committers] Enabling deprecation warnings feature code cutoff

2017-11-11 Thread Nathaniel Smith
On Fri, Nov 10, 2017 at 11:34 PM, Brett Cannon  wrote:
> On Thu, Nov 9, 2017, 17:33 Nathaniel Smith,  wrote:
>> - if an envvar CI=true is set, then by default make deprecation warnings
>> into errors. (This is an informal standard that lots of CI systems use.
>> Error instead of "once" because most people don't look at CI output at all
>> unless there's an error.)
>
> One problem with that is I don't want e.g. mypy to start spewing out
> warnings while checking my code. That's why I like Victor's idea of a -X
> option that also flips on other test/debug features. Yes, this would also
> trigger for test runners, but that's at least a smaller amount of affected
> code.

Ah, yeah, you're right -- often CI systems use Python programs for
infrastructure, beyond the actual code under test. pip is maybe a more
obvious example than mypy -- we probably don't want pip to stop
working in CI runs just because it happens to use a deprecated API
somewhere :-). So this idea doesn't work.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-11-12 Thread Nathaniel Smith
On Sun, Nov 12, 2017 at 1:24 AM, Nick Coghlan  wrote:
> This change will lead to DeprecationWarning being displayed by default for:
>
> * code executed directly at the interactive prompt
> * code executed directly as part of a single-file script

Technically it's orthogonal, but if you're trying to get better
warnings in the REPL, then you might also want to look at:

https://bugs.python.org/issue1539925
https://github.com/ipython/ipython/issues/6611

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 561 rework

2017-11-12 Thread Nathaniel Smith
On Sun, Nov 12, 2017 at 11:21 AM, Ethan Smith  wrote:
>
>
> On Sun, Nov 12, 2017 at 9:53 AM, Jelle Zijlstra 
> wrote:
>>
>> 2017-11-12 3:40 GMT-08:00 Ethan Smith :
>>> The name of the stub
>>> package
>>> MUST follow the scheme ``pkg_stubs`` for type stubs for the package named
>>> ``pkg``. The normal resolution order of checking ``*.pyi`` before
>>> ``*.py``
>>> will be maintained.
>>
>> This is very minor, but what do you think of using "pkg-stubs" instead of
>> "pkg_stubs" (using a hyphen rather than an underscore)? This would make the
>> name illegal to import as a normal Python package, which makes it clear that
>> it's not a normal package. Also, there could be real packages named
>> "_stubs".
>
> I suppose this makes sense. I checked PyPI and as of a few weeks ago there
> were no packages with the name pattern, but I like the idea of making it
> explicitly non-runtime importable. I cannot think of any reason not to do
> it, and the avoidance of confusion about the package being importable is a
> benefit. I will make the change with my next round of edits.

PyPI doesn't distinguish between the names 'foo-stubs' and 'foo_stubs'
-- they get normalized together. So even if you use 'foo-stubs' as the
directory name on sys.path to avoid collisions at import time, it
still won't allow someone to distribute a separate 'foo_stubs' package
on PyPI.
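
The normalization rule involved is the PEP 503 one, under which both
spellings collapse to the same project name:

    import re

    def normalize(name):
        # PEP 503 project name normalization, as used for PyPI lookups
        return re.sub(r"[-_.]+", "-", name).lower()

    assert normalize("foo_stubs") == normalize("foo-stubs") == "foo-stubs"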

If you do go with a fixed naming convention like this, the PEP should
probably also instruct the PyPI maintainers that whoever owns 'foo'
automatically has the right to control the name 'foo-stubs' as well.
Or maybe some tweak to PEP 541 is needed.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Standardise the AST (Re: PEP 563: Postponed Evaluation of Annotations)

2017-11-13 Thread Nathaniel Smith
Can you give any examples of problems caused by the ast not being
standardized? The original motivation of being able to distinguish between
  foo(x: int)
  foo(x: "int")
isn't very compelling – it's not clear it's a problem in the first place,
and even if it is then all we need is some kind of boolean flag, not an ast
standard.

On Nov 13, 2017 13:38, "Greg Ewing"  wrote:

> Guido van Rossum wrote:
>
>> But Python's syntax changes in nearly every release.
>>
>
> The changes are almost always additions, so there's no
> reason why the AST can't remain backwards compatible.
>
> the AST level ... elides many details (such as whitespace and parentheses).
>>
>
> That's okay, because the AST is only expected to
> represent the semantics of Python code, not its
> exact lexical representation in the source. It's
> the same with Lisp -- comments and whitespace have
> been stripped out by the time you get to Lisp
> data.
>
> Lisp had almost no syntax so I presume the mapping to data structures was
>> nearly trivial compared to Python.
>>
>
> Yes, the Python AST is more complicated, but we
> already have that much complexity in the AST being
> used by the compiler.
>
> If I understand correctly, we also have a process
> for converting that internal structure to and from
> an equally complicated set of Python objects, that
> isn't needed by the compiler and exists purely for
> the convenience of Python code.
>
> I can't see much complexity being added if we were
> to decide to standardise the Python representation.
>
> --
> Greg


Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-11-13 Thread Nathaniel Smith
On Mon, Nov 13, 2017 at 6:09 AM, Serhiy Storchaka  wrote:
> On 13.11.17 at 14:29, Antoine Pitrou wrote:
>>
>> On Mon, 13 Nov 2017 22:37:46 +1100
>> Chris Angelico  wrote:
>>>
>>> On Mon, Nov 13, 2017 at 9:46 PM, Antoine Pitrou 
>>> wrote:
>>>>
>>>> On Sun, 12 Nov 2017 19:48:28 -0800
>>>> Nathaniel Smith  wrote:
>>>>>
>>>>> On Sun, Nov 12, 2017 at 1:24 AM, Nick Coghlan 
>>>>> wrote:
>>>>>>
>>>>>> This change will lead to DeprecationWarning being displayed by default
>>>>>> for:
>>>>>>
>>>>>> * code executed directly at the interactive prompt
>>>>>> * code executed directly as part of a single-file script
>>>>>
>>>>>
>>>>> Technically it's orthogonal, but if you're trying to get better
>>>>> warnings in the REPL, then you might also want to look at:
>>>>>
>>>>> https://bugs.python.org/issue1539925
>>>>> https://github.com/ipython/ipython/issues/6611
>>>>
>>>>
>>>> Depends what you call "better".  Personally, I don't want to see
>>>> warnings each and every time I use a deprecated or questionable
>>>> construct or API from the REPL.
>>>
>>>
>>> Isn't that the entire *point* of warnings? When you're working at the
>>> REPL, you're the one in control of which APIs you use, so you should
>>> be the one to know about deprecations.
>>
>>
>> If I see a warning once every REPL session, I know about the deprecation
>> already, thank you.  I don't need to be taken by the hand like a little
>> child.  Besides, the code I write in the REPL is not meant for durable
>> use.
>
>
> Hmm, now I see that the simple Nathaniel's solution is not completely
> correct. If the warning action is 'module', it should be emitted only once
> if used directly in the REPL, because '__main__' is the same module.

True. The fundamental problem is that generally, Python uses
(filename, lineno) pairs to identify lines of code. But (a) the
warning module assumes that for each namespace dict, there is a unique
mapping between line numbers and lines of code, so it ignores filename
and just keys off lineno, and (b) the REPL re-uses the same (file,
lineno) for different lines of code anyway.

So I guess the fully correct solution would be to use a unique
"filename" when compiling each block of code -- e.g. the REPL could do
the equivalent of compile(<code>, "REPL[1]", ...) for the first line,
compile(<code>, "REPL[2]", ...) for the second line, etc. -- and then
also teach the warnings module's duplicate detection logic to key off
of (file, lineno) pairs instead of just lineno.
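
A sketch of what that could look like, using the stdlib code module as a
stand-in for the real REPL (purely illustrative, not a proposed patch):

    import code

    class NumberedConsole(code.InteractiveConsole):
        """Toy REPL that gives each input block a unique pseudo-filename."""

        def __init__(self):
            super().__init__()
            self._count = 0

        def runsource(self, source, filename="<input>", symbol="single"):
            self._count += 1
            # Each block gets its own "filename", so (file, lineno) pairs
            # would be unique and the warnings module could de-duplicate
            # correctly.
            return super().runsource(source, "REPL[%d]" % self._count, symbol)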

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] module customization

2017-11-15 Thread Nathaniel Smith
On Wed, Nov 15, 2017 at 4:27 PM, Ethan Furman  wrote:
> The second way is fairly similar, but instead of replacing the entire
> sys.modules entry, its class is updated to be the class just created --
> something like sys.modules['mymod'].__class__ = MyNewClass .
>
> My request:  Can someone write a better example of the second method?  And
> include __getattr__ ?

Here's a fairly straightforward example:

https://github.com/python-trio/trio/blob/master/trio/_deprecate.py#L114-L140

(Intentionally doesn't include __dir__ because I didn't want
deprecated attributes to show up in tab completion. For other use
cases like lazy imports, you would implement __dir__ too.)

Example usage:

https://github.com/python-trio/trio/blob/master/trio/__init__.py#L66-L98
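
For anyone reading this without clicking through, the core of the pattern
looks roughly like this (a generic sketch, not trio's exact code; the
attribute names here are made up):

    import sys
    import types
    import warnings

    class _ModuleWithDeprecations(types.ModuleType):
        # maps old attribute name -> (hint, value); filled in below
        __deprecated_attrs__ = {}

        def __getattr__(self, name):
            # Only called for attributes that aren't found normally
            try:
                hint, value = self.__deprecated_attrs__[name]
            except KeyError:
                raise AttributeError(name) from None
            warnings.warn("%s.%s is deprecated; use %s instead"
                          % (self.__name__, name, hint),
                          DeprecationWarning, stacklevel=2)
            return value

    # In the module that wants deprecated attributes:
    _this = sys.modules[__name__]
    _this.__class__ = _ModuleWithDeprecations
    _this.__deprecated_attrs__ = {"old_name": ("new_name", "replacement")}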

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] module customization

2017-11-15 Thread Nathaniel Smith
On Wed, Nov 15, 2017 at 5:49 PM, Guido van Rossum  wrote:
>> If not, why not, and if so, shouldn't PEP 562's __getattr__ also take a
>> 'self'?
>
> Not really, since there's only one module (the one containing the
> __getattr__ function). Plus we already have a 1-argument module-level
> __getattr__ in mypy. See PEP 484.

I guess the benefit of taking 'self' would be that it would make it
possible (though still a bit odd-looking) to have reusable __getattr__
implementations, like:

# mymodule.py
from auto_importer import __getattr__, __dir__

auto_import_modules = {"foo", "bar"}

# auto_importer.py
def __getattr__(self, name):
    if name in self.auto_import_modules:
        ...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] module customization

2017-11-15 Thread Nathaniel Smith
On Wed, Nov 15, 2017 at 10:14 PM, Nathaniel Smith  wrote:
> On Wed, Nov 15, 2017 at 4:27 PM, Ethan Furman  wrote:
>> The second way is fairly similar, but instead of replacing the entire
>> sys.modules entry, its class is updated to be the class just created --
>> something like sys.modules['mymod'].__class__ = MyNewClass .
>>
>> My request:  Can someone write a better example of the second method?  And
>> include __getattr__ ?

Doh, I forgot to permalinkify those. Better links for anyone reading
this in the future:

> Here's a fairly straightforward example:
>
> https://github.com/python-trio/trio/blob/master/trio/_deprecate.py#L114-L140

https://github.com/python-trio/trio/blob/3edfafeedef4071646a9015e28be01f83dc02f94/trio/_deprecate.py#L114-L140

> (Intentionally doesn't include __dir__ because I didn't want
> deprecated attributes to show up in tab completion. For other use
> cases like lazy imports, you would implement __dir__ too.)
>
> Example usage:
>
> https://github.com/python-trio/trio/blob/master/trio/__init__.py#L66-L98

https://github.com/python-trio/trio/blob/3edfafeedef4071646a9015e28be01f83dc02f94/trio/__init__.py#L66-L98

> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-11-19 Thread Nathaniel Smith
On Sun, Nov 19, 2017 at 2:26 AM, Serhiy Storchaka  wrote:
> It seems to me that most of issues with FutureWarning on GitHub [1] are
> related to NumPy and pandas which use FutureWarning for its original nominal
> purpose, for warning about using programming interfaces that will change the
> behavior in future. This doesn't have any relation to end users unless the
> end user is an author of the written code.
>
> [1] https://github.com/search?q=FutureWarning&type=Issues

Eh, numpy does use FutureWarning for changes where the same code will
transition from doing one thing to doing something else without
passing through a state where it raises an error. But that decision
was based on FutureWarning being shown to users by default, not
because it matches the nominal purpose :-). IIRC I proposed this
policy for NumPy in the first place, and I still don't even know if it
matches the original intent because the docs are so vague. "Will
change behavior in the future" describes every case where you might
consider using FutureWarning *or* DeprecationWarning, right?

We have been using DeprecationWarning for changes where code will
transition from working -> raising an error, and that *is* based on
the Official Recommendation to hide those by default whenever
possible. We've been doing this for a few years now, and I'd say our
experience so far has been... poor. I'm trying to figure out how to
say this politely. Basically it doesn't work at all. What happens in
practice is that we issue a DeprecationWarning for a year, mostly
no-one notices, then we make the change in a 1.x.0 release, everyone's
code breaks, we roll it back in 1.x.1, and then possibly repeat
several times in 1.(x+1).0 and 1.(x+2).0 until enough people have
updated their code that the screams die down. I'm pretty sure we'll be
changing our policy at some point, possibly to always use
FutureWarning for everything.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Tricky way of creating a generator via a comprehension expression

2017-11-24 Thread Nathaniel Smith
On Fri, Nov 24, 2017 at 4:22 PM, Guido van Rossum  wrote:
> The more I hear about this topic, the more I think that `await`, `yield` and
> `yield from` should all be banned from occurring in all comprehensions and
> generator expressions. That's not much different from disallowing `return`
> or `break`.

I would say that banning `yield` and `yield from` is like banning
`return` and `break`, but banning `await` is like banning function
calls. There's no reason for most users to even know that `await` is
related to generators, so a rule disallowing it inside comprehensions
is just confusing. AFAICT 99% of the confusion around async/await is
because people think of them as being related to generators, when from
the user point of view it's not true at all and `await` is just a
funny function-call syntax.

Also, at the language level, there's a key difference between these
cases. A comprehension has implicit `yield`s in it, and then mixing in
explicit `yield`s as well obviously leads to confusion. But when you
use an `await` in a comprehension, that turns it into an async
generator expression (thanks to PEP 530), and in an async generator,
`yield` and `await` use two separate, unrelated channels. So there's
no confusion or problem with having `await` inside a comprehension.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Tricky way of creating a generator via a comprehension expression

2017-11-24 Thread Nathaniel Smith
On Fri, Nov 24, 2017 at 9:04 PM, Nick Coghlan  wrote:
> def example():
>     comp1 = yield from [(yield x) for x in ('1st', '2nd')]
>     comp2 = yield from [(yield x) for x in ('3rd', '4th')]
>     return comp1, comp2

Isn't this a really confusing way of writing

def example():
    return [(yield '1st'), (yield '2nd')], [(yield '3rd'), (yield '4th')]

?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Tricky way of creating a generator via a comprehension expression

2017-11-24 Thread Nathaniel Smith
On Fri, Nov 24, 2017 at 9:39 PM, Nick Coghlan  wrote:
> On 25 November 2017 at 15:27, Nathaniel Smith  wrote:
>> On Fri, Nov 24, 2017 at 9:04 PM, Nick Coghlan  wrote:
>>> def example():
>>>     comp1 = yield from [(yield x) for x in ('1st', '2nd')]
>>>     comp2 = yield from [(yield x) for x in ('3rd', '4th')]
>>>     return comp1, comp2
>>
>> Isn't this a really confusing way of writing
>>
>> def example():
>>     return [(yield '1st'), (yield '2nd')], [(yield '3rd'), (yield '4th')]
>
> A real use case

Do you have a real use case? This seems incredibly niche...

> wouldn't be iterating over hardcoded tuples in the
> comprehensions, it would be something more like:
>
> def example(iterable1, iterable2):
>     comp1 = yield from [(yield x) for x in iterable1]
>     comp2 = yield from [(yield x) for x in iterable2]
>     return comp1, comp2

I submit that this would still be easier to understand if written out like:

def map_iterable_to_yield_values(iterable):
    "Yield the values in iterable, then return a list of the values sent back."
    result = []
    for obj in iterable:
        result.append((yield obj))
    return result

def example(iterable1, iterable2):
    values1 = yield from map_iterable_to_yield_values(iterable1)
    values2 = yield from map_iterable_to_yield_values(iterable2)
    return values1, values2

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Tricky way of creating a generator via a comprehension expression

2017-11-26 Thread Nathaniel Smith
On Sat, Nov 25, 2017 at 3:37 PM, Guido van Rossum  wrote:
> On Sat, Nov 25, 2017 at 1:05 PM, David Mertz  wrote:
>>
>> FWIW, on a side point. I use 'yield' and 'yield from' ALL THE TIME in real
>> code. Probably 80% of those would be fine with yield statements, but a
>> significant fraction use `gen.send()`.
>>
>> On the other hand, I have yet once to use 'await', or 'async' outside of
>> pedagogical contexts. There are a whole lot of generators, including ones
>> utilizing state injection, that are useful without the scaffolding of an
>> event loop, in synchronous code.
>
>
> Maybe you didn't realize async/await don't need an event loop? Driving an
> async/await-based coroutine is just as simple as driving a yield-from-based
> one (`await` does exactly the same thing as `yield from`).

Technically anything you can write with yield/yield from could also be
written using async/await and vice-versa, but I think it's actually
nice to have both in the language.

The distinction I'd make is that yield/yield from is what you should
use for ad hoc coroutines where the person writing the code that has
'yield from's in it is expected to understand the details of the
coroutine runner, while async/await is what you should use when the
coroutine running is handled by a library like asyncio, and the person
writing code with 'await's in it is expected to treat coroutine stuff
as an opaque implementation detail. (NB I'm using "coroutine" in the
CS sense here, where generators and async functions are both
"coroutines".)

I think of this as being sort of half-way between a style guideline
and a technical guideline. It's like the guideline that lists should
be homogenously-typed and variable length, while tuples are
heterogenously-typed and fixed length: there's nothing in the language
that outright *enforces* this, but it's a helpful convention *and*
things tend to work better if you go along with it.

Here are some technical issues you'll run into if you try to use
async/await for ad hoc coroutines:

- If you don't iterate an async function, you get a "coroutine never
awaited" warning. This may or may not be what you want.

- async/await has associated thread-global state like
sys.set_coroutine_wrapper and sys.set_asyncgen_hooks. Generally async
libraries assume that they own these, and arbitrarily weird things may
happen if you have multiple async/await coroutine runners in same
thread with no coordination between them.

- In async/await, it's not obvious how to write leaf functions:
'await' is equivalent to 'yield from', but there's no equivalent to
'yield'. You have to jump through some hoops by writing a class with a
custom __await__ method or using @types.coroutine. Of course it's
doable, and it's no big deal if you're writing a proper async library,
but it's awkward for quick ad hoc usage.
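
For reference, the @types.coroutine hoop-jumping looks something like this
(minimal sketch, with a made-up request convention between the leaf and the
coroutine runner):

    import types

    @types.coroutine
    def read_exactly(nbytes):
        # A "leaf" awaitable: the bare yield suspends the whole await stack
        # and hands the request tuple to whatever runner is driving us.
        data = yield ("read", nbytes)
        return data

    async def read_frame_header():
        # Ordinary-looking async code that bottoms out in the leaf above.
        return await read_exactly(2)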

For a concrete example of 'ad hoc coroutines' where I think 'yield
from' is appropriate, here's wsproto's old 'yield from'-based
incremental websocket protocol parser:


https://github.com/python-hyper/wsproto/blob/4b7db502cc0568ab2354798552148dadd563a4e3/wsproto/frame_protocol.py#L142

The flow here is: received_frames is the public API: it gives you an
iterator over all completed frames. When it stops you're expected to
add more data to the buffer and then call it again. Internally,
received_frames acts as a coroutine runner for parse_more_gen, which
is the main parser that calls various helper methods to parse
different parts of the websocket frame. These calls eventually bottom
out in _consume_exactly or _consume_at_most, which use 'yield' to
"block" until enough data is available in the internal buffer.
Basically this is the classic trick of using coroutines to write an
incremental state machine parser as ordinary-looking code where the
state is encoded in local variables on the stack.
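
The core trick, heavily simplified (this is a sketch of the idea, not
wsproto's actual code, and the "framing format" here is made up):

    class Parser:
        def __init__(self):
            self._buffer = bytearray()
            self.frames = []
            self._gen = self._parse_frames()
            next(self._gen)  # run until the first "need more data" yield

        def receive_data(self, data):
            self._buffer += data
            next(self._gen)  # resume; parses as far as the data allows

        def _consume_exactly(self, n):
            # "Block" (via bare yield) until the buffer has n bytes; all
            # the parser state meanwhile lives in ordinary local variables.
            while len(self._buffer) < n:
                yield
            chunk = bytes(self._buffer[:n])
            del self._buffer[:n]
            return chunk

        def _parse_frames(self):
            while True:
                header = yield from self._consume_exactly(2)
                length = header[1] & 0x7f
                payload = yield from self._consume_exactly(length)
                self.frames.append((header, payload))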

Using coroutines here isn't just a cute trick; I'm pretty confident
that there is absolutely no other way to write a readable incremental
websocket parser in Python. This is the 3rd rewrite of wsproto's
parser, and I think I've read the code for all the other Python
libraries that do this too. The websocket framing format is branchy
enough that trying to write out the state machine explicitly will
absolutely tie you in knots. (Of course we then rewrote wsproto's
parser a 4th time for py2 compatibility; the current version's not
*terrible* but the 'yield from' version was simpler and more
maintainable.)

For wsproto's use case, I think using 'await' would be noticeably
worse than 'yield from'. It'd make the code more opaque to readers
(people know generators but no-one shows up already knowing what
@types.coroutine does), the "coroutine never awaited" warnings would
be obnoxious (it's totally fine to instantiate a parser and then throw
it away without using it!), and the global state issues would make us
very nervous (wsproto is absolutely designed to be used alongside a
library like asyncio or trio). But that's fine; 'yield from' exists
a

Re: [Python-Dev] Using async/await in place of yield expression

2017-11-27 Thread Nathaniel Smith
On Sun, Nov 26, 2017 at 9:33 PM, Caleb Hattingh
 wrote:
> The PEP only says that __await__ must return an iterator, but it turns out
> that it's also required that that iterator
> should not return any intermediate values.

I think you're confused :-). When the iterator yields an intermediate
value, it does two things:

(1) it suspends the current call stack and returns control to the
coroutine runner (i.e. the event loop)
(2) it sends some arbitrary value back to the coroutine runner

The whole point of `await` is that it can do (1) -- this is what lets
you switch between executing different tasks, so they can pretend to
execute in parallel. However, you do need to make sure that your
__await__ and your coroutine runner are on the same page with respect
to (2) -- if you send a value that the coroutine runner isn't
expecting, it'll get confused. Generally async libraries control both
the coroutine runner and the __await__ method, so they get to invent
whatever arbitrary convention they want.

In asyncio, the convention is that the values you send back must be
Future objects, and the coroutine runner interprets this as a request
to wait for the Future to be resolved, and then resume the current
call stack. In curio, the convention is that you send back a special
tuple describing some operation you want the event loop to perform
[1], and then it resumes your call stack once that operation has
finished. And Trio barely uses this channel at all. (It does transfer
a bit of information that way for convenience/speed, but the main work
of setting up the task to be resumed at the appropriate time happens
through other mechanisms.)
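
A toy end-to-end example of (1) and (2), where the awaitable and the runner
share a private little convention that has nothing to do with asyncio's
Futures:

    import time

    class Sleep:
        def __init__(self, seconds):
            self.seconds = seconds

        def __await__(self):
            # (1) suspend the calling coroutine, and (2) send a request
            # tuple back to whatever runner is driving it
            yield ("sleep", self.seconds)

    async def main():
        await Sleep(0.1)
        return "done"

    def run(coro):
        # Minimal coroutine runner that only understands ("sleep", n)
        while True:
            try:
                op, arg = coro.send(None)
            except StopIteration as exc:
                return exc.value
            assert op == "sleep"
            time.sleep(arg)

    print(run(main()))  # -> done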

What you observed is that the asyncio coroutine runner gets cranky if
you send it an integer when it was expecting a Future.

Since most libraries assume that they control both __await__ and the
coroutine runner, they don't tend to give great error messages here
(though trio does [2] ;-)). I think this is also why the asyncio docs
don't talk about this. I guess in asyncio's case it is technically a
semi-public API because you need to know how it works if you're the
author of a library like tornado or twisted that wants to integrate
with asyncio. But most people aren't the authors of tornado or
twisted, and the ones who are already know how this works, so the lack
of docs isn't a huge deal in practice...

-n

[1] 
https://github.com/dabeaz/curio/blob/bd0e2cb7741278d1d9288780127dc0807b1aa5b1/curio/traps.py#L48-L156
[2] 
https://github.com/python-trio/trio/blob/2b8e297e544088b98ff758d37c7ad84f74c3f2f5/trio/_core/_run.py#L1521-L1530

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-11-29 Thread Nathaniel Smith
On Nov 28, 2017 3:55 PM, "Guido van Rossum"  wrote:

On Sun, Nov 19, 2017 at 5:40 AM, Nathaniel Smith  wrote:

> Eh, numpy does use FutureWarning for changes where the same code will
> transition from doing one thing to doing something else without
> passing through a state where it raises an error. But that decision
> was based on FutureWarning being shown to users by default, not
> because it matches the nominal purpose :-). IIRC I proposed this
> policy for NumPy in the first place, and I still don't even know if it
> matches the original intent because the docs are so vague. "Will
> change behavior in the future" describes every case where you might
> consider using FutureWarning *or* DeprecationWarning, right?
>
> We have been using DeprecationWarning for changes where code will
> transition from working -> raising an error, and that *is* based on
> the Official Recommendation to hide those by default whenever
> possible. We've been doing this for a few years now, and I'd say our
> experience so far has been... poor. I'm trying to figure out how to
> say this politely. Basically it doesn't work at all. What happens in
> practice is that we issue a DeprecationWarning for a year, mostly
> no-one notices, then we make the change in a 1.x.0 release, everyone's
> code breaks, we roll it back in 1.x.1, and then possibly repeat
> several times in 1.(x+1).0 and 1.(x+2).0 until enough people have
> updated their code that the screams die down. I'm pretty sure we'll be
> changing our policy at some point, possibly to always use
> FutureWarning for everything.


Can one of you check that the latest version of PEP 565 gets this right?


If you're asking about the proposed new language about FutureWarnings,
it seems fine to me. If you're asking about the PEP as a whole, it seems
fine but I don't think it will make much difference in our case. IPython
has been showing deprecation warnings in __main__ for a few years now, and
it's nice enough. Getting warnings for scripts seems nice too. But we
aren't rolling back changes because they broke someone's one off script –
I'm sure it happens but we don't tend to hear about it. We're responding to
things like major downstream dependencies that nonetheless totally missed
all the warnings.

The part that might help there is evangelising popular test runners like
pytest to change their defaults. To me that's the most interesting change
to come out of this. But it's hard to predict in advance how effective it
will be.

tl;dr: I don't think PEP 565 solves all my problems, but I don't have any
objections to what it does do.

-n


Re: [Python-Dev] Issues with PEP 526 Variable Notation at the class level

2017-12-08 Thread Nathaniel Smith
On Dec 7, 2017 12:49, "Eric V. Smith"  wrote:

The reason I didn't include it (as @dataclass(slots=True)) is because it
has to return a new class, and the rest of the dataclass features just
modifies the given class in place. I wanted to maintain that conceptual
simplicity. But this might be a reason to abandon that. For what it's
worth, attrs does have an @attr.s(slots=True) that returns a new class with
__slots__ set.


They actually switched to always returning a new class, regardless of
whether slots is set:

https://github.com/python-attrs/attrs/pull/260

You'd have to ask Hynek to get the full rationale, but I believe it was
both for consistency with slot classes, and for consistency with regular
class definition. For example, type.__new__ actually does different things
depending on whether it sees an __eq__ method, so adding a method after the
fact led to weird bugs with hashing. That class of bug goes away if you
always set up the autogenerated methods and then call type.__new__.

 -n


Re: [Python-Dev] PEP 567 -- Context Variables

2017-12-13 Thread Nathaniel Smith
On Tue, Dec 12, 2017 at 10:39 PM, Dima Tisnek  wrote:
> My 2c:
> TL;DR PEP specifies implementation in some detail, but doesn't show
> how proposed change can or should be used.
>
>
>
> get()/set(value)/delete() methods: Python provides syntax sugar for
> these, let's use it.
> (dict: d["k"]/d["k] = value/del d["k"]; attrs: obj.k/obj.k = value/del
> obj.k; inheriting threading.Local)

This was already discussed to death in the PEP 550 threads... what
most users want is a single value, and routing get/set through a
ContextVar object allows for important optimizations and a simpler
implementation. Also, remember that 99% of users will never use these
objects directly; it's a low-level API mostly useful to framework
implementers.

> This PEP and 550 describe why TLS is inadequate, but don't seem to
> specify how proposed context behaves in async world. I'd be most
> interested in how it appears to work to the user of the new library.
>
> Consider a case of asynchronous cache:
>
> async def actual_lookup(name):
>     ...
>
> def cached_lookup(name, cache={}):
>     if name not in cache:
>         cache["name"] = shield(ensure_future(actual_lookup(name)))
>     return cache["name"]
>
> Unrelated (or related) asynchronous processes end up waiting on the same 
> future:
>
> async def called_with_user_context():
>     ...
>     await cached_lookup(...)
>     ...
>
> Which context is propagated to actual_lookup()?
> The PEP doesn't seem to state that clearly.
> It appears to be first caller's context.

Yes.

> Is it a copy or a reference?

It's a copy, as returned by get_context().

> If first caller is cancelled, the context remains alive.
>
>
>
> token is fragile, I believe PEP should propose a working context
> manager instead.
> Btw., isn't a token really a reference to
> state-of-context-before-it's-cloned-and-modified?

No, a Token only represents the value of one ContextVar, not the whole
Context. This could maybe be clearer in the PEP, but it has to be this
way or you'd get weird behavior from code like:

with decimal.localcontext(...):  # sets and then restores
    numpy.seterr(...)  # sets without any plan to restore
# after the 'with' block, the decimal ContextVar gets restored
# but this shouldn't affect the numpy.seterr ContextVar
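
Per-variable tokens are what keep those two independent; the save/restore
idiom from the PEP looks roughly like this (sketch, using the module name
the PEP proposes):

    from contextvars import ContextVar

    precision = ContextVar("precision", default=28)

    def with_precision(n):
        token = precision.set(n)
        try:
            ...  # code here sees precision.get() == n
        finally:
            # Restores only this variable's previous value; any other
            # ContextVars set in the meantime are left alone.
            precision.reset(token)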

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-14 Thread Nathaniel Smith
On Dec 14, 2017 21:30, "Raymond Hettinger" 
wrote:



> On Dec 14, 2017, at 6:03 PM, INADA Naoki  wrote:
>
> If "dict keeps insertion order" is not language spec and we
> continue to recommend people to use OrderedDict to keep
> order, I want to optimize OrderedDict for creation/iteration
> and memory usage.  (See https://bugs.python.org/issue31265#msg301942 )

I support having regular dicts maintain insertion order but am opposed to
Inada changing the implementation of collections.OrderedDict   We can have
the first without having the second.


It seems like the two quoted paragraphs are in vociferous agreement.

-n


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-15 Thread Nathaniel Smith
On Dec 15, 2017 10:50, "Tim Peters"  wrote:

[Eric Snow ]
> Does that include preserving order after deletion?

Given that we're blessing current behavior:

- At any moment, iteration order is from oldest to newest.  So, "yes"
to your question.

- While iteration starts with the oldest, .popitem() returns the
youngest.  This is analogous to how lists work, viewing a dict
similarly ordered "left to right" (iteration starts at the left,
.pop() at the right, for lists and dicts).


Fortunately, this also matches OrderedDict.popitem().

It'd be nice if we could also support dict.popitem(last=False) to get the
other behavior, again matching OrderedDict.

-n


Re: [Python-Dev] Usefulness of binary compatibility accross Python versions?

2017-12-17 Thread Nathaniel Smith
On Dec 16, 2017 11:44 AM, "Guido van Rossum"  wrote:

On Sat, Dec 16, 2017 at 11:14 AM, Antoine Pitrou 
wrote:

> On Sat, 16 Dec 2017 19:37:54 +0100
> Antoine Pitrou  wrote:
> >
> > Currently, you can pass a `module_api_version` to PyModule_Create2(),
> > but that function is for specialists only :-)
> >
> > ("""Most uses of this function should be using PyModule_Create()
> > instead; only use this if you are sure you need it.""")
>
> Ah, it turns out I misunderstood that piece of documentation and also
> what PEP 3121 really did w.r.t the module API check.
>
> PyModule_Create() is actually a *macro* calling PyModule_Create2() with
> the version number is was compiled against!
>
> #ifdef Py_LIMITED_API
> #define PyModule_Create(module) \
> PyModule_Create2(module, PYTHON_ABI_VERSION)
> #else
> #define PyModule_Create(module) \
> PyModule_Create2(module, PYTHON_API_VERSION)
> #endif
>
> And there's already a check for that version number in moduleobject.c:
> https://github.com/python/cpython/blob/master/Objects/moduleobject.c#L114
>
> That check is always invoked when calling PyModule_Create() and
> PyModule_Create2().  Currently it merely invokes a warning, but we can
> easily turn that into an error.
>
> (with apologies to Martin von Löwis for not fully understanding what he
> did at the time :-))
>

If it's only a warning, I worry that if we stop checking the flag bits it
can cause wild pointer following. This sounds like it would be a potential
security issue (load a module, ignore the warning, try to use a certain API
on a class it defines, boom). Also, could there still be 3rd party modules
out there that haven't been recompiled in a really long time and use some
older backwards compatible module initialization API? (I guess we could
stop supporting that and let them fail hard.)


I think there's a pretty simple way to avoid this kind of problem.

Since PEP 3149 (Python 3.2), the import system has (IIUC) checked for:

foo.cpython-XYm.so
foo.abi3.so
foo.so

If we drop foo.so from this list, then we're pretty much guaranteed not to
load anything into a python that it wasn't intended for.
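
(For reference, the list the import system checks is visible as
importlib.machinery.EXTENSION_SUFFIXES; on a Linux build of 3.6 it looks
something like:

    >>> import importlib.machinery
    >>> importlib.machinery.EXTENSION_SUFFIXES
    ['.cpython-36m-x86_64-linux-gnu.so', '.abi3.so', '.so']

so the proposal is just to drop the last, bare '.so' entry.)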

How disruptive would this be? AFAICT there hasn't been any standard way to
build python extensions named like 'foo.so' since 3.2 was released, so
we're talking about modules from 3.1 and earlier (or else people who are
manually hacking around the compatibility checking system, who can
presumably take care of themselves). We've at a minimum been issuing
warnings about these modules for 5 versions now (based on Antoine's
analysis above), and I'd be really surprised if a module built for 3.1
works on 3.7 anyway. So this change seems pretty reasonable to me.

-n


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-18 Thread Nathaniel Smith
On Mon, Dec 18, 2017 at 7:02 PM, Barry Warsaw  wrote:
> On Dec 18, 2017, at 21:11, Chris Barker  wrote:
>
>> Will changing pprint be considered a breaking change?
>
> Yes, definitely.

Wait, what? Why would changing pprint (so that it accurately reflects
dict's new underlying semantics!) be a breaking change? Are you
suggesting it shouldn't be changed in 3.7?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-18 Thread Nathaniel Smith
On Mon, Dec 18, 2017 at 7:58 PM, Steven D'Aprano  wrote:
> On Mon, Dec 18, 2017 at 07:37:03PM -0800, Nathaniel Smith wrote:
>> On Mon, Dec 18, 2017 at 7:02 PM, Barry Warsaw  wrote:
>> > On Dec 18, 2017, at 21:11, Chris Barker  wrote:
>> >
>> >> Will changing pprint be considered a breaking change?
>> >
>> > Yes, definitely.
>>
>> Wait, what? Why would changing pprint (so that it accurately reflects
>> dict's new underlying semantics!) be a breaking change?
>
> I have a script which today prints data like so:
>
> {'Aaron': 62,
>  'Anne': 51,
>  'Bob': 23,
>  'George': 30,
>  'Karen': 45,
>  'Sue': 17,
>  'Sylvester': 34}
>
> Tomorrow, it will suddenly start printing:
>
> {'Bob': 23,
>  'Karen': 45,
>  'Sue': 17,
>  'George': 30,
>  'Aaron': 62,
>  'Anne': 51,
>  'Sylvester': 34}
>
>
> and my users will yell at me that my script is broken because the data
> is now in random order.

To make sure I understand, do you actually have a script like this, or
is this hypothetical?

> Now, maybe that's my own damn fault for using
> pprint instead of writing my own pretty printer... but surely the point
> of pprint is so I don't have to write my own?
>
> Besides, the docs say very prominently:
>
> "Dictionaries are sorted by key before the display is computed."
>
> https://docs.python.org/3/library/pprint.html
>
> so I think I can be excused having relied on that feature.

No need to get aggro -- I asked a question, it wasn't a personal attack.

At a high-level, pprint's job is to "pretty-print arbitray Python data
structures in a form which can be used as input to the interpreter"
(quoting the first sentence of its documentation), i.e., like repr()
it's fundamentally intended as a debugging tool that's supposed to
match how Python works, not any particular externally imposed output
format. Now, how Python works has changed. Previously dict order was
arbitrary, so picking the arbitrary order that happened to be sorted
was a nice convenience. Now, dict order isn't arbitrary, and sorting
dicts both obscures the actual structure of the Python objects, and
also breaks round-tripping through pprint. Given that pprint's
overarching documented contract of "represent Python objects" now
conflicts with the more-specific documented contract of "sort dict
keys", something has to give.

My feeling is that we should preserve the overarching contract, not
the details of how dicts were handled. Here's another example of a
teacher struggling with this:
https://mastodon.social/@aparrish/13011522

But I would be in favor of adding a kwarg to let people opt-in to the
old behavior like:

from pprint import PrettyPrinter
pprint = PrettyPrinter(sortdict=True).pprint

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-19 Thread Nathaniel Smith
On Mon, Dec 18, 2017 at 11:38 PM, Steve Dower  wrote:
> On 18Dec2017 2309, Steven D'Aprano wrote:
>> [A LOT OF THINGS I AGREE WITH]
> I agree completely with Steven's reasoning here, and it bothers me that
> what is an irrelevant change to many users (dict becoming ordered) seems
> to imply that all users of dict have to be updated.

Can we all take a deep breath and lay off the hyperbole? The only
point under discussion in this subthread is whether pprint -- our
module for producing nicely-formatted-reprs -- should continue to sort
keys, or should continue to provide an accurate repr. There are
reasonable arguments for both positions, but no-one's suggesting
anything in the same solar system as "all users of dict have to be
updated".

Am I missing some underlying nerve that this is hitting for some reason?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-19 Thread Nathaniel Smith
On Tue, Dec 19, 2017 at 4:56 PM, Steve Dower  wrote:
> On 19Dec2017 1004, Chris Barker wrote:
>>
>> Nathaniel Smith has pointed out that eval(pprint(a_dict)) is supposed to
>> return the same dict -- so documented behavior may already be broken.
>
>
> Two relevant quotes from the pprint module docs:
>
>>>> The pprint module provides a capability to “pretty-print” arbitrary
>>>> Python data structures in a form which can be used as input to the
>>>> interpreter
>
>>>> Dictionaries are sorted by key before the display is computed.
>
> It says nothing about the resulting dict being the same as the original one,
> just that it can be used as input. So these are both still true (until
> someone deliberately breaks the latter).

This is a pretty fine hair to be splitting... I'm sure you wouldn't
argue that it would be valid to display the dict {"a": 1} as
'["hello"]', just because '["hello"]' is a valid input to the
interpreter (that happens to produce a different object than the
original one) :-). I think we can assume that pprint's output is
supposed to let you reconstruct the original data structures, at least
in simple cases, even if that isn't explicitly stated.

> In any case, there are so many ways
> to spoil the first point for yourself that it's hardly worth treating as an
> important constraint.

I guess the underlying issue here is partly the question of what the
pprint module is for. In my understanding, it's primarily a tool for
debugging/introspecting Python programs, and the reason it talks about
"valid input to the interpreter" isn't because we want anyone to
actually feed the data back into the interpreter, but to emphasize
that it provides an accurate what-you-see-is-what's-really-there view
into how the interpreter understands a given object. It also
emphasizes that this is not intended for display to end users; making
the output format be "Python code" suggests that the main intended
audience is people who know how to read, well, Python code, and
therefore can be expected to care about Python's semantics.

>> (though I assume order is still ignored when comparing dicts, so:
>> eval(pprint(a_dict)) == a_dict will still hold.
>
>
> Order had better be ignored when comparing dicts, or plenty of code will
> break. For example:
>
>>>> {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
> True

Yes, this is never going to change -- I expect that in the long run,
the only semantic difference between dict and OrderedDict will be in
their __eq__ methods.
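
Roughly, the difference as it stands today (small sketch):

from collections import OrderedDict

{"a": 1, "b": 2} == {"b": 2, "a": 1}     # True  -- dict ignores order
od1 = OrderedDict([("a", 1), ("b", 2)])
od2 = OrderedDict([("b", 2), ("a", 1)])
od1 == od2                               # False -- OrderedDict does not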

> Saying that "iter(dict)" will produce keys in the same order as they were
> inserted is not the same as saying that "dict" is an ordered mapping. As far
> as I understand, we've only said the first part.
>
> (And the "nerve" here is that I disagreed with even the first part, but
> didn't fight it too strongly because I never relied on the iteration order
> of dict. However, I *do* rely on nobody else relying on the iteration order
> of dict either, and so proposals to change existing semantics that were
> previously independent of insertion order to make them rely on insertion
> order will affect me. So now I'm pushing back.)

I mean, I don't want to be a jerk about this, and we still need to
examine things on a case-by-case basis but... Guido has pronounced
that Python dict preserves order. If your code "rel[ies] on nobody
else relying on the iteration order", then starting in 3.7 your code
is no longer Python.

Obviously I like that change more than you, but to some extent it's
just something we have to live with, and even if I disagreed with the
new semantics I'd still rather the standard library handle them
consistently rather than being half-one-thing-and-half-another.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 567 v2

2017-12-28 Thread Nathaniel Smith
On Thu, Dec 28, 2017 at 1:51 AM, Victor Stinner
 wrote:
> var = ContextVar('var', default=42)
>
> and:
>
> var = ContextVar('var')
> var.set (42)
>
> behaves the same, no?

No, they're different. The second sets the value in the current
context. The first sets the value in all contexts that currently
exist, and all empty contexts created in the future.
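
A sketch of what I mean, using the PEP's API (ContextVar, Context,
copy_context); untested:

import contextvars

var_a = contextvars.ContextVar('var_a', default=42)
var_b = contextvars.ContextVar('var_b')
var_b.set(42)   # recorded only in the context that is current right now

def check():
    print(var_a.get())         # 42 everywhere -- it's the default
    try:
        print(var_b.get())     # 42 only where that set() is visible
    except LookupError:
        print("var_b is not set in this context")

contextvars.copy_context().run(check)  # sees both values
contextvars.Context().run(check)       # sees only var_a's default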

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Concerns about method overriding and subclassing with dataclasses

2017-12-29 Thread Nathaniel Smith
On Fri, Dec 29, 2017 at 12:30 PM, Ethan Furman  wrote:
> Good point.  So auto-generate a new __repr__ if:
>
> - one is not provided, and
> - existing __repr__ is either:
>   - object.__repr__, or
>   - a previous dataclass __repr__
>
> And if the auto default doesn't work for one's use-case, use the keyword
> parameter to specify what you want.

What does attrs do here?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Nathaniel Smith
On Sat, Dec 30, 2017 at 7:26 AM, Stephen J. Turnbull
 wrote:
> Christian Heimes writes:
>  > Questions:
>  > - Is everybody OK with breaking backwards compatibility? The risk is
>  > small. ASCII-only domains are not affected
>
> That's not quite true, as your German example shows.  In some Oriental
> renderings it is impossible to distinguish halfwidth digits from
> full-width ones as the same glyphs are used.  (This occasionally
> happens with other ASCII characters, but users are more fussy about
> digits lining up.)  That is, while technically ASCII-only domain names
> are not affected, users of ASCII-only domain names are potentially
> vulnerable to confusable names when IDNA is introduced.  (Hopefully
> the Asian registrars are as woke as the German ones!  But you could
> still register a .com containing full-width digits or letters.)

This particular example isn't an issue: in IDNA encoding, full-width
and half-width digits are normalized together, so "number\uff11.com" (with a
full-width digit) and "number1.com" actually refer to the same domain name.
This is true in
both the 2003 and 2008 versions:

# IDNA 2003
In [7]: "number\uff11.com".encode("idna")
Out[7]: b'number1.com'

# IDNA 2008 (using the 'idna' package from pypi)
In [8]: idna.encode("number\uff11.com", uts46=True)
Out[8]: b'number1.com'

That said, IDNA does still allow for a bunch of spoofing opportunities
that aren't possible with pure ASCII, and this requires some care:
https://unicode.org/faq/idn.html#16

This is mostly a UI issue, though; there's not much that the socket or
ssl modules can do to help here.
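
From memory (untested), e.g. a name that merely *looks* ASCII still comes
out as a completely different A-label, and by the time socket/ssl see it,
it's just that A-label:

# Latin "apple.com" vs. a lookalike whose first letter is Cyrillic 'а' (U+0430):
"apple.com".encode("idna")        # b'apple.com'
"\u0430pple.com".encode("idna")   # b'xn--pple-43d.com' -- a different host entirely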

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Nathaniel Smith
On Sat, Dec 30, 2017 at 2:28 AM, Antoine Pitrou  wrote:
> On Fri, 29 Dec 2017 21:54:46 +0100
> Christian Heimes  wrote:
>>
>> On the other hand ssl module is currently completely broken. It converts
>> hostnames from bytes to text with 'idna' codec in some places, but not
>> in all. The SSLSocket.server_hostname attribute and callback function
>> SSLContext.set_servername_callback() are decoded as U-label.
>> Certificate's common name and subject alternative name fields are not
>> decoded and therefore A-labels. The *must* stay A-labels because
>> hostname verification is only defined in terms of A-labels. We even had
>> a security issue once, because partial wildcard like 'xn*.example.org'
>> must not match IDN hosts like 'xn--bcher-kva.example.org'.
>>
>> In issue [2] and PR [3], we all agreed that the only sensible fix is to
>> make 'SSLContext.server_hostname' an ASCII text A-label.
>
> What are the changes in API terms?  If I'm calling wrap_socket(), can I
> pass `server_hostname='straße'` and it will IDNA-encode it?  Or do I
> have to encode it myself?  If the latter, it seems like we are putting
> the burden of protocol compliance on users.

Part of what makes this confusing is that there are actually three
intertwined issues here. (Also, anything that deals with Unicode *or*
SSL/TLS is automatically confusing, and this is about both!)

Issue 1: Python's built-in IDNA implementation is wrong (implements
IDNA 2003, not IDNA 2008).
Issue 2: The ssl module insists on using Python's built-in IDNA
implementation whether you want it to or not.
Issue 3: Also, the ssl module has a separate bug that means
client-side cert validation has never worked for any IDNA domain.

Issue 1 is potentially a security issue, because it means that in a
small number of cases, Python will misinterpret a domain name. IDNA
2003 and IDNA 2008 are very similar, but there are 4 characters that
are interpreted differently, with ß being one of them. Fixing this
though is a big job, and doesn't exactly have anything to do with the
ssl module -- for example, socket.getaddrinfo("straße.de", 80) and
sock.connect(("straße.de", 80)) also do the wrong thing. Christian's not
proposing to fix this here. It's issues 2 and 3 that he's proposing to
fix.
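
The ß case, concretely (untested sketch; 'idna' here is the third-party
IDNA 2008 package from PyPI):

import idna

"straße.de".encode("idna")   # b'strasse.de'        -- IDNA 2003 maps ß to "ss"
idna.encode("straße.de")     # b'xn--strae-oqa.de'  -- IDNA 2008 keeps ß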

Issue 2 is a problem because it makes it impossible to work around
issue 1, even for users who know what they're doing. In the socket
module, you can avoid Python's automagical IDNA handling by doing it
manually, and then calling socket.getaddrinfo("strasse.de", 80) or
socket.getaddrinfo("xn--strae-oqa.de", 80), whichever you prefer. In
the ssl module, this doesn't work. There are two places where ssl uses
hostnames. In client mode, the user specifies the server_hostname that
they want to see a certificate for, and then the module runs this
through Python's IDNA machinery *even if* it's already properly
encoded in ascii. And in server mode, when the user has specified an
SNI callback so they can find out which certificate an incoming client
connection is looking for, the module runs the incoming name through
Python's IDNA machinery before handing it to user code. In both cases,
the right thing to do would be to just pass through the ascii A-label
versions, so savvy users can do whatever they want with them. (This
also matches the general design principle around IDNA, which assumes
that the pretty unicode U-labels are used only for UI purposes, and
everything internal uses A-labels.)
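
I.e., roughly this works for socket, and there's no equivalent escape
hatch in ssl (untested sketch):

import socket
import idna   # third-party IDNA 2008 implementation

# Do the encoding yourself, then hand socket the ascii A-label:
alabel = idna.encode("straße.de").decode("ascii")   # 'xn--strae-oqa.de'
socket.getaddrinfo(alabel, 443)   # fine -- socket passes it straight through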

Issue 3 is just a silly bug that needs to be fixed, but it's tangled
up here because the fix is the same as for Issue 2: the reason
client-side cert validation has never worked is that we've been taking
the A-label from the server's certificate and checking if it matches
the U-label we expect, and of course it never does because we're
comparing strings in different encodings. If we consistently converted
everything to A-labels as soon as possible and kept it that way, then
this bug would never have happened.
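
In other words, the check was effectively doing (sketch):

# The cert carries the A-label, but we compared it against the U-label,
# so the match could never succeed:
"xn--strae-oqa.de" == "straße.de"   # False -> spurious "hostname mismatch"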

What makes it tricky is that on both the client and the server, fixing
this is actually user-visible.

On the client, checking sslsock.server_hostname used to always show a
U-label, but if we stop using U-labels internally then this doesn't
make sense. Fortunately, since this case has never worked at all,
fixing it shouldn't cause any problems.

On the server, the obvious fix would be to start passing
A-label-encoded names to the servername_callback, instead of
U-label-encoded names. Unfortunately, this is a bit trickier, because
this *has* historically worked (AFAIK) for IDNA names, so long as they
didn't use one of the four magic characters who changed meaning
between IDNA 2003 and IDNA 2008. But we do still need to do something.
For example, right now, it's impossible to use the ssl module to
implement a web server at https://straße.de, because incoming
connections will use SNI to say that they expect a cert for
"xn--strae-oqa.de", and then the ssl module will freak out and throw
an exception instead of invoking the callback.

Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-31 Thread Nathaniel Smith
On Dec 31, 2017 7:37 AM, "Stephen J. Turnbull" <
[email protected]> wrote:

Nathaniel Smith writes:

 > Issue 1: Python's built-in IDNA implementation is wrong (implements
 > IDNA 2003, not IDNA 2008).

Is "wrong" the right word here?  I'll grant you that 2008 is *better*,
but typically in practice versions coexist for years.  Ie, is there no
backward compatibility issue with registries that specified IDNA 2003?


Well, yeah, I was simplifying, but at the least we can say that always and
only using IDNA 2003 certainly isn't right :-). I think in most cases the
preferred way to deal with these kinds of issues is not to carry around an
IDNA 2003 implementation, but instead to use an IDNA 2008 implementation
with the "transitional compatibility" flag enabled in the UTS46
preprocessor? But this is rapidly exceeding my knowledge.
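
Something like this, if I remember the 'idna' package's UTS46 options
right (untested):

import idna

idna.encode("straße.de", uts46=True)                     # b'xn--strae-oqa.de'
idna.encode("straße.de", uts46=True, transitional=True)  # b'strasse.de'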

This is another reason why we ought to let users do their own IDNA handling
if they want...


This is not entirely an idle question: I'd like to tool up on the
RFCs, research existing practice (especially in the East/Southeast Asian
registries), and contribute to the implementation if there may be an
issue remaining.  (Interpreting RFCs is something I'm reasonably good
at.)


Maybe this is a good place to start:

https://github.com/kjd/idna/blob/master/README.rst

-n

[Sorry if my quoting is messed up; posting from my phone and Gmail for
Android apparently generates broken text/plain.]
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-31 Thread Nathaniel Smith
On Sun, Dec 31, 2017 at 5:39 PM, Steven D'Aprano  wrote:
> On Sun, Dec 31, 2017 at 09:07:01AM -0800, Nathaniel Smith wrote:
>
>> This is another reason why we ought to let users do their own IDNA handling
>> if they want...
>
> I expect that letting users do their own IDNA handling will correspond
> to not doing any IDNA handling at all.

You did see the words "if they want", right? I'm not talking about
removing the stdlib's default IDNA handling, I'm talking about fixing
the cases where the stdlib goes out of its way to prevent users from
overriding its IDNA handling.

And "users" here is a very broad category; it includes libraries like
requests, twisted, trio, ... that are already doing better IDNA
handling than the stdlib, except in cases where the stdlib actively
prevents it.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

