Re: [Python-Dev] Python Language Summit at PyCon: Agenda

2013-03-02 Thread Gregory P. Smith
On Thu, Feb 28, 2013 at 1:15 AM, Nick Coghlan  wrote:

> On Thu, Feb 28, 2013 at 1:37 PM, Barry Warsaw  wrote:
> > On Feb 27, 2013, at 11:33 AM, fwierzbi...@gmail.com wrote:
> >>The easy part for Jython is pushing some of our "if is_jython:" stuff
> >>into the appropriate spots in CPython's Lib/.
> >
> > I wonder if there isn't a better way to do this than sprinkling
> is_jython,
> > is_pypy, is_ironpython, is_thenextbigthing all over the code base.  I
> have no
> > bright ideas here, but it seems like a feature matrix would be a better
> way to
> > go than something that assumes a particular Python implementation has a
> > particular feature set (which may change in the future).
>
> Yes, avoiding that kind of thing is a key motivation for
> sys.implementation. Any proposal for "is_jython" blocks should instead
> be reformulated as a proposal for new sys.implementation attributes.
>

I kind of wish there were an assert-like magic "if __debug__:" type of
mechanism behind this so that blocks of code destined solely for a single
interpreter won't be seen in the code objects or .pyc's of non-target
interpreters.

That idea obviously isn't fleshed out, but I figured I'd better plant the
seed...

It'd mean smaller code objects and less bloat from constants (docstrings
for one implementation vs another, etc) being in memory. Taken further,
this could even be extended beyond implementations to platforms as we have
some standard library code with alternate definitions within one file for
windows vs posix, etc.

Antoine's point about code like that being untestable by most CPython
developers is valid.  I'd want --with-pydebug builds to disable any parsing
-> code object exclusions, to at least make sure the syntax doesn't rot, but
that still doesn't _test_ it unless someone maintains reliable
buildbots for every implementation using this common stdlib.

-gps
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] cffi in stdlib

2013-03-02 Thread Armin Rigo
Hi Gregory,

On Sat, Mar 2, 2013 at 8:40 AM, Gregory P. Smith  wrote:
>
> On Wed, Feb 27, 2013 at 7:57 AM, Eli Bendersky  wrote:
>> So would you say that the main use of the API level is to provide an
>> alternative for writing C API code to interface to C libraries. IOW, it's in
>> competition with Swig?
>
> I'd hardly call it competition. The primary language I interface with is C++
> and cffi appears not to see that giant elephant in the room

I don't think it's in competition with Swig, which does C++.  There
are certain workloads in which C++ is the elephant in the room; we
don't address such workloads.  If you want some more motivation, the
initial goal was to access the large number of standard Linux/Posix
libraries that are C (or have a C interface), but are too hard to
access for ctypes (macros, partially-documented structure types,
#define for constants, etc.).  For this goal, it works great.
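
For concreteness, a minimal sketch of that ABI-level usage, in the style of
the cffi docs (POSIX-only because of dlopen(None); no C compiler is
involved at any point):

    from cffi import FFI

    ffi = FFI()
    # Declarations are plain C, copied straight from a man page:
    ffi.cdef("int printf(const char *format, ...);")
    C = ffi.dlopen(None)               # load the standard C library (POSIX)
    arg = ffi.new("char[]", b"world")  # allocate a char[] that we own
    C.printf(b"hi there, %s!\n", arg)  # call printf at the ABI level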

> (it'd need to use clang for parsing if it were going to do that)...

I fear parsing is merely the tip of the iceberg when we talk about
interfacing with C++.


A bientôt,

Armin.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] cffi in stdlib

2013-03-02 Thread Stefan Behnel
Hi,

looks like no-one's taken over the role of the Advocatus Diaboli yet. =)

Maciej Fijalkowski, 26.02.2013 16:13:
> I would like to discuss on the language summit a potential inclusion
> of cffi[1] into stdlib. This is a project Armin Rigo has been working
> for a while, with some input from other developers. It seems that the
> main reason why people would prefer ctypes over cffi these days is
> "because it's included in stdlib", which is not generally the reason I
> would like to hear. Our calls to not use C extensions and to use an
> FFI instead have seen very limited success with ctypes and quite a lot
> more since cffi got released. The API is fairly stable right now with
> minor changes going in, and it'll definitely stabilize by the Python 3.4
> release.

You say that "the API is fairly stable". What about the implementation?
Will users want to install a new version next to the stdlib one in a couple
of months, just because there was a bug in the parser in Python 3.4 that
you still need to support because there's code that depends on it, or
because there is this new feature that is required to make it work with
library X, or ... ? What's the upgrade path in that case? How will you
support this? What long-term guarantees do you give to users of the stdlib
package?

Or, in other words, will the normal fallback import for cffi look like this:

try: import stdlib_cffi
except ImportError: import external_cffi

or will the majority of users end up preferring this order:

try: import external_cffi
except ImportError: import stdlib_cffi


> * Work either at the level of the ABI (Application Binary Interface)
> or the API (Application Programming Interface). Usually, C libraries
> have a specified C API but often not an ABI (e.g. they may document a
> “struct” as having at least these fields, but maybe more). (ctypes
> works at the ABI level, whereas Cython and native C extensions work at
> the API level.)

Ok, so there are cases where you need a C compiler installed in order to
support the API. Which means that it will be a very complicated thing for
users to get working under Windows, for example, which then means that
users are actually best off not using the API-support feature if they want
portable code. Wouldn't it be simpler to target Windows with a binary than
with dynamically compiled C code? Is there a way to translate an API
description into a static ABI description for a known platform ahead of
time, or do I have to implement this in a separate ABI code path by
figuring out a suitable ABI description myself?

In which cases would users choose to use the C API support? And, is this
dependency explicit or can I accidentally run into the dependency on a C
compiler for my code without noticing?


> * We try to be complete. For now some C99 constructs are not
> supported, but all C89 should be, including macros (and including
> macro “abuses”, which you can manually wrap in saner-looking C
> functions).

Ok, so the current status actually is that it's *not* complete, and that
future versions will have to catch up in terms of C compatibility. So, why
do you think it's a good time to get it into the stdlib *now*?


> * We attempt to support both PyPy and CPython, with a reasonable path
> for other Python implementations like IronPython and Jython.

You mentioned that it's fast under PyPy and slow under CPython, though.
What would be the reason to use it under CPython then? Some of the projects
that are using it (you named a couple) also have equivalent (or maybe more
or less so) native implementations for CPython already. Do you have any
benchmarks available that compare those to their cffi versions under
CPython? Is the slowdown within any reasonable bounds?

Others have already mentioned the lack of C++ support. It's ok to say that
you deliberately only want to support C, but it's also true that that's a
substantial restriction.

Stefan


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] cffi in stdlib

2013-03-02 Thread Armin Rigo
Hi Stefan,

On Sat, Mar 2, 2013 at 10:10 AM, Stefan Behnel  wrote:
> You say that "the API is fairly stable". What about the implementation?
> Will users want to install a new version next to the stdlib one in a couple
> of months,

I think that the implementation is fairly stable as well.  The only
place I can foresee some potential changes is in details like the
location of temporary files, for example, which needs to be discussed
(probably with people from python-dev too) at some point.

> just because there was a bug in the parser in Python 3.4 that
> you still need to support because there's code that depends on it, or
> because there is this new feature that is required to make it work with
> library X, or ... ? What's the upgrade path in that case? How will you
> support this? What long-term guarantees do you give to users of the stdlib
> package?

I think these are general questions for any package that ends up in
the stdlib.  In the case of CFFI, it is now approaching a stability
point.  This is also because we are going to integrate it with the
stdlib of PyPy soon.

Bugs in the parser have not been found so far, but if there is any, we
will treat it like we treat any other bug in the stdlib.  For that
matter, there is actually no obvious solution for the user either: he
generally has to wait for the next micro release to have the bug
fixed.

> Or, in other words, will the normal fallback import for cffi look like this:
>
> try: import stdlib_cffi
> except ImportError: import external_cffi
>
> or will the majority of users end up preferring this order:
>
> try: import external_cffi
> except ImportError: import stdlib_cffi

I would rather drop the external CFFI entirely, or keep it only to
provide backports to older Python versions.  I personally see no
objection to calling the stdlib one "cffi" too (but any other name is
fine as well).

> ... Wouldn't it be simpler to target Windows with a binary than
> with dynamically compiled C code? Is there a way to translate an API
> description into a static ABI description for a known platform ahead of
> time, or do I have to implement this myself in a separate ABI code path by
> figuring out a suitable ABI description myself?

No, I believe that you missed this point: when you make "binary"
distributions of a package with setup.py, it precompiles a library for
CFFI too.  So yes, you need a C compiler on machines where you develop
the program, but not on machines where you install it.  The needs are the
same as when writing custom C extension modules by hand.

> In which cases would users choose to use the C API support? And, is this
> dependency explicit or can I accidentally run into the dependency on a C
> compiler for my code without noticing?

A C compiler is required if and only if you call the
function verify() and pass it arguments that are not the same ones as
the previous time (i.e. the result is not in the cache).
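
For reference, a minimal sketch of that API mode as it exists in cffi 0.x
(ffi.verify() is the documented entry point at this point in cffi's
history; later releases replaced it with set_source()/compile()):

    from cffi import FFI

    ffi = FFI()
    ffi.cdef("double sin(double x);")
    # This is the step that needs a C compiler; the compiled helper is
    # cached on disk, so a rerun with identical arguments skips it.
    lib = ffi.verify("#include <math.h>", libraries=["m"])
    print(lib.sin(1.0))  # 0.8414709848078965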

>> * We try to be complete. For now some C99 constructs are not
>> supported, but all C89 should be, including macros (and including
>> macro “abuses”, which you can manually wrap in saner-looking C
>> functions).
>
> Ok, so the current status actually is that it's *not* complete, and that
> future versions will have to catch up in terms of C compatibility. So, why
> do you think it's a good time to get it into the stdlib *now*?

To be honest, I don't have a precise list of the missing C99 constructs.  I
used to know of a few of them, but these were eventually supported.
Support is unlikely to ever be complete, and even if it gets there, a
fairly slow process should be fine --- just like a substantial portion of
the stdlib, which gets occasional updates from one Python version to the next.

> You mentioned that it's fast under PyPy and slow under CPython, though.
> What would be the reason to use it under CPython then?

The reason is just ease of use.  I claim that it takes less effort
(and little C knowledge), and is less prone to bugs and leaks, to
write a perfectly working prototype of a module to access a random C
library.  I do not claim that you'll get the top-most performance.
For a lot of cases performance doesn't matter; and when it does, on
CPython, you can really write a C extension module by hand (as long as
you make sure to keep around the CFFI version for use by PyPy).  This
is how I see it, anyway.  The fact that we are busy rewriting existing
native well-tested CPython extensions with CFFI --- this is really
only of use for PyPy.

> Others have already mentioned the lack of C++ support. It's ok to say that
> you deliberately only want to support C, but it's also true that that's a
> substantial restriction.

I agree that it's a restriction, or rather a possible extension that
is not done.  I don't have plans to do it myself.  Please also keep in
mind that we pitch CFFI as a better ctypes, not as the ultimate tool
to access any foreign language.


A bientôt,

Armin.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

[Python-Dev] Possible bug in socket.py: connection reset by peer

2013-03-02 Thread Michal Kawalec
Hello,
I am experiencing an odd, infrequent bug in Python 2.7.3 with the GIL
enabled. For some files pushed over a TCP socket I get 'connection reset
by peer' and clients only receive a randomly long part of the file.

This situation occurs in only ~0.1% of cases, but if it happens for a
given file it keeps occurring for that file. The server host is a
2-core Linux 3.8.1 machine on a VMware VM.

The problem is mitigated by adding time.sleep(0.001) just before each
portion of data is pushed through the socket. The issue also seems to
have been known for a long time [1].
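
One cause commonly cited in discussions like [1] - offered here as a hint,
not a diagnosis of this report: if the sender calls close() while data from
the peer is still unread, the kernel answers with RST instead of a clean
FIN, and the receiver sees 'connection reset by peer'. A hedged sender-side
sketch that avoids that case:

    import socket

    def send_all_and_close(sock, payload):
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # announce EOF to the peer
        while sock.recv(4096):         # drain whatever the peer sent
            pass
        sock.close()                   # safe: no unread data remains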

So my question is - is this something that I can expect to be fixed in
future Python releases?


Michal


[1]
http://stackoverflow.com/questions/441374/why-am-i-seeing-connection-reset-by-peer-error



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Possible bug in socket.py: connection reset by peer

2013-03-02 Thread Antoine Pitrou
On Sat, 02 Mar 2013 12:48:05 +0100
Michal Kawalec  wrote:
> I am experiencing an odd, infrequent bug in Python 2.7.3 with the GIL
> enabled. For some files pushed over a TCP socket I get 'connection reset
> by peer' and clients only receive a randomly long part of the file.

Why do you think it is a bug in Python?
Start by doing a Wireshark capture of your traffic and find out what
really happens.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Disabling string interning for null and single-char causes segfaults

2013-03-02 Thread Nick Coghlan
On Sat, Mar 2, 2013 at 1:24 AM, Stefan Bucur  wrote:
> Hi,
>
> I'm working on an automated bug finding tool that I'm trying to apply on the
> Python interpreter code (version 2.7.3). Because of early prototype
> limitations, I needed to disable string interning in stringobject.c. More
> precisely, I modified the PyString_FromStringAndSize and PyString_FromString
> to no longer check for the null and single-char cases, and create instead a
> new string every time (I can send the patch if needed).
>
> However, after applying this modification, when running "make test" I get a
> segfault in the test___all__ test case.
>
> Before digging deeper into the issue, I wanted to ask here if there are any
> implicit assumptions about string identity and interning throughout the
> interpreter implementation. For instance, are two single-char strings having
> the same content supposed to be identical objects?
>
> I'm assuming that it's either this, or some refcount bug in the interpreter
> that manifests only when certain strings are no longer interned and thus
> have a higher chance to get low refcount values.

In theory, interning is supposed to be a pure optimisation, but it
wouldn't surprise me if there are cases that assume the described
strings are always interned (especially the null string case). Our
test suite would never detect such bugs, as we never disable the
interning.
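
To illustrate the kind of assumption that could be lurking: unpatched
CPython 2.7 caches the null string and all 256 single-byte strings, so
identity checks like the following happen to pass (an implementation
detail, not a language guarantee):

    # Each chr() call builds a 1-char string at runtime; the unpatched
    # interpreter hands back the same cached object both times.
    print(chr(120) is chr(120))   # True on unpatched CPython 2.7
    print("" is str())            # True as well: the null string singleton
    # With the cache disabled both print False, and any code that used
    # "is" where it meant "==" starts misbehaving.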

Whether or not we're interested in fixing such bugs would depend on
the size of the patches needed to address them. From our point of
view, such bugs are purely theoretical (as the assumption is always
valid in an unpatched CPython build), so if the problem is too hard to
diagnose or fix, we're more likely to declare that interning of at
least those kinds of string values is required for correctness when
creating modified versions of CPython.

I'm not sure what kind of analyser you are writing, but if it relates
to the CPython C API, you may be interested in
https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Merging Jython code into standard Lib [was Re: Python Language Summit at PyCon: Agenda]

2013-03-02 Thread Nick Coghlan
On Fri, Mar 1, 2013 at 6:35 AM, Brett Cannon  wrote:
>
>
>
> On Thu, Feb 28, 2013 at 3:17 PM, fwierzbi...@gmail.com
>  wrote:
>>
>> On Thu, Feb 28, 2013 at 12:00 PM, Antoine Pitrou 
>> wrote:
>> > IMHO, we should remove the plat-* directories, they are completely
>> > unmaintained, undocumented, and serve no useful purpose.
>> Oh I didn't know that - so definitely adding to that is right out :)
>>
>> Really for cases like Jython's zlib.py (no useful code for CPython) I
>> don't have any trouble keeping them entirely in Jython. It just would
>> have been fun to delete our Lib/ :)
>>
>> It would be nice in this particular case if there was a zlib.py that
>> imported _zlib -- then it would be easy to shim in Jython's version,
>> whether it is written in a .py file or in Java.
>
>
> That should be fine as that is what we already do for accelerator modules
> anyway. If you want to work towards having an equivalent of CPython's
> Modules/ directory so you can ditch your custom Lib/ modules by treating
> your specific code as accelerators I think we can move towards that
> solution.

I'd go further and say we *should* move to that solution.

Here's an interesting thought: for pure C modules without a Python
implementation, we can migrate to this architecture even *without*
creating pure Python equivalents. All we should have to do is change
the test of the pure Python version to be that the module *can't be
imported* without the accelerator, rather than the parallel tests that
we normally implement when there's a pure Python alternative to the
accelerated version. (There would likely still be some mucking about
to ensure robust pickle compatibility, since that leaks implementation
details about exact module names if you're not careful)

PyPy, Jython, IronPython would then have RPython, Java, C# versions,
while CPython has a C version, and the test suite should work
regardless. (If PyPy has equivalents in Python, they can either push
them upstream or overwrite the "import the accelerator" version.)

Cheers,
Nick.


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Merging Jython code into standard Lib [was Re: Python Language Summit at PyCon: Agenda]

2013-03-02 Thread Antoine Pitrou
On Sun, 3 Mar 2013 01:17:35 +1000
Nick Coghlan  wrote:
> 
> I'd go further and say we *should* move to that solution.
> 
> Here's an interesting thought: for pure C modules without a Python
> implementation, we can migrate to this architecture even *without*
> creating pure Python equivalents. All we should have to do is change
> the test of the pure Python version to be that the module *can't be
> imported* without the accelerator, rather than the parallel tests that
> we normally implement when there's a pure Python alternative to the
> accelerated version. (There would likely still be some mucking about
> to ensure robust pickle compatibility, since that leaks implementation
> details about exact module names if you're not careful)

What benefit would this have?

Current situation: each Python implementation has its own
implementation of the zlib module (as a C module for CPython, etc.).

New situation: all Python implementations share a single, mostly empty,
zlib.py file. Each Python implementation has its own implementation of
the _zlib module (as a C module for CPython, etc.) which is basically
the same as the former zlib module.

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Disabling string interning for null and single-char causes segfaults

2013-03-02 Thread Antoine Pitrou
On Fri, 1 Mar 2013 16:24:42 +0100
Stefan Bucur  wrote:
> 
> However, after applying this modification, when running "make test" I get a
> segfault in the test___all__ test case.
> 
> Before digging deeper into the issue, I wanted to ask here if there are any
> implicit assumptions about string identity and interning throughout the
> interpreter implementation. For instance, are two single-char strings
> having the same content supposed to be identical objects?

From a language POV, no, but inside a specific interpreter such as
CPython it may be a reasonable expectation.

> I'm assuming that it's either this, or some refcount bug in the interpreter
> that manifests only when certain strings are no longer interned and thus
> have a higher chance to get low refcount values.

Indeed, if it's a real bug it would be nice to get it fixed :-)

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Planning on removing cache invalidation for file finders

2013-03-02 Thread Nick Coghlan
On Sat, Mar 2, 2013 at 12:31 PM, Brett Cannon  wrote:
> As of right now, importlib keeps a cache of what is in a directory for its
> file finder instances. It uses mtime on the directory to try and detect when
> it has changed to know when to refresh the cache. But thanks to mtime
> granularities of up to a second, it is only a heuristic that isn't totally
> reliable, especially across filesystems on different OSs.
>
> This is why importlib.invalidate_caches() came into being. If you look in
> our test suite you will see it peppered around where a module is created on
> the fly to make sure that mtime granularity isn't a problem. But it somewhat
> negates the point of the mtime heuristic when you have to make this function
> call regardless to avoid potential race conditions.
>
> http://bugs.python.org/issue17330 originally suggested trying to add another
> heuristic to determine when to invalidate the cache. But even with the
> suggestion it's still iffy and in no way foolproof.
>
> So the current idea is to just drop the invalidation heuristic and go
> full-blown reliance on calls to importlib.invalidate_caches() as necessary.
> This makes code more filesystem-agnostic and protects people from
> hard-to-detect errors when importlib only occasionally doesn't detect new
> modules (I know it drove me nuts for a while when the buildbots kept failing
> sporadically and only on certain OSs).
>
> I would have just made the change but Antoine wanted it brought up here
> first to make sure that no one was heavily relying on the current setup. So
> if you have a good, legitimate reason to keep the reliance on mtime for
> cache invalidation please speak up. But since the common case will never
> care about any of this (how many people generate modules on the fly to begin
> with?) and to be totally portable you need to call
> importlib.invalidate_caches() anyway, it's going to take a lot to convince
> me to keep it.

I think you should keep it. A long running service that periodically
scans the importers for plugins doesn't care if modules take a few
extra seconds to show up, it just wants to see them eventually.
Installers (or filesystem copy or move operations!) have no way to
inform arbitrary processes that new files have been added.

It's that case where the process that added the modules is separate
from the process scanning for them, and the communication is one way,
where the heuristic is important. Explicit invalidation only works
when they're the *same* process, or when they're closely coupled so
the adding process can tell the scanning process to invalidate the
caches (our test suite is mostly the former although there are a
couple of cases of the latter).

I have no problem with documenting invalidate_caches() as explicitly
required for correctness when writing new modules which are to be read
back by the same process, or when there is a feedback path between two
processes that may be confusing if the cache invalidation is delayed.
The implicit invalidation is only needed to pick up modules written by
*another* process.
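
To make that documented requirement concrete, a minimal same-process sketch
(the module name is illustrative, and it assumes the working directory is
on sys.path):

    import importlib

    with open("freshly_written.py", "w") as f:   # create a module on the fly
        f.write("ANSWER = 42\n")

    importlib.invalidate_caches()   # required before the import can be
    import freshly_written          # relied upon, given mtime granularity
    print(freshly_written.ANSWER)   # 42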

In addition, it may be appropriate for importlib to offer a
"write_module" method that accepts (module name, target path,
contents). This would:

1. Allow in-process caches to be invalidated implicitly and
selectively when new modules are created
2. Allow importers to abstract write access in addition to read access
3. Allow the import system to complain at time of writing if the
desired module name and target path don't actually match given the
current import system state.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Planning on removing cache invalidation for file finders

2013-03-02 Thread Nick Coghlan
On Sun, Mar 3, 2013 at 1:36 AM, Nick Coghlan  wrote:
> It's that case where the process that added the modules is separate
> from the process scanning for them, and the communication is one way,
> where the heuristic is important. Explicit invalidation only works
> when they're the *same* process, or when they're closely coupled so
> the adding process can tell the scanning process to invalidate the
> caches (our test suite is mostly the former although there are a
> couple of cases of the latter).

s/are/may be/ (I don't actually remember if there are or not off the
top of my head)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Merging Jython code into standard Lib [was Re: Python Language Summit at PyCon: Agenda]

2013-03-02 Thread Brett Cannon
On Sat, Mar 2, 2013 at 10:28 AM, Antoine Pitrou  wrote:

> On Sun, 3 Mar 2013 01:17:35 +1000
> Nick Coghlan  wrote:
> >
> > I'd go further and say we *should* move to that solution.
> >
> > Here's an interesting thought: for pure C modules without a Python
> > implementation, we can migrate to this architecture even *without*
> > creating pure Python equivalents. All we should have to do is change
> > the test of the pure Python version to be that the module *can't be
> > imported* without the accelerator, rather than the parallel tests that
> > we normally implement when there's a pure Python alternative to the
> > accelerated version. (There would likely still be some mucking about
> > to ensure robust pickle compatibility, since that leaks implementation
> > details about exact module names if you're not careful)
>
> What benefit would this have?
>
> Current situation: each Python implementation has its own
> implementation of the zlib module (as a C module for CPython, etc.).
>
> New situation: all Python implementations share a single, mostly empty,
> zlib.py file. Each Python implementation has its own implementation of
> the _zlib module (as a C module for CPython, etc.) which is basically
> the same as the former zlib module.
>

Bare minimum? They all share the same module docstring. But it could be
extended to explicitly import only the public API into zlib.py, helping to
prevent leaking interpreter-specific APIs by accident (obviously would
still be available off of _zlib if people wanted them).
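
Something like this hypothetical sketch of the shared shim (the _zlib split
and the exact names here are illustrative, not actual CPython source):

    """Compression and decompression routines compatible with gzip.

    Shared, mostly empty stdlib wrapper; the real implementation lives in
    the interpreter-specific accelerator (_zlib in this sketch).
    """

    # Re-export only the documented API so interpreter-specific helpers
    # in the accelerator can't leak into the public namespace by accident.
    from _zlib import (
        compress, compressobj, decompress, decompressobj,
        crc32, adler32, error,
        ZLIB_VERSION, Z_BEST_COMPRESSION, Z_BEST_SPEED, Z_DEFAULT_COMPRESSION,
    )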
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Planning on removing cache invalidation for file finders

2013-03-02 Thread Brett Cannon
On Sat, Mar 2, 2013 at 10:36 AM, Nick Coghlan  wrote:

> On Sat, Mar 2, 2013 at 12:31 PM, Brett Cannon  wrote:
> > As of right now, importlib keeps a cache of what is in a directory for
> its
> > file finder instances. It uses mtime on the directory to try and detect
> when
> > it has changed to know when to refresh the cache. But thanks to mtime
> > granularities of up to a second, it is only a heuristic that isn't
> totally
> > reliable, especially across filesystems on different OSs.
> >
> > This is why importlib.invalidate_caches() came into being. If you look in
> > our test suite you will see it peppered around where a module is created
> on
> > the fly to make sure that mtime granularity isn't a problem. But it
> somewhat
> > negates the point of the mtime heuristic when you have to make this
> function
> > call regardless to avoid potential race conditions.
> >
> > http://bugs.python.org/issue17330 originally suggested trying to add
> another
> > heuristic to determine when to invalidate the cache. But even with the
> > suggestion it's still iffy and in no way foolproof.
> >
> > So the current idea is to just drop the invalidation heuristic and go
> > full-blown reliance on calls to importlib.invalidate_caches() as
> necessary.
> > This makes code more filesystem-agnostic and protects people from
> > hard-to-detect errors when importlib only occasionally doesn't detect new
> > modules (I know it drove me nuts for a while when the buildbots kept
> failing
> > sporadically and only on certain OSs).
> >
> > I would have just made the change but Antoine wanted it brought up here
> > first to make sure that no one was heavily relying on the current setup.
> So
> > if you have a good, legitimate reason to keep the reliance on mtime for
> > cache invalidation please speak up. But since the common case will never
> > care about any of this (how many people generate modules on the fly to
> > begin
> > with?) and to be totally portable you need to call
> > importlib.invalidate_caches() anyway, it's going to take a lot to
> convince
> > me to keep it.
>
> I think you should keep it. A long running service that periodically
> scans the importers for plugins doesn't care if modules take a few
> extra seconds to show up, it just wants to see them eventually.
> Installers (or filesystem copy or move operations!) have no way to
> inform arbitrary processes that new files have been added.
>

But if they are doing the scan they can also easily invalidate the caches
before performing the scan.


>
> It's that case where the process that added the modules is separate
> from the process scanning for them, and the communication is one way,
> where the heuristic is important. Explicit invalidation only works
> when they're the *same* process, or when they're closely coupled so
> the adding process can tell the scanning process to invalidate the
> caches (our test suite is mostly the former although there are a
> couple of cases of the latter).
>

That's only true if the scanning process has no idea that another process
is adding modules. If there is an expectation then it doesn't matter who
added the file as you just assume cache invalidation is necessary.


>
> I have no problem with documenting invalidate_caches() as explicitly
> required for correctness when writing new modules which are to be read
> back by the same process, or when there is a feedback path between two
> processes that may be confusing if the cache invalidation is delayed.
>

Already documented as such.


> The implicit invalidation is only needed to pick up modules written by
> *another* process.
>
> In addition, it may be appropriate for importlib to offer a
> "write_module" method that accepts (module name, target path,
> contents). This would:
>
> 1. Allow in-process caches to be invalidated implicitly and
> selectively when new modules are created
>

I don't think that's necessary. If people don't want to blindly clear all
caches for a file they can write the file, search the keys in
sys.path_importer_cache for the longest prefix of the newly created file's
path, and then call the invalidate_caches() method on that specific finder.
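
A rough sketch of that selective route (the helper name is mine, and it
assumes the matching cache entry is a finder exposing invalidate_caches(),
with None marking "no finder for this path entry"):

    import os
    import sys

    def invalidate_finder_for(filename):
        """Invalidate only the finder caching `filename`'s directory."""
        directory = os.path.dirname(os.path.abspath(filename))
        # Longest-prefix match over the cached path entries.
        matches = [p for p in sys.path_importer_cache
                   if directory.startswith(p)]
        if matches:
            finder = sys.path_importer_cache[max(matches, key=len)]
            if finder is not None:
                finder.invalidate_caches()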

2. Allow importers to abstract write access in addition to read access
>

That's heading down the virtual filesystem path which I don't want to go
down any farther than I have to. The API is big enough as it is and the
more entangled it gets the harder it is to change/fix, especially with the
finders having a nice, small API compared to the loaders.


> 3. Allow the import system to complain at time of writing if the
> desired module name and target path don't actually match given the
> current import system state.
>

I think that's more checking than necessary for a use case that isn't that
common.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Python Language Summit at PyCon: Agenda

2013-03-02 Thread Nick Coghlan
On Fri, Mar 1, 2013 at 9:39 AM, Doug Hellmann  wrote:
>
> On Feb 27, 2013, at 11:51 AM, Michael Foord wrote:
>
>> Hello all,
>>
>> PyCon, and the Python Language Summit, is nearly upon us. We have a good 
>> number of people confirmed to attend. If you are intending to come to the 
>> language summit but haven't let me know please do so.
>>
>> The agenda of topics for discussion so far includes the following:
>>
>> * A report on pypy status - Maciej and Armin
>> * Jython and IronPython status reports - Dino / Frank
>> * Packaging (Doug Hellmann and Monty Taylor at least)
>
> Since the time I suggested we add packaging to the agenda, Nick has set up a 
> separate summit meeting for Friday evening. I don't know if it makes sense to 
> leave this on the agenda for Wednesday or not.
>
> Nick, what do you think?

I think it's definitely worth my taking some time to explain my goals
for the Friday night session, and some broader things in terms of
where I'd like to see packaging going, but a lot of the key packaging
people aren't involved in Python language development *per se*, and
hence won't be at the language summit.

There's also one controversial point that *does* need to be raised at
the summit: I would like to make distutils-sig the true authority for
packaging standards, so we can stop cross-posting PEPs intended to
apply to packaging in *current* versions of Python to python-dev. The
split discussions suck, and most of the people that need to be
convinced in order for packaging standards to be supported in current
versions of Python aren't on python-dev, since it's a tooling issue
rather than a language design issue. Standard lib support is necessary
in the long run to provide a good "batteries included" experience, but
it's *not* the way to create the batteries in the first place. Until
these standards have been endorsed by the authors of *existing*
packaging tools, proposing them for stdlib addition is premature, but
has been perceived as necessary in the past due to the confused power
structure.

This means that those core developers that want a say in the future
direction of packaging and distribution of Python software would need
to be actively involved in the ongoing discussions on distutils-sig,
rather than relying on being given an explicit invitation to weigh in
at the last minute through a thread (or threads) on python-dev. The
requirement that BDFL-delegates for packaging and distribution related
PEPs also be experienced core developers will remain, however, as
"suitable for future stdlib inclusion" is an important overarching
requirement for packaging and distribution standards. Such delegates
will just be expected to participate actively in distutils-sig *as
well as* python-dev.

Proposals for *actual* standard library updates (to bring it into line
with updated packaging standards) would still be subject to python-dev
discussion and authority (and would *not* have their Discussions-To
header set). Such discussions aren't particularly relevant to most of
the packaging tool developers, since the standard library version
isn't updated frequently enough to be useful to them, and also isn't
available on older Python releases, so python-dev is a more
appropriate venue from both perspectives.

At the moment, python-dev, catalog-sig and distutils-sig create an
awkward trinity where decision making authority and the appropriate
venues for discussion are grossly unclear. I consider this to be one
of the key reasons that working on packaging issues has quite a high
incidence of developer burnout - it's hard to figure out who needs to
be convinced of what, so it's easy for the frustration levels to reach
the "this just isn't worth the hassle" stage (especially when trying
to bring python-dev members up to speed on discussions that may have
taken months on distutils-sig, and when many of the details are
awkward compromises forced by the need to support *existing* tools and
development processes on older versions of Python). Under my proposal,
the breakdown would be slightly clearer:

distutils-sig: overall authority for packaging and distribution
related standards, *including* the interfaces between index servers
(such as PyPI) and automated tools. If a PEP has "Discussions-To" set
to distutils-sig, announcements of new PEPs, new versions of those
PEPs, *and* their acceptance or rejection should be announced there,
and *not* on python-dev. The "Resolution" header will thus point to a
distutils-sig post rather than a python-dev one. distutils-sig will
focus on solutions that work for *current* versions of Python, while
keeping in mind the need for future stdlib support.

python-dev: authority over stdlib support for packaging and
distribution standards, and the "batteries included" experience of
interacting with those standards. Until a next generation distribution
infrastructure is firmly established (which may involve years of
running the legacy infrastructure and the next generation metadata 2.x
bas

Re: [Python-Dev] Python Language Summit at PyCon: Agenda

2013-03-02 Thread Nick Coghlan
On Sat, Mar 2, 2013 at 6:01 PM, Gregory P. Smith  wrote:
> It'd mean smaller code objects and less bloat from constants (docstrings for
> one implementation vs another, etc) being in memory. Taken further, this
> could even be extended beyond implementations to platforms as we have some
> standard library code with alternate definitions within one file for windows
> vs posix, etc.

To plant seeds in the opposite direction, as you're considering this,
I suggest looking at:
- environment markers in PEP 345 and 426 for conditional selection
based on a constrained set of platform data
- compatibility tags in PEP 425 (and consider how they could be used
in relation to __pycache__ and bytecode-only distribution of platform
specific files)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Language Summit at PyCon: Agenda

2013-03-02 Thread Nick Coghlan
On Fri, Mar 1, 2013 at 7:41 PM, Stefan Behnel  wrote:
> Michael Foord, 27.02.2013 17:51:
> It's also true that many of the topics above aren't really interesting for
> us, because we just inherit them with CPython, e.g. stdlib changes.
> Packaging is only relevant as far as it impacts the distribution of binary
> extensions, and the main changes appear to be outside of that area (which
> doesn't mean it's not truly wonderful that they are happening, Python
> packaging has seen a lot of great improvements during the last years and
> I'm very happy to see it getting even better).

I'm puzzled by this one. Did you leave out PEP 427 (the wheel format),
because it's already approved, and hence not likely to be discussed
much at the summit, or because you don't consider it to impact the
distribution of binary extensions (which would be rather odd, given
the nature of the PEP and the wheel format...)?

>
> Interpreter initialisation would be interesting and Cython could
> potentially help in some spots here by making code easier to maintain and
> optimise, for example. We've had this discussion for the importlib
> bootstrapping and I'm sure there's more that could be done. It's sad to see
> so much C-level work go into areas that really don't need to be that 
> low-level.

Cython's notion of embedding is the exact opposite of CPython's, so
I'm not at all clear on how Cython could help with PEP 432 at all.

> I'm not so happy with the argument clinic, but that's certainly also
> because I'm biased. I've written the argument unpacking code for Cython
> some years ago, so it's not surprising that I'm quite happy with that and
> fail to see the need for a totally new DSL *and* a totally new
> implementation, especially with its mapping to the slowish ParseTuple*()
> C-API functions. I've also not seen a good argument why the existing Py3
> function signatures can't do what the proposed DSL tries to achieve. They'd
> at least make it clear that the intention is to make things more
> Python-like, and would at the same time provide the documentation.

That's why Stefan Krah is writing a competing PEP - a number of us
already agree with you, and think the case needs to be made for
choosing something completely different like Argument Clinic
(especially given Guido's expressed tolerance for the idea of "/" as a
possible marker to indicate that the preceding parameters only support
positional arguments - that was in the context of Python discussion
where it was eventually deemed "not necessary", but becomes
interesting again in a C API signature discussion).
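
For anyone who missed that earlier discussion, a sketch of what the "/"
marker would mean (purely illustrative in this discussion; the syntax did
eventually land in Python 3.8 via PEP 570):

    # Parameters to the left of "/" are positional-only, so their names
    # can never collide with keyword arguments collected via **kwargs.
    def replace(s, old, new, /, count=-1):
        ...

    replace("aa", "a", "b")           # OK
    replace("aa", old="a", new="b")   # TypeError under the proposal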

> And I'd really like to see a CPython summit
> happen at some point. There's so much interesting stuff going on in that
> area that it's worth getting some people together to move these things 
> forward.

Yes, a CPython runtime summit some year would be interesting.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Language Summit at PyCon: Agenda

2013-03-02 Thread Maciej Fijalkowski
>> And I'd really like to see a CPython summit
>> happen at some point. There's so much interesting stuff going on in that
>> area that it's worth getting some people together to move these things 
>> forward.
>
> Yes, a CPython runtime summit some year would be interesting.
>
> Cheers,
> Nick.

I don't see why CPython-specific stuff can't be discussed at the
language summit. After all, anyone can simply be uninterested in a
topic X or a topic Y. I would be more than happy to contribute
whatever knowledge I have about building VMs to discussions of the
CPython implementation.

Cheers,
fijal
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Planning on removing cache invalidation for file finders

2013-03-02 Thread Nick Coghlan
On Sun, Mar 3, 2013 at 2:16 AM, Brett Cannon  wrote:
> On Sat, Mar 2, 2013 at 10:36 AM, Nick Coghlan  wrote:
>> I think you should keep it. A long running service that periodically
>> scans the importers for plugins doesn't care if modules take a few
>> extra seconds to show up, it just wants to see them eventually.
>> Installers (or filesystem copy or move operations!) have no way to
>> inform arbitrary processes that new files have been added.
>
>
> But if they are doing the scan they can also easily invalidate the caches
> before performing the scan.

"I just upgraded to Python 3.4, and now my server process isn't see new plugins"

That's a major backwards compatibility breach, and hence clearly
unacceptable in my view. Even the relatively *minor* compatibility
breach of becoming dependent on the filesystem timestamp resolution
for picking up added modules, creating a race condition between
writing the file and reading it back through the import system, has
caused people grief. When you're in a hole, the first thing to do is
to *stop digging*.

You can deprecate the heuristic if you want (and can figure out how),
but a definite -1 on removing it without at least the usual
deprecation period for backwards incompatible changes.

It may also be worth tweaking the wording of the upgrade note in the
What's New to mention the need to always invalidate the cache before
scanning for new modules if you want to reliably pick up new modules
created since the application started (at the moment the note really
only mentions it as something to do after *creating* a new module).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Language Summit at PyCon: Agenda

2013-03-02 Thread Stefan Behnel
Hi Nick,

thanks for the feedback.

Nick Coghlan, 02.03.2013 17:58:
> On Fri, Mar 1, 2013 at 7:41 PM, Stefan Behnel wrote:
>> Michael Foord, 27.02.2013 17:51:
>> It's also true that many of the topics above aren't really interesting for
>> us, because we just inherit them with CPython, e.g. stdlib changes.
>> Packaging is only relevant as far as it impacts the distribution of binary
>> extensions, and the main changes appear to be outside of that area (which
>> doesn't mean it's not truly wonderful that they are happening, Python
>> packaging has seen a lot of great improvements during the last years and
>> I'm very happy to see it getting even better).
> 
> I'm puzzled by this one. Did you leave out PEP 427 (the wheel format),
> because it's already approved, and hence not likely to be discussed
> much at the summit, or because you don't consider it to impact the
> distribution of binary extensions (which would be rather odd, given
> the nature of the PEP and the wheel format...)?

I admit that the wheel format has been sailing mostly below my radar (I
guess much of the discussion about it is buried somewhere in the distutils
SIG archives?), but the last time it started blinking brightly enough to
have me take a look at the PEP, I didn't really see anything that was
relevant enough to Cython to pay much attention or even comment on it. As I
understand it, it's almost exclusively about naming and metadata. Cython
compiled extensions are in no way different from plain C extensions wrt
packaging. What works for those will work for Cython just fine.

Does it imply any changes in the build system that I should be aware of?
Cython usually just runs as a preprocessor for distutils extensions, before
even calling into setup(). The rest is just a plain old distutils extension
build.


>> Interpreter initialisation would be interesting and Cython could
>> potentially help in some spots here by making code easier to maintain and
>> optimise, for example. We've had this discussion for the importlib
>> bootstrapping and I'm sure there's more that could be done. It's sad to see
>> so much C-level work go into areas that really don't need to be that 
>> low-level.
> 
> Cython's notion of embedding is the exact opposite of CPython's, so
> I'm not at all clear on how Cython could help with PEP 432 at all.

I wasn't thinking about embedding CPython in a Cython compiled program.
That would appear like a rather strange setup here.

In the context of importlib, I proposed compiling init time Python code
into statically linked extension modules in order to speed it up and make
it independent of the parser and interpreter, as an alternative to freezing
it (which requires a working VM already and implies interpretation
overhead). I agree that Cython can't help in most of the early low-level
runtime bootstrap process, but once a minimum runtime is available, the
more high-level parts of the initialisation could be done in compiled
Python code, which other implementations might be able to reuse.


>> I'm not so happy with the argument clinic, but that's certainly also
>> because I'm biased. I've written the argument unpacking code for Cython
>> some years ago, so it's not surprising that I'm quite happy with that and
>> fail to see the need for a totally new DSL *and* a totally new
>> implementation, especially with its mapping to the slowish ParseTuple*()
>> C-API functions. I've also not seen a good argument why the existing Py3
>> function signatures can't do what the proposed DSL tries to achieve. They'd
>> at least make it clear that the intention is to make things more
>> Python-like, and would at the same time provide the documentation.
> 
> That's why Stefan Krah is writing a competing PEP - a number of us
> already agree with you, and think the case needs to be made for
> choosing something completely different like Argument Clinic

I'll happily provide my feedback to that approach. It might also have a
positive impact on the usage of Py3 argument annotations, which I think
merit some more visibility and "useful use cases".


> (especially given Guido's expressed tolerance for the idea of "/" as a
> possible marker to indicate that the preceding parameters only support
> positional arguments - that was in the context of Python discussion
> where it was eventually deemed "not necessary", but becomes
> interesting again in a C API signature discussion).

I've not really had that need myself yet, but I remember thinking of it at
least once while writing Cython's argument unpacking code. I think it would
get rid of a currently existing asymmetry between positional arguments and
keyword(-only) arguments, and would remove the risk of naming collisions
with positional arguments, most notably when **kwargs is used. And yes, I
agree that it would be most interesting for C signatures, just like kwonly
arguments are really handy there. It might not be all too hard to write up
a prototype in Cython. And I should be able to find a coupl

Re: [Python-Dev] Python Language Summit at PyCon: Agenda

2013-03-02 Thread Stefan Krah
Stefan Behnel  wrote:
> >> I'm not so happy with the argument clinic, but that's certainly also
> >> because I'm biased. I've written the argument unpacking code for Cython
> >> some years ago, so it's not surprising that I'm quite happy with that and
> >> fail to see the need for a totally new DSL *and* a totally new
> >> implementation, especially with its mapping to the slowish ParseTuple*()
> >> C-API functions. I've also not seen a good argument why the existing Py3
> >> function signatures can't do what the proposed DSL tries to achieve. They'd
> >> at least make it clear that the intention is to make things more
> >> Python-like, and would at the same time provide the documentation.
> > 
> > That's why Stefan Krah is writing a competing PEP - a number of us
> > already agree with you, and think the case needs to be made for
> > choosing something completely different like Argument Clinic
> 
> I'll happily provide my feedback to that approach. It might also have a
> positive impact on the usage of Py3 argument annotations, which I think
> merit some more visibility and "useful use cases".


BTW, I think so far no one has stepped forward to implement the custom
argument handlers. I've looked at Cython output and, as you say, most of
it is there already.

Is it possible to write a minimal version of the code generator that just
produces the argument handling code?

Approximately, how many lines of code would we be talking about?



Stefan Krah



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Disabling string interning for null and single-char causes segfaults

2013-03-02 Thread Terry Reedy

On 3/2/2013 10:08 AM, Nick Coghlan wrote:

On Sat, Mar 2, 2013 at 1:24 AM, Stefan Bucur  wrote:

Hi,

I'm working on an automated bug finding tool that I'm trying to apply on the
Python interpreter code (version 2.7.3). Because of early prototype
limitations, I needed to disable string interning in stringobject.c. More
precisely, I modified the PyString_FromStringAndSize and PyString_FromString
to no longer check for the null and single-char cases, and create instead a
new string every time (I can send the patch if needed).

However, after applying this modification, when running "make test" I get a
segfault in the test___all__ test case.

Before digging deeper into the issue, I wanted to ask here if there are any
implicit assumptions about string identity and interning throughout the
interpreter implementation. For instance, are two single-char strings having
the same content supposed to be identical objects?

I'm assuming that it's either this, or some refcount bug in the interpreter
that manifests only when certain strings are no longer interned and thus
have a higher chance to get low refcount values.


In theory, interning is supposed to be a pure optimisation, but it
wouldn't surprise me if there are cases that assume the described
strings are always interned (especially the null string case). Our
test suite would never detect such bugs, as we never disable the
interning.


Since disabling interning required patching functions rather than flipping 
a configuration switch, it literally seems not to be a supported option. 
If so, I would not consider it a bug for CPython to use the assumption of 
interning to run faster, and I don't think it should be slowed down if 
that were necessary to remove the assumption. (This is all assuming that 
the problem is not just a ref count bug.)


Stefan's question was about 2.7. I am just curious: does 3.3 still 
intern (some) unicode chars? Did the 256 interned bytes of 2.x carry 
over to 3.x?



Whether or not we're interested in fixing such bugs would depend on
the size of the patches needed to address them. From our point of
view, such bugs are purely theoretical (as the assumption is always
valid in an unpatched CPython build), so if the problem is too hard to
diagnose or fix, we're more likely to declare that interning of at
least those kinds of string values is required for correctness when
creating modified versions of CPython.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Disabling string interning for null and single-char causes segfaults

2013-03-02 Thread Stefan Bucur
On Sat, Mar 2, 2013 at 4:08 PM, Nick Coghlan  wrote:
> On Sat, Mar 2, 2013 at 1:24 AM, Stefan Bucur  wrote:
>> Hi,
>>
>> I'm working on an automated bug finding tool that I'm trying to apply on the
>> Python interpreter code (version 2.7.3). Because of early prototype
>> limitations, I needed to disable string interning in stringobject.c. More
>> precisely, I modified the PyString_FromStringAndSize and PyString_FromString
>> to no longer check for the null and single-char cases, and create instead a
>> new string every time (I can send the patch if needed).
>>
>> However, after applying this modification, when running "make test" I get a
>> segfault in the test___all__ test case.
>>
>> Before digging deeper into the issue, I wanted to ask here if there are any
>> implicit assumptions about string identity and interning throughout the
>> interpreter implementation. For instance, are two single-char strings having
>> the same content supposed to be identical objects?
>>
>> I'm assuming that it's either this, or some refcount bug in the interpreter
>> that manifests only when certain strings are no longer interned and thus
>> have a higher chance to get low refcount values.
>
> In theory, interning is supposed to be a pure optimisation, but it
> wouldn't surprise me if there are cases that assume the described
> strings are always interned (especially the null string case). Our
> test suite would never detect such bugs, as we never disable the
> interning.

I understand. In that case, I'll investigate the issue further and see
exactly what is causing the crash.

>
> Whether or not we're interested in fixing such bugs would depend on
> the size of the patches needed to address them. From our point of
> view, such bugs are purely theoretical (as the assumption is always
> valid in an unpatched CPython build), so if the problem is too hard to
> diagnose or fix, we're more likely to declare that interning of at
> least those kinds of string values is required for correctness when
> creating modified versions of CPython.
>
> I'm not sure what kind of analyser you are writing, but if it relates
> to the CPython C API, you may be interested in
> https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html

That's quite a neat tool, I didn't know about it! I guess that would
have saved me many hours of debugging obscure refcount bugs in my own
Python extensions :)

In any case, my analysis tool aims to find bugs in Python programs,
not in the CPython implementation itself. It works by performing
symbolic execution [1] on the Python interpreter, while it is
executing the target Python program. This means that the Python
interpreter memory space contains symbolic expressions (i.e.,
mathematical formulas over the program input) instead of "concrete"
values.

The interned strings are pesky for symbolic execution because the
PyObject* pointer obtained when creating an interned string depends on
the string contents: if the contents are already interned, the old
pointer is returned; otherwise a new object is created. So the pointer
itself becomes "symbolic", i.e., dependent on the input data, which
makes the analysis much more complicated.
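
A tiny concrete illustration of the effect (observed on a stock CPython
2.7 build; the exact caching is an implementation detail, not something
I rely on being guaranteed):

s = "abcabc"
a, b = s[0], s[3]  # two independently created 1-char strings
print(a is b)      # True: both come back as the single interned 'a',
                   # so id(a) -- the "pointer" -- is a function of the content

When the character value is part of the symbolic program input, an
identity test like this forces the analysis to reason about the pointer
value itself.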

Stefan

[1] http://en.wikipedia.org/wiki/Symbolic_execution


Re: [Python-Dev] Disabling string interning for null and single-char causes segfaults

2013-03-02 Thread Stefan Bucur
On Sat, Mar 2, 2013 at 4:31 PM, Antoine Pitrou  wrote:
> On Fri, 1 Mar 2013 16:24:42 +0100
> Stefan Bucur  wrote:
>>
>> However, after applying this modification, when running "make test" I get a
>> segfault in the test___all__ test case.
>>
>> Before digging deeper into the issue, I wanted to ask here if there are any
>> implicit assumptions about string identity and interning throughout the
>> interpreter implementation. For instance, are two single-char strings
>> having the same content supposed to be identical objects?
>
> From a language POV, no, but inside a specific interpreter such as
> CPython it may be a reasonable expectation.
>
>> I'm assuming that it's either this, or some refcount bug in the interpreter
>> that manifests only when certain strings are no longer interned and thus
>> have a higher chance to get low refcount values.
>
> Indeed, if it's a real bug it would be nice to get it fixed :-)

By the way, in that case, what would be the best way to debug this type
of refcount error? I recently ran across this document [1], but it
mostly applies to debugging newly introduced code. When a change
potentially impacts a good fraction of the interpreter, where should I
look first?

I'm asking because when I re-ran the failing test under gdb, the
segfault occurred while invoking the kill() syscall, so the error seems
to manifest some time after the faulty code is executed.

Stefan

[1] http://www.python.org/doc/essays/refcnt/


Re: [Python-Dev] Disabling string interning for null and single-char causes segfaults

2013-03-02 Thread Lukas Lueg
Debugging a refcount bug? Good. Out of the door, line on the left, one
cross each.


2013/3/2 Stefan Bucur 

> On Sat, Mar 2, 2013 at 4:31 PM, Antoine Pitrou 
> wrote:
> > On Fri, 1 Mar 2013 16:24:42 +0100
> > Stefan Bucur  wrote:
> >>
> >> However, after applying this modification, when running "make test" I
> get a
> >> segfault in the test___all__ test case.
> >>
> >> Before digging deeper into the issue, I wanted to ask here if there are
> any
> >> implicit assumptions about string identity and interning throughout the
> >> interpreter implementation. For instance, are two single-char strings
> >> having the same content supposed to be identical objects?
> >
> > From a language POV, no, but inside a specific interpreter such as
> > CPython it may be a reasonable expectation.
> >
> >> I'm assuming that it's either this, or some refcount bug in the
> interpreter
> >> that manifests only when certain strings are no longer interned and thus
> >> have a higher chance to get low refcount values.
> >
> > Indeed, if it's a real bug it would be nice to get it fixed :-)
>
> By the way, in that case, what would be the best way to debug such
> type of ref count errors? I recently ran across this document [1],
> which kind of applies to debugging focused on newly introduced code.
> But when some changes potentially impact a good fraction of the
> interpreter, where should I look first?
>
> I'm asking since I re-ran the failing test with gdb, and the segfault
> seems to occur when invoking the kill() syscall, so the error seems to
> manifest at some later point than when the faulty code is executed.
>
> Stefan
>
> [1] http://www.python.org/doc/essays/refcnt/


Re: [Python-Dev] Planning on removing cache invalidation for file finders

2013-03-02 Thread Brett Cannon
On Sat, Mar 2, 2013 at 12:24 PM, Nick Coghlan  wrote:

> On Sun, Mar 3, 2013 at 2:16 AM, Brett Cannon  wrote:
> > On Sat, Mar 2, 2013 at 10:36 AM, Nick Coghlan 
> wrote:
> >> I think you should keep it. A long running service that periodically
> >> scans the importers for plugins doesn't care if modules take a few
> >> extra seconds to show up, it just wants to see them eventually.
> >> Installers (or filesystem copy or move operations!) have no way to
> >> inform arbitrary processes that new files have been added.
> >
> >
> > But if they are doing the scan they can also easily invalidate the caches
> > before performing the scan.
>
> "I just upgraded to Python 3.4, and now my server process isn't see new
> plugins"
>
> That's a major backwards compatibility breach, and hence clearly
> unacceptable in my view. Even the relatively *minor* compatibility
> breach of becoming dependent on the filesystem timestamp resolution
> for picking up added modules, creating a race condition between
> writing the file and reading it back through the import system, has
> caused people grief. When you're in a hole, the first thing to do is
> to *stop digging*.
>
> You can deprecate the heuristic if you want (and can figure out how),
> but a definite -1 on removing it without at least the usual
> deprecation period for backwards incompatible changes.
>

That part is easy: ImportWarning still exists, so we can keep checking the
directory, notice when a difference appears that would affect subsequent
imports, and raise the warning in that case.
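
For the plugin-scanning case quoted above, the "invalidate before
scanning" idiom is cheap. A minimal sketch (the 'plugins' package name
and layout are assumptions for illustration):

import importlib
import pkgutil

def scan_for_plugins(package_name='plugins'):
    # Drop stale directory caches so modules written after startup are
    # visible even without the mtime heuristic.
    importlib.invalidate_caches()
    pkg = importlib.import_module(package_name)
    return [name for _, name, _ in pkgutil.iter_modules(pkg.__path__)]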


>
> It may also be worth tweaking the wording of the upgrade note in the
> What's New to mention the need to always invalidate the cache before
> scanning for new modules if you want to reliably pick up new modules
> created since the application started (at the moment the note really
> only mentions it as something to do after *creating* a new module).
>
>
As of right now, with the check in place, that's all that is needed; but
yes, if the deprecation does occur, it would be worth changing the note.


Re: [Python-Dev] Disabling string interning for null and single-char causes segfaults

2013-03-02 Thread Antoine Pitrou
On Sat, 2 Mar 2013 22:13:56 +0100
Stefan Bucur  wrote:

> On Sat, Mar 2, 2013 at 4:31 PM, Antoine Pitrou  wrote:
> > On Fri, 1 Mar 2013 16:24:42 +0100
> > Stefan Bucur  wrote:
> >>
> >> However, after applying this modification, when running "make test" I get a
> >> segfault in the test___all__ test case.
> >>
> >> Before digging deeper into the issue, I wanted to ask here if there are any
> >> implicit assumptions about string identity and interning throughout the
> >> interpreter implementation. For instance, are two single-char strings
> >> having the same content supposed to be identical objects?
> >
> > From a language POV, no, but inside a specific interpreter such as
> > CPython it may be a reasonable expectation.
> >
> >> I'm assuming that it's either this, or some refcount bug in the interpreter
> >> that manifests only when certain strings are no longer interned and thus
> >> have a higher chance to get low refcount values.
> >
> > Indeed, if it's a real bug it would be nice to get it fixed :-)
> 
> By the way, in that case, what would be the best way to debug such
> type of ref count errors? I recently ran across this document [1],
> which kind of applies to debugging focused on newly introduced code.

That document looks a bit outdated (1998!).
I would suggest you enable core dumps (`ulimit -c unlimited`), then let
Python crash and inspect the stack trace with gdb. You will get better
results if you use a debug build and the modern gdb inspection helpers:
http://docs.python.org/devguide/gdb.html
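
On a debug build you can also narrow down leaks from Python itself with
sys.gettotalrefcount(). A crude sketch (suspect_code is a placeholder
for whatever you want to exercise):

import sys

def refcount_delta(suspect_code):
    # sys.gettotalrefcount() only exists on --with-pydebug builds.
    before = sys.gettotalrefcount()
    suspect_code()
    return sys.gettotalrefcount() - before

The delta is noisy (temporaries, internal caches), but a steadily
growing value across repeated calls usually points at a missing
Py_DECREF.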

Oh, by the way, it would be better to do your work on Python 3 rather
than 2.7. Either the `default` branch or the `3.3` branch, I guess.
See http://docs.python.org/devguide/setup.html#checkout

Regards

Antoine.


Re: [Python-Dev] Python Language Summit at PyCon: Agenda

2013-03-02 Thread Antoine Pitrou
On Thu, 28 Feb 2013 11:39:52 +
Michael Foord  wrote:
> > 
> > Perhaps someone wants to discuss
> > http://www.python.org/dev/peps/pep-0428/, but I won't be there and the
> > PEP isn't terribly up-to-date either :-)
> 
> If you can find someone familiar with pathlib to champion the discussion it 
> is more likely to happen and be productive... Getting the PEP up to date 
> before the summit will also help. (I very much like the *idea* of pathlib and 
> the bits I've seen / read through - but I haven't used it in anger yet so I 
> don't feel qualified to champion it myself.)

I've made the PEP up-to-date now.
http://mail.python.org/pipermail/python-ideas/2013-March/019731.html

Regards

Antoine.


Re: [Python-Dev] Planning on removing cache invalidation for file finders

2013-03-02 Thread Erik Bray
On Sat, Mar 2, 2013 at 10:36 AM, Nick Coghlan  wrote:
> In addition, it may be appropriate for importlib to offer a
> "write_module" method that accepts (module name, target path,
> contents). This would:
>
> 1. Allow in-process caches to be invalidated implicitly and
> selectively when new modules are created
> 2. Allow importers to abstract write access in addition to read access
> 3. Allow the import system to complain at time of writing if the
> desired module name and target path don't actually match given the
> current import system state.

+1 to write_module().  This would be useful in general, I think.
Though perhaps the best solution to the original problem is to more
forcefully document: "If you're writing a module and expect to be able
to import it immediately within the same process, it's necessary to
manually invalidate the directory cache."

I might go a little further and suggest adding a function to only
invalidate the cache for the relevant directory (the proposed
write_module() function could do this).  This can already be done with
something like:

import os, sys

dirname = os.path.dirname(module_filename)  # assumes a finder is cached for it
sys.path_importer_cache[dirname].invalidate_caches()

But that's a bit onerous considering that this wasn't even necessary
before 3.3.  There should be an easier way to do this, as there's no
sense in invalidating all the directory caches if one is only writing
new modules to a specific directory or directories.
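
Something along these lines, for illustration (a hypothetical helper,
not an existing importlib API):

import importlib
import sys

def invalidate_dir_cache(directory):
    # Invalidate only the finder cached for 'directory'; fall back to a
    # full invalidation if no finder has been cached for it yet.
    finder = sys.path_importer_cache.get(directory)
    if finder is not None and hasattr(finder, 'invalidate_caches'):
        finder.invalidate_caches()
    else:
        importlib.invalidate_caches()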

Erik


Re: [Python-Dev] Planning on removing cache invalidation for file finders

2013-03-02 Thread Antoine Pitrou
On Sat, 2 Mar 2013 11:16:28 -0500
Brett Cannon  wrote:
> > In addition, it may be appropriate for importlib to offer a
> > "write_module" method that accepts (module name, target path,
> > contents). This would:
> >
> > 1. Allow in-process caches to be invalidated implicitly and
> > selectively when new modules are created
> 
> I don't think that's necessary. If people don't want to blindly clear all
> caches for a file they can write the file, search the keys in
> sys.path_importer_cache for the longest prefix of the newly created file,
> and then call the invalidate_caches() method on that explicit finder.

That's too complicated for non-import experts IMHO.

Regards

Antoine.




Re: [Python-Dev] Python Language Summit at PyCon: Agenda

2013-03-02 Thread Trent Nelson
On Wed, Feb 27, 2013 at 08:51:16AM -0800, Michael Foord wrote:
> If you have other items you'd like to discuss please let me know and I
> can add them to the agenda.

Hmm, seems like this might be a good forum to introduce the
parallel/async stuff I've been working on the past few months.
TL;DR version is I've come up with an alternative approach for
exploiting multiple cores that doesn't rely on GIL-removal or
STM (and has a negligible performance overhead when executing
single-threaded code).  (For those that are curious, it lives
in the px branch of the sandbox/trent repo on hg.p.o, albeit
in a very experimental/prototype/proof-of-concept state (i.e.
it's an unorganized, undocumented, uncommented hackfest); on
the plus side, it works.  Sort of.)

Second suggestion: perhaps a little segment on Snakebite?  What
it is, what's available to committers, feedback/kvetching from
those who have already used it, etc.

(I forgot the format of these summits -- is there a projector?)

Trent.