Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread Eli Bendersky
On Tue, Aug 30, 2011 at 08:57, Greg Ewing wrote:

> Nick Coghlan wrote:
>
>> Personally, I *like* CPython fitting into the "simple-and-portable"
>> niche in the Python interpreter space.
>
> Me, too! I like that I can read the CPython source and
> understand what it's doing most of the time. Please don't
> screw that up by attempting to perform heroic optimisations.

Following this argument to the extreme, the bytecode evaluation code of
CPython could be simplified quite a bit: lose 2x performance, but gain a
lot of readability. Does that sound like a good deal? I don't intend to
sound sarcastic, just to show that IMHO this argument isn't a good one. I
think that even cleverly optimized code can be properly written and
*documented* to make the task of understanding it feasible. Personally,
I'd love CPython to be a bit faster and see no reason to give up
optimization opportunities for the sake of code readability.

Eli


Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Martin v. Löwis
> I don't compare ASCII and ISO-8859-1 decoders. I was asking if decoding 
> b'abc' 
> from ISO-8859-1 is faster than decoding b'ab\xff' from ISO-8859-1, and if 
> yes: 
> why?

No, that makes no difference.

> 
> Your patch replaces PyUnicode_New(size, 255) ...  memcpy(), by 
> PyUnicode_FromUCS1().

You compared to the wrong revision. PyUnicode_New is already a PEP 393
function, and this version you have been comparing to is indeed faster
than the current version. However, it is also incorrect, as it fails
to compute the maxchar, and hence fails to detect pure-ASCII strings.

See below for the actual diff. It should be obvious why the 393 version
is faster: 3.3 currently needs to widen each char (to 16 or 32 bits).

Regards,
Martin

@@ -5569,41 +5569,8 @@
                                  Py_ssize_t size,
                                  const char *errors)
 {
-    PyUnicodeObject *v;
-    Py_UNICODE *p;
-    const char *e, *unrolled_end;
-
     /* Latin-1 is equivalent to the first 256 ordinals in Unicode. */
-    if (size == 1) {
-        Py_UNICODE r = *(unsigned char*)s;
-        return PyUnicode_FromUnicode(&r, 1);
-    }
-
-    v = _PyUnicode_New(size);
-    if (v == NULL)
-        goto onError;
-    if (size == 0)
-        return (PyObject *)v;
-    p = PyUnicode_AS_UNICODE(v);
-    e = s + size;
-    /* Unrolling the copy makes it much faster by reducing the looping
-       overhead. This is similar to what many memcpy() implementations
-       do. */
-    unrolled_end = e - 4;
-    while (s < unrolled_end) {
-        p[0] = (unsigned char) s[0];
-        p[1] = (unsigned char) s[1];
-        p[2] = (unsigned char) s[2];
-        p[3] = (unsigned char) s[3];
-        s += 4;
-        p += 4;
-    }
-    while (s < e)
-        *p++ = (unsigned char) *s++;
-    return (PyObject *)v;
-
-  onError:
-    Py_XDECREF(v);
-    return NULL;
+    return PyUnicode_FromUCS1((unsigned char*)s, size);
 }

 /* create or adjust a UnicodeEncodeError */
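
For reference, a rough Python model of what PyUnicode_FromUCS1 has to do
under PEP 393 (a conceptual sketch only, not the actual C code in
unicodeobject.c):

    def from_ucs1(data: bytes):
        # The extra pass Victor asks about: find the highest code point.
        maxchar = max(data) if data else 0
        ascii_only = maxchar < 128   # pure-ASCII strings are detected here
        # The payload is then a straight one-byte-per-char copy (memcpy in
        # C) -- no widening of every byte to a 16/32-bit Py_UNICODE as in
        # the removed code above.
        return ascii_only, "".join(map(chr, data))

    print(from_ucs1(b"abc"))     # (True, 'abc')
    print(from_ucs1(b"ab\xff"))  # (False, 'ab\xff')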


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread Greg Ewing

Nick Coghlan wrote:

> Personally, I *like* CPython fitting into the "simple-and-portable"
> niche in the Python interpreter space.

Me, too! I like that I can read the CPython source and
understand what it's doing most of the time. Please don't
screw that up by attempting to perform heroic optimisations.

--
Greg


Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)

2011-08-29 Thread Greg Ewing

Guido van Rossum wrote:

> On Mon, Aug 29, 2011 at 2:17 PM, Greg Ewing wrote:
>
>> All you
>> need to do when writing the .pyx file is follow the same
>> API that you would if you were writing C code to use the
>> library.
>
> Interesting. Then how does Pyrex/Cython typecheck your code at compile time?


You might be reading more into that statement than I meant.
You have to supply Pyrex/Cython versions of the C declarations,
either hand-written or generated by a tool. But you write them
based on the advertised C API -- you don't have to manually
expand macros, work out the low-level layout of structs, or
anything like that (as you often have to do when using ctypes).

--
Greg



Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread Nick Coghlan
On Tue, Aug 30, 2011 at 12:38 PM, Gregory P. Smith  wrote:
> Some in this thread seemed to give the impression that CPython performance
> is not something to care about. I disagree. I see CPython being the main
> implementation of Python used in most places for a long time. Improving its
> performance merely raises the bar to be met by other implementations if they
> want to compete. That is a good thing!

Not the impression I intended to give. I merely want to highlight that
we need to be careful that incremental increases in complexity are
justified with real, measured performance improvements. PyPy has set
the bar on how to do that - people that seriously want to make CPython
faster need to focus on getting speed.python.org sorted *first* (so we
know where we're starting) and *then* work on trying to improve
CPython's numbers relative to that starting point.

The PSF has the hardware to run the site, but, unless more has been
going on in the background than I am aware of, is still lacking trusted
volunteers to do the following:
1. Getting codespeed up and running on the PSF hardware
2. Hooking it in to the CPython source control infrastructure
3. Getting a reasonable set of benchmarks running on 3.x (likely
starting with the already ported set in Mercurial, but eventually we
want the full suite that PyPy uses)
4. Once PyPy, Jython and IronPython offer 3.x compatible versions,
start including them as well (alternatively, offer 2.x performance
comparisons as well, although that's less interesting from a CPython
point of view since it can't be used to guide future CPython
optimisation efforts)

Anecdotal, non-reproducible performance figures are *not* the way to
go about serious optimisation efforts. Using a dedicated machine is
vulnerable to architecture-specific idiosyncrasies, but ad hoc testing
on other systems can still be used as a sanity check.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread Gregory P. Smith
On Mon, Aug 29, 2011 at 2:05 PM, stefan brunthaler wrote:

> > The question really is whether this is an all-or-nothing deal. If you
> > could identify smaller parts that can be applied independently, interest
> > would be higher.
> >
> Well, it's not an all-or-nothing deal. In my current architecture, I
> can selectively enable most of the optimizations as I see fit. The
> only pre-requisite (in my implementation) is that I have two dispatch
> loops with a changed instruction format. It is, however, not a
> technical necessity, just the way I implemented it. Basically, you can
> choose whatever you like best, and I could extract that part. I am
> just offering to add all the things that I have done :)
>
>
+1 from me on going forward with your performance improvements.  The more
you can break them down into individual smaller patch sets the better, as
they can be reviewed and applied as needed.  A prerequisites patch, a patch
for the wide opcodes, etc.

For benchmarks, given this is Python 3, just get as many useful ones running
as you can.

Some in this thread seemed to give the impression that CPython performance
is not something to care about. I disagree. I see CPython being the main
implementation of Python used in most places for a long time. Improving its
performance merely raises the bar to be met by other implementations if they
want to compete. That is a good thing!

-gps


> > Also, I'd be curious whether your techniques help or hinder a potential
> > integration of a JIT generator.
> >
> This is something I have previously frequently discussed with several
> JIT people. IMHO, having my optimizations in-place also helps a JIT
> compiler, since it can re-use the information I gathered to generate
> more aggressively optimized native machine code right away (the inline
> caches can be generated with the type information right away, some
> functions could be inlined with the guard statements subsumed, etc.)
> Another benefit could be that the JIT compiler can spend longer time
> on generating code, because the interpreter is already faster (so in
> some cases it would probably not make sense to include a
> non-optimizing fast and simple JIT compiler).
> There are others on the list, who probably can/want to comment on this,
> too.
>
> That aside, I think that while having a JIT is an important goal, I
> can very well imagine scenarios where the additional memory
> consumption (for the generated native machine code) of a JIT for each
> process (I assume that the native machine code caches are not shared)
> hinders scalability. I have in fact no data to back this up, but I
> think that would be an interesting trade off, say if I have 30% gain
> in performance without substantial additional memory requirements on
> my existing hardware, compared to higher achievable speedups that
> require more machines, though.
>
>
> Regards,
> --stefan


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread Antoine Pitrou
On Tue, 30 Aug 2011 10:00:28 +1000
Nick Coghlan  wrote:
> >
> > Having a word-sized "bytecode" format would probably be acceptable in
> > itself, so if you want to submit a patch for that, go ahead.
> 
> Although any such patch should discuss how it compares with Cesare's
> work on wpython.
> Personally, I *like* CPython fitting into the "simple-and-portable"
> niche in the Python interpreter space.

Changing the bytecode width wouldn't make the interpreter more complex.

> Armin Rigo made the judgment
> years ago that CPython was a poor platform for serious optimisation
> when he stopped working on Psyco and started PyPy instead, and I think
> the contrasting fates of PyPy and Unladen Swallow have borne out that
> opinion.

Well, PyPy didn't show any significant achievements before they spent
*much* more time on it than the Unladen Swallow guys did. Whether or not
a good JIT is possible on top of CPython might remain a largely
unanswered question.

> Significantly increasing the complexity of CPython for
> speed-ups that are dwarfed by those available through PyPy seems like
> a poor trade-off to me.

Some years ago we were waiting for Unladen Swallow to improve itself
and be ported to Python 3. Now it seems we are waiting for PyPy to be
ported to Python 3. I'm not sure how "let's just wait" is a good
trade-off if someone proposes interesting patches (which, of course,
remains to be seen).

> At a bare minimum, I don't think any significant changes should be
> made under the "it will be faster" justification until the bulk of the
> real-world benchmark suite used for speed.pypy.org is available for
> Python 3. (Wasn't there a GSoC project about that?)

I'm not sure what the bulk is, but have you already taken a look at
http://hg.python.org/benchmarks/ ?

Regards

Antoine.


Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Terry Reedy

On 8/29/2011 3:41 PM, Nir Aides wrote:

> I am not familiar with the python-dev definition for deprecation, but

Possible to planned eventual removal.

> when I used the word in the bug discussion I meant to advertise to
> users that they should not mix threading and forking since that mix is
> and will remain broken by design; I did not mean removal or crippling
> of functionality.

This would be a note or warning in the doc. You can suggest what and
where to add something on an existing issue or a new one.


--
Terry Jan Reedy



Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread stefan brunthaler
> Personally, I *like* CPython fitting into the "simple-and-portable"
> niche in the Python interpreter space. Armin Rigo made the judgment
> years ago that CPython was a poor platform for serious optimisation
> when he stopped working on Psyco and started PyPy instead, and I think
> the contrasting fates of PyPy and Unladen Swallow have borne out that
> opinion. Significantly increasing the complexity of CPython for
> speed-ups that are dwarfed by those available through PyPy seems like
> a poor trade-off to me.
>
I agree with the trade-off, but the nice thing is that CPython's
interpreter remains simple and portable using my optimizations. All of
these optimizations are purely interpretative and the complexity of
CPython is not affected much. (For example, I have an inline-cached
version of BINARY_ADD that is called INCA_FLOAT_ADD [INCA being my
abbreviation for INline CAching]; you don't actually have to look at
its source code, since it is generated by my code generator, but you
can immediately tell what's going on by looking at instruction traces.)
So, the interpreter remains fully portable, and any compatibility
issues with C modules should not occur either.
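
A toy model of the quickening mechanism stefan describes, for readers
who want to see the mechanics (illustrative Python only; the real
instruction derivatives are generated C code, and everything except the
two instruction names is hypothetical):

    class Frame:
        def __init__(self, code):
            self.code = code   # one rewritable slot per instruction

    def BINARY_ADD(frame, i, left, right):
        # Generic add: after observing two floats, rewrite ("quicken")
        # this instruction slot to the specialized derivative.
        if type(left) is float and type(right) is float:
            frame.code[i] = INCA_FLOAT_ADD
        return left + right

    def INCA_FLOAT_ADD(frame, i, left, right):
        # Specialized add: a cheap type guard, then the float operation.
        if type(left) is not float or type(right) is not float:
            frame.code[i] = BINARY_ADD   # guard failed: de-optimize
            return left + right
        return float.__add__(left, right)   # bypasses generic dispatch

    frame = Frame([BINARY_ADD])
    print(frame.code[0](frame, 0, 1.0, 2.0))   # 3.0
    print(frame.code[0] is INCA_FLOAT_ADD)     # True: slot was rewritten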


> At a bare minimum, I don't think any significant changes should be
> made under the "it will be faster" justification until the bulk of the
> real-world benchmark suite used for speed.pypy.org is available for
> Python 3. (Wasn't there a GSoC project about that?)
>
Having more tests would surely be helpful. As already said, the most
real-world stuff I can do is Martin's django patch. (Some of the other
benchmarks are from the shootout, and I can [and did] run them, too:
binarytrees, fannkuch, fasta, mandelbrot, nbody and spectralnorm. I also
have the AI benchmark from Unladen Swallow, but no current figures.)


Best,
--stefan


Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)

2011-08-29 Thread Guido van Rossum
On Mon, Aug 29, 2011 at 2:17 PM, Greg Ewing  wrote:
> Guido van Rossum wrote:
>>
>> (Just like Python's own .h files --
>> e.g. the extensive renaming of the Unicode APIs depending on
>> narrow/wide build) How does Cython deal with these?
>
> Pyrex/Cython deal with it by generating C code that includes
> the relevant headers, so the C compiler expands all the
> macros, interprets the struct declarations, etc. All you
> need to do when writing the .pyx file is follow the same
> API that you would if you were writing C code to use the
> library.

Interesting. Then how does Pyrex/Cython typecheck your code at compile time?

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread Nick Coghlan
On Tue, Aug 30, 2011 at 7:14 AM, Antoine Pitrou  wrote:
> On Mon, 29 Aug 2011 11:33:14 -0700
> stefan brunthaler  wrote:
>> * The optimized dispatch routine has a changed instruction format
>> (word-sized instead of bytecodes) that allows for regular instruction
>> decoding (without the HAS_ARG check) and inlining of some objects in
>> the instruction format on 64-bit architectures.
>
> Having a word-sized "bytecode" format would probably be acceptable in
> itself, so if you want to submit a patch for that, go ahead.

Although any such patch should discuss how it compares with Cesare's
work on wpython.

Personally, I *like* CPython fitting into the "simple-and-portable"
niche in the Python interpreter space. Armin Rigo made the judgment
years ago that CPython was a poor platform for serious optimisation
when he stopped working on Psyco and started PyPy instead, and I think
the contrasting fates of PyPy and Unladen Swallow have borne out that
opinion. Significantly increasing the complexity of CPython for
speed-ups that are dwarfed by those available through PyPy seems like
a poor trade-off to me.

At a bare minimum, I don't think any significant changes should be
made under the "it will be faster" justification until the bulk of the
real-world benchmark suite used for speed.pypy.org is available for
Python 3. (Wasn't there a GSoC project about that?)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Ctypes and the stdlib

2011-08-29 Thread Guido van Rossum
On Mon, Aug 29, 2011 at 2:39 AM, Stefan Behnel  wrote:
> Guido van Rossum, 29.08.2011 04:27:
>> Hm, the main use that was proposed here for ctypes is to wrap existing
>> libraries (not to create nicer APIs, that can be done in pure Python
>> on top of this).
>
> The same applies to Cython, obviously. The main advantage of Cython over
> ctypes for this is that the Python-level wrapper code is also compiled into
> C, so whenever the need for a thicker wrapper arises in some part of the
> API, you don't lose any performance in intermediate layers.

Yes, this is a very nice advantage. The only advantage that I can
think of for ctypes is that it doesn't require a toolchain -- you can
just write the Python code and get going. With Cython you will always
have to invoke the Cython compiler. Another advantage may be that it
works *today* for PyPy -- I don't know the status of Cython for PyPy.

Also, (maybe this was answered before?), how well does Cython deal
with #include files (especially those you don't have control over,
like the ones typically required to use some lib.so safely on all
platforms)?

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 3151 from the BDFOP

2011-08-29 Thread Nick Coghlan
On Tue, Aug 30, 2011 at 7:18 AM, Barry Warsaw  wrote:
> Okay, so here's what's still outstanding for me:
>
> * Should we eliminate FileSystemError? (probably "yes")

I've also been persuaded that this isn't a generally meaningful
categorisation, so +1 for dropping it. ConnectionError is worth
keeping, though.

> * Should we ensure one errno == one exception?
>  - i.e. separate EACCES and EPERM
>  - i.e. separate EPIPE and ESHUTDOWN

I think the concept of a 1:1 mapping is a complete non-starter, since
"OSError" is always going to map to multiple errnos (i.e. everything
that hasn't been assigned to a specific subclass). Maintaining the
class categorisation down to a certain level for ease of separate
handling is worthwhile, but below that point it's better to let people
realise that they need to understand the subtleties of the different
errno values.

> * Should the str of the new exception subclasses be improved (e.g. to include
>  the symbolic name instead of the errno first)?

I'd say that's a distinct RFE on the tracker (since it applies
regardless of the acceptance or rejection of PEP 3151). Good idea in
principle, though.

> * Is the OSError.__new__() hackery a good idea?

I agree it's a little magical, but I also think the PEP becomes pretty
useless without it. If OSError.__new__ handles the mapping, then most
code (including C code) doesn't need to change - it will raise the new
subclasses automatically. If we demand that all exception *raising*
code be changed, then exception *catching* code will have a hard time
assuming that the new subclasses are going to be raised correctly
instead of a top level OSError.

To make that transition feasible, I think we *need* to make it as hard
as we can (if not impossible) to raise OSError instances with defined
errno values that *don't* conform to the new hierarchy so that 3.3+
exception catching code doesn't need to worry about things like ENOENT
being raised as OSError instead of FileNotFoundError. Only code that
also supports earlier versions should need to resort to inspecting the
errno values for the coarse distinctions that the PEP provides via the
new class hierarchy.
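
Concretely, under the hierarchy the PEP proposes, 3.3-only code could
rely on the subclass while version-straddling code still inspects errno
(a sketch assuming the PEP is accepted as described):

    import errno

    # 3.3+ style: the subclass is raised automatically by open().
    try:
        f = open("no-such-file")
    except FileNotFoundError:
        print("missing (new-style)")

    # Code that must also run on 3.2 and earlier falls back to errno.
    try:
        f = open("no-such-file")
    except (IOError, OSError) as e:
        if e.errno == errno.ENOENT:
            print("missing (old-style)")
        else:
            raise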

> * Should the PEP define the signature of the new exceptions (e.g. to prohibit
>  passing in an incorrect errno to an OSError subclass)?

Unfortunately, I think the variations in errno details across
platforms mean that being too restrictive in this space would cause
more problems than it solves.

So it may be wiser to technically allow people to do silly things like
"raise FileNotFoundError(errno.EPIPE)" with the admonition not to
actually do that because it is obscure and confusing. "Consenting
adults", etc.

> * Can we add ECHILD and ESRCH, and if so, what names should we use?

+1 for ChildProcessError and ProcessLookupError (as peer exceptions on
the tier directly below OSError)
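
Under those names, the mapping would look like this in practice (a
POSIX-only sketch assuming the PEP adopts them):

    import os

    try:
        os.kill(2 ** 22, 0)           # signal 0 just probes for existence
    except ProcessLookupError:        # ESRCH under the proposed names
        print("no such process")

    try:
        os.waitpid(-1, 0)             # no children to wait for
    except ChildProcessError:         # ECHILD under the proposed names
        print("no child processes")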

> * Where can we capture the idea of putting the symbolic names on OSError class
>  attributes, or is it a dumb idea that should be ditched?

"Tracker RFE" for the former and "maybe" for the latter. With this
PEP, the need for direct inspection of errno values should be
significantly reduced in most code, so importing errno shouldn't be
necessary.

> * How long should we wait for other Python implementations to chime in?

"Until Antoine gets back from his holiday" sounds reasonable to me.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)

2011-08-29 Thread Meador Inge
On Sat, Aug 27, 2011 at 11:58 PM, Terry Reedy  wrote:

> Dan, I once had the more or less the same opinion/question as you with
> regard to ctypes, but I now see at least 3 problems.
>
> 1) It seems hard to write it correctly. There are currently 47 open ctypes
> issues, with 9 being feature requests, leaving 38 behavior-related issues.
> Tom Heller has not been able to work on it since the beginning of 2010 and
> has formally withdrawn as maintainer. No one else that I know of has taken
> his place.

I am trying to work through getting these issues resolved.  The hard part so
far has been getting reviews and commits.  The following patches are awaiting
review (the patch for issue 11241 has been accepted, just not applied):

1. http://bugs.python.org/issue9041
2. http://bugs.python.org/issue9651
3. http://bugs.python.org/issue11241

I am more than happy to keep working through these issues, but I need some
help getting the patches actually applied since I don't have commit rights.

-- 
# Meador


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread stefan brunthaler
> Does it speed up Python? :-) Could you provide numbers (benchmarks)?
>
Yes, it does ;)

The maximum overall speedup I achieved was by a factor of 2.42 on my
i7-920 for the spectralnorm benchmark of the computer language
benchmark game.

Others from the same set are:
  binarytrees: 1.9257 (1.9891)
  fannkuch: 1.6509 (1.7264)
  fasta: 1.5446 (1.7161)
  mandelbrot: 2.0040 (2.1847)
  nbody: 1.6165 (1.7602)
  spectralnorm: 2.2538 (2.4176)
  ---
  overall: 1.8213 (1.9382)

(The first number is the combination of all optimizations, the one in
parentheses is with my last optimization [Interpreter Instruction
Scheduling] enabled, too.)

For a comparative real-world benchmark I tested Martin von Loewis'
django port (there are not that many meaningful Python 3 real-world
benchmarks) and got a speedup of 1.3 (without IIS). That compares
reasonably well: Unladen Swallow got a speedup of 1.35 on this
benchmark. I just checked that pypy-c-latest on 64 bit reports 1.5 (the
pypy-c-jit-latest figures seem to be either not working currently or
*really* fast...), but I cannot tell directly how that relates to
speedups (it just says "less is better" and I did not quickly find an
explanation).
Since I did this benchmark last year, I have spent more time
investigating it and found that I could do better, but I would have to
guess as to how much. (An interesting aside though: on this benchmark,
the executable never grew beyond 5 megs of memory usage, exactly like
the vanilla Python 3 interpreter.)

hth,
--stefan


Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Victor Stinner
On Monday 29 August 2011 at 21:34:48, you wrote:
> >> Those haven't been ported to the new API, yet. Consider, for example,
> >> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
> >> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
> >> is a 25% speedup for PEP 393.
> > 
> > If I understand correctly, the performance now highly depend on the used
> > characters? A pure ASCII string is faster than a string with characters
> > in the ISO-8859-1 charset?
> 
> How did you infer that from above paragraph??? ASCII and Latin-1 are
> mostly identical in terms of performance - the ASCII decoder should be
> slightly slower than the Latin-1 decoder, since the ASCII decoder needs
> to check for errors, whereas the Latin-1 decoder will never be
> confronted with errors.

I don't compare ASCII and ISO-8859-1 decoders. I was asking if decoding b'abc' 
from ISO-8859-1 is faster than decoding b'ab\xff' from ISO-8859-1, and if yes: 
why?
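
(That comparison can be put directly to timeit; a micro-benchmark
sketch of exactly the question being asked:)

    import timeit

    pure_ascii = b"a" * 4096
    with_ff = b"a" * 4095 + b"\xff"

    for label, data in [("pure ASCII", pure_ascii), ("with 0xff", with_ff)]:
        t = timeit.timeit(lambda: data.decode("iso-8859-1"), number=100000)
        print(label, t)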

Your patch replaces PyUnicode_New(size, 255) ... memcpy() with
PyUnicode_FromUCS1(). I don't understand how it makes Python faster:
PyUnicode_FromUCS1() first scans the input string for the maximum code
point.

I suppose that the main difference is that the ISO-8859-1 encoded string is 
stored as the UTF-8 encoded string (shared pointer) if all characters of the 
string are ASCII characters. In this case, encoding the string to UTF-8 
doesn't cost anything; we already have the result.

Am I correct?

Victor



Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread Victor Stinner
On Monday 29 August 2011 at 19:35:14, stefan brunthaler wrote:
> pretty much a year ago I wrote about the optimizations I did for my
> PhD thesis that target the Python 3 series interpreters

Does it speed up Python? :-) Could you provide numbers (benchmarks)?

Victor



Re: [Python-Dev] PEP 3151 from the BDFOP

2011-08-29 Thread Antoine Pitrou
On Mon, 29 Aug 2011 17:18:33 -0400
Barry Warsaw  wrote:
> On Aug 24, 2011, at 01:57 AM, Antoine Pitrou wrote:
> 
> >> One guiding principle for me is that we should keep the abstraction as thin
> >> as possible.  In particular, I'm concerned about mapping multiple errnos
> >> into a single Error.  For example both EPIPE and ESHUTDOWN mapping to
> >> BrokenPipeError, or EACCES or EPERM to PermissionError.  I think we should
> >> resist this, so that one errno maps to exactly one Error.  Where grouping
> >> is desired, Python already has mechanisms to deal with that,
> >> e.g. superclasses and multiple inheritance.  Therefore, I think it would be
> >> better to have
> >> 
> >> + FileSystemPermissionError
> >>   + AccessError (EACCES)
> >>   + PermissionError (EPERM)
> >
> >I'm not sure that's a good idea:
> 
> Was it the specific grouping under FileSystemPermissionError that you're
> objecting to, or the "keep the abstraction thin" principle?

The former. EPERM is generally returned for things which aren't
filesystem-related.
(although I also think separating EACCES and EPERM is of little value
*in practice*)

>  Let's say we
> threw out the idea of FSPE superclass, would you still want to collapse EACCES
> and EPERM into PermissionError, or would separate exceptions for each be okay?

I have a preference for the former, but am not against the latter. I
just think that, given AccessError and PermissionError, most users
won't know up front which one they should care about.

> It's still pretty easy to catch both in one except clause, and it won't be too
> annoying if it's rare.

Indeed.

> Reading your IRC message (sorry, I was afk) it sounds like you think
> FileSystemError can be removed.  I like keeping the hierarchy flat.

Ok. It can be reintroduced later on.
(the main reason why I think it can be removed is that EACCES in itself
is often tied to filesystem access rights; so the EACCES exception
class would have to be a subclass of FileSystemError, while the EPERM
one should not :-))

> >>>> open("foo")
> >Traceback (most recent call last):
> >  File "<stdin>", line 1, in <module>
> >FileNotFoundError: [Errno 2] No such file or directory: 'foo'
> >
> >(see e.g. http://bugs.python.org/issue12762)
> 
> True, but since you're going to be creating a bunch of new exception classes,
> it should be relatively painless to give them a better str.  Thanks for
> pointing out that bug; I agree with it.

Well, the str right now is exactly the same as OSError's.

> My question mostly was about raising OSError (as the current PEP states) with
> an errno that does *not* map to one of the new exceptions.  In that case, I
> don't think there's anything you could raise other than exactly OSError,
> right?

And indeed, that's what the implementation does :)

> So, for raising OSError with an errno mapping to one of the subclasses, it
> appears to break the "explicit is better than implicit" principle, and I think
> it could lead to hard-to-debug or understand code.  You'll look at code that
> raises OSError, but the exception that gets printed will be one of the
> subclasses.  I'm afraid that if you don't know that this is happening, you're
> going to think you're going crazy.

Except that it only happens if you use a recognized errno. For example
if you do:

>>> OSError(errno.ENOENT, "not found")
FileNotFoundError(2, 'not found')

Not if you just pass a message (or anything else, actually):

>>> OSError("some message")
OSError('some message',)

But if you pass an explicit errno, then the subclass doesn't appear
that surprising, does it?

> The other half is, let's say raising FileNotFoundError with the EEXIST errno.
> I'm guessing that the __init__'s for the new OSError subclasses will not have
> an `errno` attribute, so there's no way you can do that, but the PEP does not
> discuss this. 

Actually, the __new__ and the __init__ are exactly the same as
OSError's:

>>> e = FileNotFoundError("some message")
>>> e.errno
>>> e = FileNotFoundError(errno.ENOENT, "some message")
>>> e.errno
2

> >Wow, I didn't know ESRCH.
> >How would you call the respective exceptions?
> >- ChildProcessError for ECHILD?
>
[...]
> 
> >- ProcessLookupError for ESRCH?
> 
[...]
> 
> So in a sense, both are lookup errors, though I think it's going too far to
> multiply inherit from LookupError.  Maybe ChildWaitError or ChildLookupError
> for the former?  ProcessLookupError seems good to me.

Ok.

> >> What if all the errno symbolic names were mapped as attributes on IOError?
> >> The only advantage of that would be to eliminate the need to import errno,
> >> or for the ugly `e.errno == errno.ENOENT` stuff.  That would then be
> >> rewritten as `e.errno == IOError.ENOENT`.  A mild savings to be sure, but
> >> still.
> >
> >Hmm, I guess that's explorable as an orthogonal idea.
> 
> Cool.  How should we capture that?

A separate PEP perhaps, or more appropriately (IMHO) a tracker entry,
since it's just about enriching the attributes of an existing type.

Re: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)

2011-08-29 Thread Guido van Rossum
Thanks for an insightful post, Dave! I took the liberty of mentioning
it on Google+:

https://plus.google.com/115212051037621986145/posts/NyEiLEfR6HF

(PS. Anyone wanting a G+ invite, go here:
https://plus.google.com/i/7w3niYersIA:8fxDrfW-6TA )

--Guido

On Mon, Aug 29, 2011 at 5:41 AM, David Beazley  wrote:
> On Mon, Aug 29, 2011 at 12:27 PM, Guido van Rossum  wrote:
>
>> I wonder if for
>> this particular purpose SWIG isn't the better match. (If SWIG weren't
>> universally hated, even by its original author. :-)
>
> Hate is probably a strong word, but as the author of Swig, let me chime in 
> here ;-).   I think there are probably some lessons to be learned from Swig.
>
> As Nick noted, Swig is best suited when you have control over both sides 
> (C/C++ and Python) of whatever code you're working with.  In fact, the 
> original motivation for  Swig was to give application programmers (scientists 
> in my case), a means for automatically generating the Python bindings to 
> their code.  However, there was one other important assumption--and that was 
> the fact that all of your "real code" was going to be written in C/C++ and 
> that the Python scripting interface was just an optional add-on (perhaps even 
> just a throw-away thing).  Keep in mind, Swig was first created in 1995 and 
> at that time, the use of Python (or any similar language) was a pretty 
> radical idea in the sciences.  Moreover, there was a lot of legacy code that 
> people just weren't going to abandon.  Thus, I always viewed Swig as a kind 
> of transitional vehicle for getting people to use Python who might otherwise 
> not even consider it.   Getting back to Nick's point though, to really use 
> Swig effectively, it was always known that you might have to reorganize or refactor your
> C/C++ code to make it more Python friendly.  However, due to the automatic 
> wrapper generation, you didn't have to do it all at once.  Basically your 
> code could organically evolve and Swig would just keep up with whatever you 
> were doing.  In my projects, we'd usually just tuck Swig away in some 
> Makefile somewhere and forget about it.
>
> One of the major complexities of Swig is the fact that it attempts to parse 
> C/C++ header files.   This very notion is actually a dangerous trap waiting 
> for anyone who wants to wander into it.  You might look at a header file and 
> say, well how hard could it be to just grab a few definitions out of there?   
> I'll just write a few regexs or come up with some simple hack for recognizing 
> function definitions or something.   Yes, you can do that, but you're 
> immediately going to find that whatever approach you take starts to break 
> down into horrible corner cases.   Swig started out like this and quickly 
> turned into a quagmire of esoteric bug reports.  All sorts of problems with 
> preprocessor macros, typedefs, missing headers, and other things.  For 
> awhile, I would get these bug reports that would go something like "I had 
> this C++ class inside a namespace with an abstract method taking a typedef'd 
> const reference to this smart pointer ... and Swig broke."   Hell, I can't
> even understand the bug report, let alone know how to fix it.  Almost all of these bugs
> were due to the fact that Swig started out as a hack and didn't really have 
> any kind of solid conceptual foundation for how it should be put together.
>
> If you flash forward a bit, from about 2001-2004 there was a very serious 
> push to fix these kinds of issues.  Although it was not a complete rewrite of 
> Swig, there were a huge number of changes to how it worked during this time.  
> Swig grew a fully compatible C++ preprocessor that fully supported macros.  A
> complete C++ type system was implemented including support for namespaces, 
> templates, and even such things as template partial specialization.  Swig 
> evolved into a multi-pass compiler that was doing all sorts of global 
> analysis of the interface.   Just to give you an idea, Swig would do things 
> such as automatically detect/wrap C++ smart pointers.  It could wrap 
> overloaded C++ methods/function.  Also, if you had a C++ class with virtual 
> methods, it would only make one Python wrapper function and then reuse across 
> all wrapped subclasses.
>
> Under the covers of all of this, the implementation basically evolved into a 
> sophisticated macro preprocessor coupled with a pattern matching engine built 
> on top of the C++ type system.   For example, you could write patterns that 
> matched specific C++ types (the much hated "typemap" feature) and you could 
> write patterns that matched entire C++ declarations.  This whole pattern 
> matching approach had a huge power if you knew what you were doing.  For 
> example, I had a graduate student working on adding "contracts" to 
> Swig--something that was being funded by a NSF grant.   It was cool and mind 
> boggling all at once.
>
> In hindsight however, I think the complexity of Sw

Re: [Python-Dev] PEP 3151 from the BDFOP

2011-08-29 Thread Barry Warsaw
On Aug 24, 2011, at 01:57 AM, Antoine Pitrou wrote:

>> One guiding principle for me is that we should keep the abstraction as thin
>> as possible.  In particular, I'm concerned about mapping multiple errnos
>> into a single Error.  For example both EPIPE and ESHUTDOWN mapping to
>> BrokenPipeError, or EACCES or EPERM to PermissionError.  I think we should
>> resist this, so that one errno maps to exactly one Error.  Where grouping
>> is desired, Python already has mechanisms to deal with that,
>> e.g. superclasses and multiple inheritance.  Therefore, I think it would be
>> better to have
>> 
>> + FileSystemPermissionError
>>   + AccessError (EACCES)
>>   + PermissionError (EPERM)
>
>I'm not sure that's a good idea:

Was it the specific grouping under FileSystemPermissionError that you're
objecting to, or the "keep the abstraction thin" principle?  Let's say we
threw out the idea of FSPE superclass, would you still want to collapse EACCES
and EPERM into PermissionError, or would separate exceptions for each be okay?
It's still pretty easy to catch both in one except clause, and it won't be too
annoying if it's rare.

>Yes, FileSystemError might be removed. I thought that it would be
>useful, in some library routines, to catch all filesystem-related
>errors indistinctly, but it's not a complete catchall actually (for
>example, AccessError is outside of the FileSystemError subtree).

Reading your IRC message (sorry, I was afk) it sounds like you think
FileSystemError can be removed.  I like keeping the hierarchy flat.

>> Similarly, I think it would be helpful to have the errno name (e.g. ENOENT)
>> in the error message string.  That way, it won't get in the way for most
>> code, but would be usefully printed out for uncaught exceptions.
>
>Agreed, but I think that's a feature request quite orthogonal from the
>PEP. The errno *number* is still printed as it was before:
>
>>>> open("foo")
>Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>FileNotFoundError: [Errno 2] No such file or directory: 'foo'
>
>(see e.g. http://bugs.python.org/issue12762)

True, but since you're going to be creating a bunch of new exception classes,
it should be relatively painless to give them a better str.  Thanks for
pointing out that bug; I agree with it.

>> A second guiding principle should be that careful code that works in Python
>> 3.2 must continue to work in Python 3.3 once PEP 3151 is accepted, but also
>> for Python 2 code ported straight to Python 3.3.
>
>I don't think porting straight to 3.3 would make a difference, especially now
>that the idea of deprecating old exception names has been abandoned.

Cool.

>> Do be prepared for complaints about compatibility for careless code though
>> - there's a ton of that out in the wild, and people will always complain
>> with their "working" code breaks due to an upgrade.  Be *very* explicit
>> about this in the release notes and NEWS file, and put your asbestos
>> underoos on.
>
>I'll take care about that :)

:)

>> Have you considered the impact of this PEP on other Python implementations?
>> My hazy memory of Jython tells me that errnos don't really leak into Java
>> and thus Jython much, but what about PyPy and IronPython?  E.g. step 1's
>> deprecation strategy seems pretty CPython-centric.
>
>Alternative implementations already have to implement errno codes in a
>way or another if they want to have a chance of running existing code.
>So I don't think the PEP makes much of a difference for them.
>But their implementors can give their opinion on this.

Let's give them a little more time to chime in (hopefully, they are reading
this thread).  We needn't wait too long though.

>> As for step 1 (coalescing the errors).  This makes sense and I'm generally
>> agreeable, but I'm wondering whether it's best to re-use IOError for this
>> rather than introduce a new exception.  Not that I can think of a good name
>> for that.  I'm just not totally convinced that existing code when upgrading
>> to Python 3.3 won't introduce silent failures.  If an existing error is to
>> be re-used for this, I'm torn on whether IOError or OSError is a better
>> choice.  Popularity aside, OSError *feels* more right.
>
>I don't have any personal preference. Previous discussions seemed to
>indicate people preferred IOError. But changing the implementation to
>OSError would be simple. I agree OSError feels slightly more right, as
>in more generic.

Thanks for making this change in the PEP.

>> And that anything raising an exception (e.g. via PyErr_SetFromErrno) other
>> than the new ones will raise IOError?
>
>I'm not sure I understand the question precisely.

My question mostly was about raising OSError (as the current PEP states) with
an errno that does *not* map to one of the new exceptions.  In that case, I
don't think there's anything you could raise other than exactly OSError,
right?

>The errno mapping mechanism is implemented in IOError.__new__, but it gets
>called only if the class is exactly I

Re: [Python-Dev] PEP 3151 from the BDFOP

2011-08-29 Thread Barry Warsaw
On Aug 24, 2011, at 12:51 PM, Nick Coghlan wrote:

>On Wed, Aug 24, 2011 at 9:57 AM, Antoine Pitrou  wrote:
>> Using IOError.__new__ is the easiest way to ensure that all code
>> raising IO errors takes advantage of the errno mapping. Otherwise you
>> may get APIs raising the proper subclasses, and other APIs always
>> raising base IOError (it doesn't happen often, but some Python
>> library code raises an IOError with an explicit errno).
>
>It's also the natural place to put the errno->exception type mapping
>so that existing code will raise the new errors without requiring
>modification. We could spell it as a new class method ("from_errno" or
>similar), but there isn't any ambiguity in doing it directly in
>__new__, so a class method seems pointlessly inconvenient.

As I mentioned, my main concern with this is the surprise factor for people
debugging and reading the code.  A class method would solve that, but looks
uglier and doesn't work with existing code.
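
For readers following along, a minimal model of the __new__-based
dispatch being debated (stand-in Demo* classes, not the actual 3.3
implementation):

    import errno

    _ERRNO_MAP = {}   # errno value -> exception subclass

    class DemoOSError(Exception):
        def __new__(cls, *args):
            if cls is DemoOSError and len(args) >= 2 and args[0] in _ERRNO_MAP:
                cls = _ERRNO_MAP[args[0]]   # the "surprising" substitution
            return super().__new__(cls)
        def __init__(self, *args):
            super().__init__(*args)
            self.errno = args[0] if len(args) >= 2 else None

    class DemoFileNotFoundError(DemoOSError):
        pass

    _ERRNO_MAP[errno.ENOENT] = DemoFileNotFoundError

    # Raising the base class yields the subclass; call sites are unchanged.
    e = DemoOSError(errno.ENOENT, "No such file or directory")
    print(type(e).__name__)   # DemoFileNotFoundError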

-Barry



Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)

2011-08-29 Thread Greg Ewing

Guido van Rossum wrote:

> (Just like Python's own .h files --
> e.g. the extensive renaming of the Unicode APIs depending on
> narrow/wide build) How does Cython deal with these?

Pyrex/Cython deal with it by generating C code that includes
the relevant headers, so the C compiler expands all the
macros, interprets the struct declarations, etc. All you
need to do when writing the .pyx file is follow the same
API that you would if you were writing C code to use the
library.

--
Greg


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread Antoine Pitrou
On Mon, 29 Aug 2011 11:33:14 -0700
stefan brunthaler  wrote:
> * The optimized dispatch routine has a changed instruction format
> (word-sized instead of bytecodes) that allows for regular instruction
> decoding (without the HAS_ARG check) and inlining of some objects in
> the instruction format on 64-bit architectures.

Having a word-sized "bytecode" format would probably be acceptable in
itself, so if you want to submit a patch for that, go ahead.
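
To illustrate what regular decoding buys, compare the two loops (a
Python sketch of the idea; the actual change would of course be in C):

    HAVE_ARGUMENT = 90   # threshold in CPython's classic bytecode

    def decode_classic(code, i):
        """Variable length: 1 byte, or 3 when the opcode takes an arg."""
        op = code[i]
        if op >= HAVE_ARGUMENT:              # the HAS_ARG check, every time
            return op, code[i + 1] | (code[i + 2] << 8), i + 3
        return op, None, i + 1

    def decode_word(code, i):
        """Word-sized: every instruction decodes the same way."""
        word = code[i]
        return word & 0xFF, word >> 8, i + 1   # opcode low, arg above

    print(decode_word([(5 << 8) | 100], 0))    # (100, 5, 1)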

Regards

Antoine.




Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread stefan brunthaler
> The question really is whether this is an all-or-nothing deal. If you
> could identify smaller parts that can be applied independently, interest
> would be higher.
>
Well, it's not an all-or-nothing deal. In my current architecture, I
can selectively enable most of the optimizations as I see fit. The
only pre-requisite (in my implementation) is that I have two dispatch
loops with a changed instruction format. It is, however, not a
technical necessity, just the way I implemented it. Basically, you can
choose whatever you like best, and I could extract that part. I am
just offering to add all the things that I have done :)


> Also, I'd be curious whether your techniques help or hinder a potential
> integration of a JIT generator.
>
This is something I have previously frequently discussed with several
JIT people. IMHO, having my optimizations in-place also helps a JIT
compiler, since it can re-use the information I gathered to generate
more aggressively optimized native machine code right away (the inline
caches can be generated with the type information right away, some
functions could be inlined with the guard statements subsumed, etc.)
Another benefit could be that the JIT compiler can spend longer time
on generating code, because the interpreter is already faster (so in
some cases it would probably not make sense to include a
non-optimizing fast and simple JIT compiler).
There are others on the list, who probably can/want to comment on this, too.

That aside, I think that while having a JIT is an important goal, I
can very well imagine scenarios where the additional memory
consumption (for the generated native machine code) of a JIT for each
process (I assume that the native machine code caches are not shared)
hinders scalability. I have in fact no data to back this up, but I
think that would be an interesting trade off, say if I have 30% gain
in performance without substantial additional memory requirements on
my existing hardware, compared to higher achievable speedups that
require more machines, though.


Regards,
--stefan


Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Antoine Pitrou
On Mon, 29 Aug 2011 22:32:01 +0200
"Martin v. Löwis"  wrote:
> I have now written a Django application to measure the effect of PEP
> 393, using the debug mode (to find all strings), and sys.getsizeof:
> 
> https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py
> 
> The results for 3.3 and pep-393 are attached.

This looks very nice. Is 3.3 a wide build? (how about a narrow build?)

(is it with your own port of Django to py3k, or is there an official
branch for it?)

Regards

Antoine.


Re: [Python-Dev] PEP 393 review

2011-08-29 Thread M.-A. Lemburg
"Martin v. Löwis" wrote:
> tl;dr: PEP-393 reduces the memory usage for strings of a very small
> Django app from 7.4MB to 4.4MB, all other objects taking about 1.9MB.
> 
> Am 26.08.2011 16:55, schrieb Guido van Rossum:
>> It would be nice if someone wrote a test to roughly verify these
>> numbers, e.v. by allocating lots of strings of a certain size and
>> measuring the process size before and after (being careful to adjust
>> for the list or other data structure required to keep those objects
>> alive).
> 
> I have now written a Django application to measure the effect of PEP
> 393, using the debug mode (to find all strings), and sys.getsizeof:
> 
> https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py
> 
> The results for 3.3 and pep-393 are attached.
> 
> The Django app is small in every respect: trivial ORM, very few
> objects (just for the sake of exercising the ORM at all),
> no templating, short strings. The memory snapshot is taken in
> the middle of a request.
> 
> The tests were run on a 64-bit Linux system with 32-bit Py_UNICODE.

For comparison, could you run the test of the unmodified
Python 3.3 on a 16-bit Py_UNICODE version as well?

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 29 2011)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2011-10-04: PyCon DE 2011, Leipzig, Germany            36 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread Martin v. Löwis
> So, the two big issues aside, is there any interest in incorporating
> these optimizations in Python 3?

The question really is whether this is an all-or-nothing deal. If you
could identify smaller parts that can be applied independently, interest
would be higher.

Also, I'd be curious whether your techniques help or hinder a potential
integration of a JIT generator.

Regards,
Martin


Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Martin v. Löwis
tl;dr: PEP-393 reduces the memory usage for strings of a very small
Django app from 7.4MB to 4.4MB, all other objects taking about 1.9MB.

Am 26.08.2011 16:55, schrieb Guido van Rossum:
> It would be nice if someone wrote a test to roughly verify these
> numbers, e.v. by allocating lots of strings of a certain size and
> measuring the process size before and after (being careful to adjust
> for the list or other data structure required to keep those objects
> alive).

I have now written a Django application to measure the effect of PEP
393, using the debug mode (to find all strings), and sys.getsizeof:

https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py

The results for 3.3 and pep-393 are attached.

The Django app is small in every respect: trivial ORM, very few
objects (just for the sake of exercising the ORM at all),
no templating, short strings. The memory snapshot is taken in
the middle of a request.

The tests were run on a 64-bit Linux system with 32-bit Py_UNICODE.

The tally of strings by length confirms that both tests have indeed
comparable sets of objects (not surprising since it is identical Django
source code and the identical application). Most strings in this
benchmark are shorter than 16 characters, and a few have several
thousand characters. The tally of byte lengths shows that it's the
really long memory blocks that are gone with the PEP.

Digging into the internal representation, it's possible to estimate
"unaccounted" bytes. For PEP 393:

   bytes - 80*strings - (chars+strings) = 190053

This is the total of the wchar_t and UTF-8 representations for objects
that have them, plus any two-byte and four-byte strings accounted
incorrectly in the above formula. Unfortunately, for "default"

   bytes - 56*strings - 4*(chars+strings) = 0

as unicode__sizeof__ doesn't account for the (separate) PyBytes
object that may carry the default encoding. So in practice, the 3.3
number should be somewhat larger.

In both cases, the app didn't account for internal fragmentation;
this would be possible by rounding up each string size to the next
multiple of 8 (given that it's all allocated through the object
allocator).

It should be possible to squeeze a little bit out of the 190kB,
by finding objects for which the wchar_t or UTF-8 representations
are created unnecessarily.

Regards,
Martin
3.3.0a0 (default:45b63a8a76c9, Aug 29 2011, 21:45:49) 
[GCC 4.6.1 20110526 (prerelease)]
Strings: 36075
Chars: 1303746
Bytes: 7379484
Other objects: 1906432

By Length (length: numstrings)
Up to 4: 5710
Up to 8: 8997
Up to 16: 11657
Up to 32: 4267
Up to 64: 2319
Up to 128: 1373
Up to 256: 828
Up to 512: 558
Up to 1024: 233
Up to 2048: 104
Up to 4096: 23
Up to 8192: 5
Up to 16384: 0
Up to 32768: 1

By Size (size: numstrings)
Up to 40: 0
Up to 80: 7913
Up to 160: 21796
Up to 320: 3317
Up to 640: 1452
Up to 1280: 847
Up to 2560: 482
Up to 5120: 183
Up to 10240: 65
Up to 20480: 18
Up to 40960: 1
Up to 81920: 1
3.3.0a0 (pep-393:6ffa3b569228, Aug 29 2011, 22:00:31) 
[GCC 4.6.1 20110526 (prerelease)]
Strings: 36091
Chars: 1304098
Bytes: 4417522
Other objects: 1866616

By Length (length: numstrings)
Up to 4: 5728
Up to 8: 8997
Up to 16: 11658
Up to 32: 4239
Up to 64: 2335
Up to 128: 1382
Up to 256: 828
Up to 512: 558
Up to 1024: 233
Up to 2048: 104
Up to 4096: 23
Up to 8192: 5
Up to 16384: 0
Up to 32768: 1

By Size (size: numstrings)
Up to 40: 0
Up to 80: 0
Up to 160: 33247
Up to 320: 1500
Up to 640: 1007
Up to 1280: 226
Up to 2560: 86
Up to 5120: 21
Up to 10240: 3
Up to 20480: 1
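
A quick check of the two formulas in the message against the attached
numbers:

    # "default" (32-bit Py_UNICODE) build, from the first listing:
    strings, chars, nbytes = 36075, 1303746, 7379484
    print(nbytes - 56 * strings - 4 * (chars + strings))   # 0

    # PEP 393 build, from the second listing:
    strings, chars, nbytes = 36091, 1304098, 4417522
    print(nbytes - 80 * strings - (chars + strings))       # 190053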


Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Nir Aides
On Mon, Aug 29, 2011 at 8:42 PM, Jesse Noller  wrote:
> On Mon, Aug 29, 2011 at 1:22 PM, Antoine Pitrou  wrote:
>>
>> That sanitization is generally useful, though. For example if you want
>> to use any I/O after a fork().
>
> Oh! I don't disagree; I'm just against the removal of the ability to
> mix multiprocessing and threads; which it does internally and others
> do in every day code.

I am not familiar with the python-dev definition for deprecation, but
when I used the word in the bug discussion I meant to advertise to
users that they should not mix threading and forking, since that mix is
and will remain broken by design; I did not mean removal or crippling
of functionality.

“When I use a word,” Humpty Dumpty said, in rather a scornful tone,
“it means just what I choose it to mean—neither more nor less.” -
Through the Looking-Glass

(btw, my tone is not scornful)

And there is no way around it - the mix in general is broken, with an
atfork mechanism or without it.
People can choose to keep doing it in their every day code at their
own risk, be it significantly high or insignificantly low.
But the documentation should explain the problem clearly.

As for the internal use of threads in the multiprocessing module I
proposed a potential way to "sanitize" those particular worker
threads:
http://bugs.python.org/issue6721#msg140402

If it makes sense and entails changes to internal multiprocessing
worker threads, those changes could be applied as bug fixes to Python
2.x and previous Python 3.x releases.

This does not contradict adding the spawn feature now and making it
the only possibility in the future. I agree that this is the "saner"
approach, but it is a new feature, not a bug fix.

Nir


Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Martin v. Löwis
>> Those haven't been ported to the new API, yet. Consider, for example,
>> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
>> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
>> is a 25% speedup for PEP 393.
> 
> If I understand correctly, the performance now highly depend on the used
> characters? A pure ASCII string is faster than a string with characters
> in the ISO-8859-1 charset?

How did you infer that from the above paragraph??? ASCII and Latin-1 are
mostly identical in terms of performance - the ASCII decoder should be
slightly slower than the Latin-1 decoder, since the ASCII decoder needs
to check for errors, whereas the Latin-1 decoder will never be
confronted with errors.

What matters is
a) is the codec already rewritten to use the new representation, or
   must it go through Py_UNICODE[] first, requiring then a second copy
   to the canonical form?
b) what is the cost of finding out the highest character? - regardless
   of what the highest character turns out to be
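
A toy Python model of point (b), to make the cost concrete (an
illustrative sketch, not CPython's actual decoder):

    def latin1_decode_model(data):
        # pass 1: inspect every byte to find the highest code point,
        # regardless of what it turns out to be
        maxchar = 0
        for b in data:
            if b > maxchar:
                maxchar = b
        # the decoder can now pick the narrowest storage before
        # pass 2 (the copy); maxchar < 128 means pure ASCII
        return maxchar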

> Is it also true for BMP characters vs non-BMP
> characters?

Well... If you are talking about the ASCII and Latin-1 codecs - neither
of these supports most BMP characters, let alone non-BMP characters.
In general, non-BMP characters are more expensive to process since they
take more space.

> Do these benchmark tools use only ASCII characters, or also some
> ISO-8859-1 characters?

See for yourself. iobench uses Latin-1, including non-ASCII, but not
non-Latin-1.

> Or, better, different Unicode ranges in different tests?

That's why I asked for a list of benchmarks to perform. I cannot
run an infinite number of benchmarks prior to adoption of the PEP.

Regards,
Martin


Re: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)

2011-08-29 Thread Neal Becker
Then there is gccxml, although I'm not sure how active it is now.



Re: [Python-Dev] Should we move to replace re with regex?

2011-08-29 Thread Terry Reedy

On 8/29/2011 9:00 AM, Barry Warsaw wrote:

On Aug 27, 2011, at 07:11 PM, Martin v. Löwis wrote:


A PEP should IMO only cover end-user aspects of the new re module.
Code organization is typically not in the PEP. To give a specific
example: you mentioned that there is (near) code duplication in
MRAB's module. As a reviewer, I would discuss whether this can be
eliminated - but not in the PEP.


+1


I think at this point we need a tracker issue to which can be attached 
such reviews, for safe-keeping, even if most discussion continues here.


--
Terry Jan Reedy




Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Martin v. Löwis
Am 29.08.2011 11:03, schrieb Dirkjan Ochtman:
> On Sun, Aug 28, 2011 at 21:47, "Martin v. Löwis"  wrote:
>>  result strings. In PEP 393, a buffer must be scanned for the
>>  highest code point, which means that each byte must be inspected
>>  twice (a second time when the copying occurs).
> 
> This may be a silly question: are there things in place to optimize
> this for the case where two strings are combined? E.g. highest
> character in combined string is max(highest character in either of the
> strings).

Unicode_Concat goes like this

maxchar = PyUnicode_MAX_CHAR_VALUE(u);
if (PyUnicode_MAX_CHAR_VALUE(v) > maxchar)
    maxchar = PyUnicode_MAX_CHAR_VALUE(v);

/* Concat the two Unicode strings */
w = (PyUnicodeObject *) PyUnicode_New(
        PyUnicode_GET_LENGTH(u) +
        PyUnicode_GET_LENGTH(v),
        maxchar);
if (w == NULL)
    goto onError;
PyUnicode_CopyCharacters(w, 0, u, 0, PyUnicode_GET_LENGTH(u));
PyUnicode_CopyCharacters(w, PyUnicode_GET_LENGTH(u), v, 0,
                         PyUnicode_GET_LENGTH(v));

> Also, this PEP makes me wonder if there should be a way to distinguish
> between language PEPs and (CPython) implementation PEPs, by adding a
> tag or using the PEP number ranges somehow.

Well, no. This would equally apply to every single patch, and is just
not feasible. Instead, alternative implementations typically target a
CPython version, and then find out what features they need to implement
to claim conformance.

Regards,
Martin


Re: [Python-Dev] LZMA compression support in 3.3

2011-08-29 Thread Nadeem Vawda
I've updated the issue with a patch
containing my work so far - the LZMACompressor and LZMADecompressor classes,
along with some tests. These two classes should provide a fairly complete
interface to liblzma; it will be possible to implement LZMAFile on top of them,
entirely in Python. Note that the C code does no I/O; this will be handled by
LZMAFile.

Please take a look, and let me know what you think.

Cheers,
Nadeem


Re: [Python-Dev] PEP categories (was Re: PEP 393 review)

2011-08-29 Thread Barry Warsaw
On Aug 29, 2011, at 06:40 PM, Antoine Pitrou wrote:

>I like the 3k numbers myself :))

Me too. :) But I think we've pretty much abandoned that convention for any new
PEPs.  Well, until Guido announces Python 4k. :)

-Barry





Re: [Python-Dev] PEP categories (was Re: PEP 393 review)

2011-08-29 Thread Barry Warsaw
On Aug 29, 2011, at 06:55 PM, Stefan Behnel wrote:

>These things tend to get somewhat clumsy over time, though. What about a
>stdlib change that only applies to CPython for some reason, e.g. because no
>other implementation currently has that module?  I think it's ok to make a
>coarse-grained distinction by numbers, but there should also be a way to tag
>PEPs textually.

Yeah, the categories would be pretty coarse grained, and their orthogonality
would cause classification problems.  I suppose we could use some kind of
hashtag approach.  OTOH, I'm not entirely sure it's worth it either. ;)

I think we'd need a concrete proposal and someone willing to hack the PEP0
autogen tools.

-Barry



Re: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)

2011-08-29 Thread David Cournapeau
On Mon, Aug 29, 2011 at 7:14 PM, Eli Bendersky  wrote:
> 
>>
>> I've sometimes thought it might be interesting to create a Swig
>> replacement purely in Python.  When I work on the PLY project, this is often
>> what I think about.   In that project, I've actually built a number of the
>> parsing tools that would be useful in creating such a thing.   The only
>> catch is that when I start thinking along these lines, I usually reach a
>> point where I say "nah, I'll just write the whole application in Python."
>>
>> Anyways, this is probably way more than anyone wants to know about Swig.
>> Getting back to the original topic of using it to make standard library
>> modules, I just don't know.   I think you probably could have some success
>> with an automatic code generator of some kind.  I'm just not sure it should
>> take the Swig approach of parsing C++ headers.  I think you could do better.
>>
>
> Dave,
>
> Having written a full C99 parser (http://code.google.com/p/pycparser/) based
> on your (excellent) PLY library, my impression is that the problem is with
> the problem, not with the solution. Strange sentence, I know :-) What I mean
> is that parsing C++ (even its headers) is inherently hard, which is why the
> solutions tend to grow so complex. Even with the modest C99, clean and
> simple solutions based on theoretical approaches (like PLY with its
> generated LALR parsers) tend to run into walls [*]. C++ is an order of
> magnitude harder.
>
> If I went to implement something like SWIG today, I would almost surely base
> my implementation on Clang (http://clang.llvm.org/). They have a full C++
> parser (carefully hand-crafted, quite admirably keeping a relatively
> comprehensible code-base for such a task) used in a real compiler front-end,
> and a flexible library structure aimed at creating tools. There are also
> Python bindings that would allow to do most of the interesting
> Python-interface-specific work in Python - parse the C++ headers using
> Clang's existing parser into ASTs - then generate ctypes / extensions from
> that, *in Python*.
>
> The community is also gladly accepting contributions. I've had some fixes
> committed for the Python bindings and the C interfaces that tie them to
> Clang, and got the impression from Clang's core devs that further
> contributions will be most welcome. So whatever is missing from the Python
> bindings can be easily added.

Agreed, I know some people have looked into that direction in the
scientific python community (to generate .pxd for cython). I wrote one
of the hacks Stefan referred to (based on ctypeslib using gccxml), and
using clang makes so much more sense.

To go back to the initial issue, using cython to wrap C code makes a
lot of sense. In the scipy community, I believe there is broad
agreement that most code which would require C/C++ should be done in
cython instead (numpy and scipy already do so a bit). I personally
cannot see many situations where writing wrappers in C by hand works
better than cython (especially since cython handles python2/3
automatically for you).

cheers,

David


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread stefan brunthaler
> Perhaps there would be something to say given patches/overviews/specifics.
>
Currently I don't have patches, but for an overview and specifics, I
can provide the following:
* My optimizations basically rely on quickening to incorporate
run-time information (a toy sketch of the idea follows after this list).
* I use two separate instruction dispatch routines, and use profiling
to switch from the regular Python 3 dispatch routine to an optimized
one (the implementation is actually vice versa, but that is not
important now).
* The optimized dispatch routine has a changed instruction format
(word-sized instead of bytecodes) that allows for regular instruction
decoding (without the HAS_ARG check) and inlining of some objects in
the instruction format on 64-bit architectures.
* I use inline-caching based on quickening (passes almost all
regression tests [302 out of 307]), eliminate reference count
operations using quickening (passes but has a memory leak), promote
frequently accessed local variables to their dedicated instructions
(passes), and cache LOAD_GLOBAL/LOAD_NAME objects in the instruction
encoding when possible (I am working on this right now.)
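
Here is the toy sketch of the quickening idea promised above. It is
plain Python and only models the technique; the actual work rewrites
CPython bytecode in C:

    GENERIC_ADD, INT_ADD = "GENERIC_ADD", "INT_ADD"

    def run(code, stack):
        pc = 0
        while pc < len(code):
            op = code[pc]
            if op == GENERIC_ADD:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)        # full dynamic dispatch
                if type(a) is int and type(b) is int:
                    code[pc] = INT_ADD     # quicken: rewrite in place
            elif op == INT_ADD:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)        # type test already done; a real
                                           # interpreter uses a fast path here
            pc += 1
        return stack

    code = [GENERIC_ADD]
    print(run(code, [1, 2]), code)         # -> [3] ['INT_ADD']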

The changes I made can be summarized as:
* I changed some header files to accommodate additional information
(Python.h, ceval.h, code.h, frameobject.h, opcode.h, tupleobject.h)
* I changed mostly abstract.c to incorporate runtime-type feedback.
* All other changes target mostly ceval.c and all supplementary code
is in a sub-directory named "opt" and all generated files in a
sub-directory within that ("opt/gen").
* I have a code generator in place that takes care of generating all
the functions; it uses the Mako template system for creating C code
and does not necessarily need to be shipped with the interpreter
(though one can play around and experiment with it.)

So, all in all, the changes to the actual implementation are not that
big, and most of the code is generated (using sloccount, opt has 1990
lines of C, and opt/gen has 8649 lines of C).

That's a quick summary, if there are any further or more in-depth
questions, let me know.

best,
--stefan


Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Nir Aides
On Mon, Aug 29, 2011 at 8:16 PM, Antoine Pitrou  wrote:
>
> On Mon, 29 Aug 2011 13:03:53 -0400 Jesse Noller  wrote:
> >
> > Yes; but spawning and forking are both slow to begin with - it's
> > documented (I hope heavily enough) that you should spawn
> > multiprocessing children early, and keep them around instead of
> > constantly creating/destroying them.
>
> I think fork() is quite fast on modern systems (e.g. Linux). exec() is
> certainly slow, though.

On my system, the time it takes worker code to start is:

40 usec with thread.start_new_thread
240 usec with threading.Thread().start
450 usec with os.fork
1 ms with multiprocessing.Process.start
25 ms with subprocess.Popen to start a trivial script.

so os.fork has similar latency to threading.Thread().start, while
spawning is 100 times slower.
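
For reference, a rough sketch of how such start-up latencies can be
measured on a POSIX system (my reconstruction, not necessarily the
exact benchmark used):

    import os, time, threading

    def time_thread_start(n=1000):
        t0 = time.time()
        for _ in range(n):
            t = threading.Thread(target=lambda: None)
            t.start()
            t.join()               # includes teardown, so an upper bound
        return (time.time() - t0) / n

    def time_fork(n=1000):
        t0 = time.time()
        for _ in range(n):
            pid = os.fork()
            if pid == 0:
                os._exit(0)        # child exits immediately
            os.waitpid(pid, 0)
        return (time.time() - t0) / n

    print(time_thread_start(), time_fork())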


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread Benjamin Peterson
2011/8/29 stefan brunthaler :
> So, the two big issues aside, is there any interest in incorporating
> these optimizations in Python 3?

Perhaps there would be something to say given patches/overviews/specifics.


-- 
Regards,
Benjamin


Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Jesse Noller
On Mon, Aug 29, 2011 at 1:22 PM, Antoine Pitrou  wrote:
> Le lundi 29 août 2011 à 13:23 -0400, Jesse Noller a écrit :
>>
>> Yes, it is annoying; but again - this makes it more consistent with
>> the windows implementation. I'd rather that restriction than the
>> "sanitization" of the ability to use threading and multiprocessing
>> alongside one another.
>
> That sanitization is generally useful, though. For example if you want
> to use any I/O after a fork().

Oh! I don't disagree; I'm just against the removal of the ability to
mix multiprocessing and threads; which it does internally and others
do in every day code.

The "proposed" removal of that functionality - using the two together
- would leave users in the dust, and not needed if we patch
http://bugs.python.org/issue8713 - which at it's core is just an
addition flag. We could document the risk(s) of using the fork()
mechanism which has to remain the default for some time.

The point is that the solution to http://bugs.python.org/issue6721
should not be intertwined with, or cause a severe change in, the
multiprocessing module (e.g. "rewriting from scratch"). I'm not
arguing that both bugs should not be fixed.

jesse


[Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread stefan brunthaler
Hi,

pretty much a year ago I wrote about the optimizations I did for my
PhD thesis that target the Python 3 series interpreters. While I got
some replies, the discussion never really picked up and no final
explicit conclusion was reached. AFAICT, because of the following two
factors, my optimizations were not that interesting for inclusion with
the distribution at that time:
a) Unladen Swallow was targeting Python 3, too.
b) My prototype did not pass the regression tests.

As of November 2010 (IIRC), Google is not supporting work on US
anymore, and the project is stalled. (If I am wrong and there is still
activity and any plans with the corresponding PEP, please let me
know.) Which is why I recently spent some time fixing issues so that I
can run the regression tests. There is still some work to be done, but
by and large it should be possible to complete all regression tests in
reasonable time (with the actual infrastructure in place, enabling
optimizations later on is not a problem at all, either.)

So, the two big issues aside, is there any interest in incorporating
these optimizations in Python 3?

Have a nice day,
--stefan

PS: It probably is unusual, but in a part of my home page I have
created a link to indicate interest (makes both counting and voting
easier: http://www.ics.uci.edu/~sbruntha/). There were also links
indicating interest in funding the work; I have disabled these, so as
not to upset anybody or give the impression of begging for money...


Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Antoine Pitrou
Le lundi 29 août 2011 à 13:23 -0400, Jesse Noller a écrit :
> 
> Yes, it is annoying; but again - this makes it more consistent with
> the windows implementation. I'd rather that restriction than the
> "sanitization" of the ability to use threading and multiprocessing
> alongside one another.

That sanitization is generally useful, though. For example if you want
to use any I/O after a fork().

Regards

Antoine.




Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Jesse Noller
On Mon, Aug 29, 2011 at 1:16 PM, Antoine Pitrou  wrote:
> On Mon, 29 Aug 2011 13:03:53 -0400
> Jesse Noller  wrote:
>> 2011/8/29 Charles-François Natali :
>> >> +3 (agreed to Jesse, Antoine and Ask here).
>> >>  The http://bugs.python.org/issue8713 described "non-fork" implementation
>> >> that always uses subprocesses rather than plain forked processes is the
>> >> right way forward for multiprocessing.
>> >
>> > I see two drawbacks:
>> > - it will be slower, since the interpreter startup time is
>> > non-negligible (well, normally you shouldn't spawn a new process for
>> > every item, but it should be noted)
>>
>> Yes; but spawning and forking are both slow to begin with - it's
>> documented (I hope heavily enough) that you should spawn
>> multiprocessing children early, and keep them around instead of
>> constantly creating/destroying them.
>
> I think fork() is quite fast on modern systems (e.g. Linux). exec() is
> certainly slow, though.
>
> The third drawback is that you are limited to picklable objects when
> specifying the arguments for your child process. This can be annoying
> if, for example, you wanted to pass an OS resource.
>
> Regards
>
> Antoine.

Yes, it is annoying; but again - this makes it more consistent with
the windows implementation. I'd rather that restriction than the
"sanitization" of the ability to use threading and multiprocessing
alongside one another.


Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Antoine Pitrou
On Mon, 29 Aug 2011 13:03:53 -0400
Jesse Noller  wrote:
> 2011/8/29 Charles-François Natali :
> >> +3 (agreed to Jesse, Antoine and Ask here).
> >>  The http://bugs.python.org/issue8713 described "non-fork" implementation
> >> that always uses subprocesses rather than plain forked processes is the
> >> right way forward for multiprocessing.
> >
> > I see two drawbacks:
> > - it will be slower, since the interpreter startup time is
> > non-negligible (well, normally you shouldn't spawn a new process for
> > every item, but it should be noted)
> 
> Yes; but spawning and forking are both slow to begin with - it's
> documented (I hope heavily enough) that you should spawn
> multiprocessing children early, and keep them around instead of
> constantly creating/destroying them.

I think fork() is quite fast on modern systems (e.g. Linux). exec() is
certainly slow, though.

The third drawback is that you are limited to picklable objects when
specifying the arguments for your child process. This can be annoying
if, for example, you wanted to pass an OS resource.
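
A minimal sketch of that limitation (illustrative only):

    import pickle
    import socket

    s = socket.socket()
    try:
        pickle.dumps(s)            # OS resources are not picklable
    except TypeError as e:
        print(e)                   # e.g. "cannot pickle 'socket' object"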

Regards

Antoine.




Re: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)

2011-08-29 Thread Eli Bendersky


> I've sometimes thought it might be interesting to create a Swig replacement
> purely in Python.  When I work on the PLY project, this is often what I
> think about.   In that project, I've actually built a number of the parsing
> tools that would be useful in creating such a thing.   The only catch is
> that when I start thinking along these lines, I usually reach a point where
> I say "nah, I'll just write the whole application in Python."
>
> Anyways, this is probably way more than anyone wants to know about Swig.
> Getting back to the original topic of using it to make standard library
> modules, I just don't know.   I think you probably could have some success
> with an automatic code generator of some kind.  I'm just not sure it should
> take the Swig approach of parsing C++ headers.  I think you could do better.
>
>
Dave,

Having written a full C99 parser (http://code.google.com/p/pycparser/) based
on your (excellent) PLY library, my impression is that the problem is with
the problem, not with the solution. Strange sentence, I know :-) What I mean
is that parsing C++ (even its headers) is inherently hard, which is why the
solutions tend to grow so complex. Even with the modest C99, clean and
simple solutions based on theoretical approaches (like PLY with its
generated LALR parsers) tend to run into walls [*]. C++ is an order of
magnitude harder.

If I went to implement something like SWIG today, I would almost surely base
my implementation on Clang (http://clang.llvm.org/). They have a full C++
parser (carefully hand-crafted, quite admirably keeping a relatively
comprehensible code-base for such a task) used in a real compiler front-end,
and a flexible library structure aimed at creating tools. There are also
Python bindings that would allow to do most of the interesting
Python-interface-specific work in Python - parse the C++ headers using
Clang's existing parser into ASTs - then generate ctypes / extensions from
that, *in Python*.

The community is also gladly accepting contributions. I've had some fixes
committed for the Python bindings and the C interfaces that tie them to
Clang, and got the impression from Clang's core devs that further
contributions will be most welcome. So whatever is missing from the Python
bindings can be easily added.

Eli

[*]
http://eli.thegreenplace.net/2011/05/02/the-context-sensitivity-of-c%E2%80%99s-grammar-revisited/


Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Jesse Noller
2011/8/29 Charles-François Natali :
>> +3 (agreed to Jesse, Antoine and Ask here).
>>  The http://bugs.python.org/issue8713 described "non-fork" implementation
>> that always uses subprocesses rather than plain forked processes is the
>> right way forward for multiprocessing.
>
> I see two drawbacks:
> - it will be slower, since the interpreter startup time is
> non-negligible (well, normally you shouldn't spawn a new process for
> every item, but it should be noted)

Yes; but spawning and forking are both slow to begin with - it's
documented (I hope heavily enough) that you should spawn
multiprocessing children early, and keep them around instead of
constantly creating/destroying them.

> - it'll consume more memory, since we lose the COW advantage (even
> though it's already limited by the fact that even treating a variable
> read-only can trigger an incref, as was noted in a previous thread)
>
> cf

Yes, it would consume slightly more memory; but the benefit - making
it consistent across *all* platforms with the *same* restrictions -
gets us closer to the principle of least surprise.


Re: [Python-Dev] PEP categories (was Re: PEP 393 review)

2011-08-29 Thread Stefan Behnel

Barry Warsaw, 29.08.2011 18:24:

On Aug 29, 2011, at 11:03 AM, Dirkjan Ochtman wrote:


Also, this PEP makes me wonder if there should be a way to distinguish
between language PEPs and (CPython) implementation PEPs, by adding a
tag or using the PEP number ranges somehow.


I've thought about this, and about a similar split between language changes
and stdlib changes (i.e. new modules such as regex).  Probably the best thing
to do would be to allocate some 1000's to the different categories, like we
did for the 3xxx Python 3k PEPS (now largely moot though).


These things tend to get somewhat clumsy over time, though. What about a 
stdlib change that only applies to CPython for some reason, e.g. because no 
other implementation currently has that module?


I think it's ok to make a coarse-grained distinction by numbers, but there 
should also be a way to tag PEPs textually.


Stefan



Re: [Python-Dev] PEP categories (was Re: PEP 393 review)

2011-08-29 Thread Antoine Pitrou
On Mon, 29 Aug 2011 18:38:23 +0200
Dirkjan Ochtman  wrote:
> On Mon, Aug 29, 2011 at 18:24, Barry Warsaw  wrote:
> >>Also, this PEP makes me wonder if there should be a way to distinguish
> >>between language PEPs and (CPython) implementation PEPs, by adding a
> >>tag or using the PEP number ranges somehow.
> >
> > I've thought about this, and about a similar split between language changes
> > and stdlib changes (i.e. new modules such as regex).  Probably the best 
> > thing
> > to do would be to allocate some 1000's to the different categories, like we
> > did for the 3xxx Python 3k PEPS (now largely moot though).
> 
> Allocating 1000's seems sensible enough to me.
> 
> And yes, the division between recents 3x and non-3x PEPs seems quite 
> arbitrary.

I like the 3k numbers myself :))





Re: [Python-Dev] PEP categories (was Re: PEP 393 review)

2011-08-29 Thread Dirkjan Ochtman
On Mon, Aug 29, 2011 at 18:24, Barry Warsaw  wrote:
>>Also, this PEP makes me wonder if there should be a way to distinguish
>>between language PEPs and (CPython) implementation PEPs, by adding a
>>tag or using the PEP number ranges somehow.
>
> I've thought about this, and about a similar split between language changes
> and stdlib changes (i.e. new modules such as regex).  Probably the best thing
> to do would be to allocate some 1000's to the different categories, like we
> did for the 3xxx Python 3k PEPS (now largely moot though).

Allocating 1000's seems sensible enough to me.

And yes, the division between recent 3xxx and non-3xxx PEPs seems quite arbitrary.

Cheers,

Dirkjan

P.S. Perhaps the index could list accepted and open PEPs before meta
and informational? And maybe reverse the order under some headings,
for example in the finished category...


Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Charles-François Natali
> +3 (agreed to Jesse, Antoine and Ask here).
>  The http://bugs.python.org/issue8713 described "non-fork" implementation
> that always uses subprocesses rather than plain forked processes is the
> right way forward for multiprocessing.

I see two drawbacks:
- it will be slower, since the interpreter startup time is
non-negligible (well, normally you shouldn't spawn a new process for
every item, but it should be noted)
- it'll consume more memory, since we lose the COW advantage (even
though it's already limited by the fact that even treating a variable
read-only can trigger an incref, as was noted in a previous thread)

cf


[Python-Dev] PEP categories (was Re: PEP 393 review)

2011-08-29 Thread Barry Warsaw
On Aug 29, 2011, at 11:03 AM, Dirkjan Ochtman wrote:

>Also, this PEP makes me wonder if there should be a way to distinguish
>between language PEPs and (CPython) implementation PEPs, by adding a
>tag or using the PEP number ranges somehow.

I've thought about this, and about a similar split between language changes
and stdlib changes (i.e. new modules such as regex).  Probably the best thing
to do would be to allocate some 1000's to the different categories, like we
did for the 3xxx Python 3k PEPS (now largely moot though).

-Barry



Re: [Python-Dev] Cython, ctypes and the stdlib

2011-08-29 Thread Stefan Behnel

Hi,

I agree that this is getting off-topic for this list. I'm answering here in 
some detail to lighten things up a bit regarding thin and thick 
wrappers, but please move further usage related questions to the 
cython-users mailing list.


Paul Moore, 29.08.2011 12:37:

On 29 August 2011 10:39, Stefan Behnel wrote:

In the CPython backend, the header files are normally #included by the
generated C code, so they are used at C compilation time.

Cython has its own view on the header files in separate declaration files
(.pxd). Basically looks like this:

# file "mymath.pxd"
cdef extern from "aheader.h":
double PI
double E
double abs(double x)

These declaration files usually only contain the parts of a header file that
are used in the user code, either manually copied over or extracted by
scripts (that's what I was referring to in my reply to Terry). The complete
'real' content of the header file is then used by the C compiler at C
compilation time.

The user code employs a "cimport" statement to import the declarations at
Cython compilation time, e.g.

# file "mymodule.pyx"
cimport mymath
print mymath.PI + mymath.E

would result in C code that #includes "aheader.h", adds the C constants "PI"
and "E", converts the result to a Python float object and prints it out
using the normal CPython machinery.


One thing that would make it easier for me to understand the role of
Cython in this context would be to see a simple example of the type of
"thin wrapper" we're talking about here. The above code is nearly
this, but the pyx file executes "real code".


Yes, that's the idea. If all you want is an exact, thin wrapper, you are 
better off with SWIG (well, assuming that performance is not important for 
you - Cython is a *lot* faster). But if you use it, or any other plain glue 
code generator, chances are that you will quickly learn that you do not 
actually want a thin wrapper. Instead, you want something that makes the 
external library easily and efficiently usable from Python code. Which 
means that the wrapper will be thin in some places and thick in others, 
sometimes very thick in selected places, and usually growing thicker over time.


You can do this by using a glue code generator and writing the rest in a 
Python wrapper on top of the thin glue code. It's just that Cython makes 
such a wrapper much more efficient (for CPython), be it in terms of CPU 
performance (fast Python interaction, overhead-free C interaction, native C 
data type support, various Python code optimisations), or in terms of 
parallelisation support (explicit GIL-free threading and OpenMP), or just 
general programmer efficiency, e.g. regarding automatic data conversion or 
ease and safety of manual C memory management.




For example, how do I simply expose pi and abs from math.h? Based on
the above, I tried a pyx file containing just the code

cdef extern from "math.h":
    double pi
    double abs(double x)

but the resulting module exported no symbols.


Recent Cython versions have support for directly exporting C values (e.g. 
enum values) at the Python module level. However, the normal way is to 
explicitly implement the module API as you guessed, i.e.


cimport mydecls   # assuming there is a mydecls.pxd

PI = mydecls.PI
def abs(x):
    return mydecls.abs(x)

Looks simple, right? Nothing interesting here, until you start putting 
actual code into it, as in this (totally contrived and untested, but much 
more correct) example:


from libc cimport math

cdef extern from *:
    # these are defined by the always included Python.h:
    long LONG_MAX, LONG_MIN

def abs(x):
    if isinstance(x, float):    # -> C double
        return math.fabs(x)
    elif isinstance(x, int):    # -> may or may not be a C integer
        if LONG_MIN <= x <= LONG_MAX:
            return math.labs(x)
        else:
            # either within "long long" or raise OverflowError
            return math.llabs(x)
    else:
        # assume it can at least coerce to a C long,
        # or raise ValueError or OverflowError or whatever
        return math.labs(x)
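
Assuming the above compiles to a module named "mymodule" (a name
invented for this example), usage from Python would look like this:

    import mymodule                # hypothetical compiled module name

    print(mymodule.abs(-3.5))      # float -> C fabs()
    print(mymodule.abs(-42))       # int within C long range -> labs()
    print(mymodule.abs(-2**40))    # llabs() where C long is 32 bits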

BTW, there is some simple templating/generics-like type merging support 
upcoming in a GSoC to simplify this kind of type-specific code.




This is probably a bit off-topic, but it seems to me that whenever
Cython comes up in these discussions, the implications of
Cython-as-an-implementation-of-python obscure the idea of simply using
Cython as a means of writing thin library wrappers.


Cython is not a glue code generator, it's a full-fledged programming 
language. It's Python, with additional support for C data types. That makes 
it great for writing non-trivial wrappers between Python and C. It's not so 
great for the trivial cases, but luckily, those are rare. ;)




I've kept python-dev in this response, on the assumption that others
on the list 

Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Gregory P. Smith
On Sat, Aug 27, 2011 at 2:59 AM, Ask Solem  wrote:

>
> On 26 Aug 2011, at 16:53, Antoine Pitrou wrote:
>
> >
> > Hi,
> >
> >> I think that "deprecating" the use of threads w/ multiprocessing - or
> >> at least crippling it is the wrong answer. Multiprocessing needs the
> >> helper threads it uses internally to manage queues, etc. Removing that
> >> ability would require a near-total rewrite, which is just a
> >> non-starter.
> >
> > I agree that this wouldn't actually benefit anyone.
> > Besides, I don't think it's even possible to avoid threads in
> > multiprocessing, given the various constraints. We would have to force
> > the user to run their main thread in an event loop, and that would be
> > twisted (tm).
> >
> >> I would focus on the atfork() patch more directly, ignoring
> >> multiprocessing in the discussion, and focusing on the merits of gps'
> >> initial proposal and patch.
> >
> > I think this could also be combined with Charles-François' patch.
> >
> > Regards
>
>
>
> Have to agree with Jesse and Antoine here.
>
> Celery (celeryproject.org) uses multiprocessing, is widely used in
> production,
> and is regarded as stable software that has been known to run for months
> at a time,
> only to be restarted for software upgrades.
>
> I have been investigating an issue for some time that I'm pretty sure is
> caused
> by this.  It occurs only rarely, so rarely I have not had any actual bug
> reports
> about it, it's just something I have experienced during extensive testing.
> The tone of the discussion on the bug tracker makes me think that I have
> been very lucky :-)
>
> Using the fork+exec approach seems like a much more realistic solution
> than rewriting multiprocessing.Pool and Manager to not use threads. In fact
> this is something I have been considering as a fix for the suspected
> issue for some time.
> It does have implications that are annoying for sure, but we are already
> used to this on the Windows platform (it could help portability even).
>

+3 (agreed to Jesse, Antoine and Ask here).  The
http://bugs.python.org/issue8713 described "non-fork" implementation that
always uses subprocesses rather than plain forked processes is the right way
forward for multiprocessing.

-gps


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-29 Thread Barry Warsaw
On Aug 27, 2011, at 01:15 PM, Ben Finney wrote:

>My question is directed more to M-A Lemburg's passage above, and its
>implicit assumption that the user understand the changes between
>“Unicode 2.0/3.0 semantics” and “Unicode 6 semantics”, and how their own
>needs relate to those semantics.

More likely, it'll be a choice between wanting Unicode 6 semantics, and "don't
care".  So the PEP could include some clues as to why you'd care to use regex
instead of re.

-Barry


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-29 Thread Barry Warsaw
On Aug 26, 2011, at 05:25 PM, Dan Stromberg wrote:

>from __future__ import is an established way of trying something for a while
>to see if it's going to work.

Actually, no.

The documentation says:

-snip snip-
__future__ is a real module, and serves three purposes:

* To avoid confusing existing tools that analyze import statements and expect
  to find the modules they’re importing.
* To ensure that future statements run under releases prior to 2.1 at least
  yield runtime exceptions (the import of __future__ will fail, because there
  was no module of that name prior to 2.1).
* To document when incompatible changes were introduced, and when they will be
  — or were — made mandatory. This is a form of executable documentation, and
  can be inspected programmatically via importing __future__ and examining its
  contents.
-snip snip-
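
The third purpose can be seen directly; a quick illustration (my
example, not from the docs):

    import __future__

    # "executable documentation": when the feature became available,
    # and when it became (or will become) mandatory
    print(__future__.division.getOptionalRelease())   # (2, 2, 0, 'alpha', 2)
    print(__future__.division.getMandatoryRelease())  # (3, 0, 0, 'alpha', 0)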

So, really the __future__ module is a way to introduce accepted but
incompatible changes in a controlled way, through successive releases.  It's
never been used to introduce experimental features that might be removed if
they don't work out.

Cheers,
-Barry


Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)

2011-08-29 Thread Benjamin Peterson
2011/8/29 Glyph Lefkowitz :
>
> On Aug 28, 2011, at 7:27 PM, Guido van Rossum wrote:
>
> In general, an existing library cannot be called
> without access to its .h files -- there are probably struct and
> constant definitions, platform-specific #ifdefs and #defines, and
> other things in there that affect the linker-level calling conventions
> for the functions in the library.
>
> Unfortunately I don't know a lot about this, but I keep hearing about
> something called "rffi" that PyPy uses to call C from RPython:
> .  This has some
> shortcomings currently, most notably the fact that it needs those .h files
> (and therefore a C compiler) at runtime

This is incorrect. rffi is actually quite like ctypes. The part you
are referring to is probably rffi_platform [1], which invokes the
compiler to determine constant values and struct offsets, or
ctypes_configure, which does need runtime headers [2].

[1] 
https://bitbucket.org/pypy/pypy/src/92e36ab4eb5e/pypy/rpython/tool/rffi_platform.py

[2] https://bitbucket.org/pypy/pypy/src/92e36ab4eb5e/ctypes_configure/

-- 
Regards,
Benjamin


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-29 Thread Barry Warsaw
On Aug 27, 2011, at 07:11 PM, Martin v. Löwis wrote:

>A PEP should IMO only cover end-user aspects of the new re module.
>Code organization is typically not in the PEP. To give a specific
>example: you mentioned that there is (near) code duplication in
>MRAB's module. As a reviewer, I would discuss whether this can be
>eliminated - but not in the PEP.

+1

-Barry



Re: [Python-Dev] Software Transactional Memory for Python

2011-08-29 Thread Armin Rigo
Hi Charles-François,

2011/8/27 Charles-François Natali :
> The problem is that many locks are actually acquired implicitely.
> For example, `print` to a buffered stream will acquire the fileobject's mutex.

Indeed.  After looking more at the kind of locks used throughout the
stdlib, I notice that in many cases a lock is acquired by code in the
following simple pattern:

Py_BEGIN_ALLOW_THREADS
PyThread_acquire_lock(self->lock, 1);
Py_END_ALLOW_THREADS

If one thread is waiting in the END_ALLOW_THREADS for another one to
release the GIL, but the other one is in a "with atomic" block and
tries to acquire the same lock, deadlock.  But the issue can be
resolved: the first thread in the above example needs to notice that
the other thread is in a "with atomic" block, and "be nice" and
release the lock again.  Then it waits until the "with atomic" block
finishes, and tries again from the start.

We could do this by putting the above pattern in its own function (which
makes some sense anyway, because the pattern is repeated left and
right, and is often complicated by an additional "if
(!PyThread_acquire_lock(self->lock, 0))" before); and then allowing
that function to be overridden by the external 'stm' module.

I suspect that I need to do a more thorough review of the stdlib to
make sure (at least more than now) that all potential deadlocking
places can be avoided with a similar refactoring.  All in all, it
seems that the patch to CPython itself will need to be more than just
the few lines in ceval.c --- but still very reasonable both in size and
in content.


A bientôt,

Armin.


Re: [Python-Dev] Software Transactional Memory for Python

2011-08-29 Thread Gregory P. Smith
On Mon, Aug 29, 2011 at 5:20 AM, Antoine Pitrou  wrote:

> On Sun, 28 Aug 2011 09:43:33 -0700
> Guido van Rossum  wrote:
> >
> > This sounds like a very interesting idea to pursue, even if it's late,
> > and even if it's experimental, and even if it's possible to cause
> > deadlocks (no news there). I propose that we offer a C API in Python
> > 3.3 as well as an extension module that offers the proposed decorator.
> > The C API could then be used to implement alternative APIs purely as
> > extension modules (e.g. would a deadlock-detecting API be possible?).
>
> We could offer the C API without shipping an extension module ourselves.
> I don't think we should provide (and maintain!) a Python API that helps
> users put themselves in all kind of nasty situations. There is enough
> misunderstanding around the GIL and multithreading already.
>

+1


[Python-Dev] SWIG (was Re: Ctypes and the stdlib)

2011-08-29 Thread David Beazley
On Mon, Aug 29, 2011 at 12:27 PM, Guido van Rossum  wrote:

> I wonder if for
> this particular purpose SWIG isn't the better match. (If SWIG weren't
> universally hated, even by its original author. :-)

Hate is probably a strong word, but as the author of Swig, let me chime in here 
;-).   I think there are probably some lessons to be learned from Swig.

As Nick noted, Swig is best suited when you have control over both sides (C/C++ 
and Python) of whatever code you're working with.  In fact, the original 
motivation for  Swig was to give application programmers (scientists in my 
case), a means for automatically generating the Python bindings to their code.  
However, there was one other important assumption--and that was the fact that 
all of your "real code" was going to be written in C/C++ and that the Python 
scripting interface was just an optional add-on (perhaps even just a throw-away 
thing).  Keep in mind, Swig was first created in 1995 and at that time, the use 
of Python (or any similar language) was a pretty radical idea in the sciences.  
Moreover, there was a lot of legacy code that people just weren't going to 
abandon.  Thus, I always viewed Swig as a kind of transitional vehicle for 
getting people to use Python who might otherwise not even consider it.   
Getting back to Nick's point though, to really use Swig effectively, it was 
always known that you might have to reorganize or refactor your 
C/C++ code to make it more Python friendly.  However, due to the automatic 
wrapper generation, you didn't have to do it all at once.  Basically your code 
could organically evolve and Swig would just keep up with whatever you were 
doing.  In my projects, we'd usually just tuck Swig away in some Makefile 
somewhere and forget about it.

One of the major complexities of Swig is the fact that it attempts to parse 
C/C++ header files.   This very notion is actually a dangerous trap waiting for 
anyone who wants to wander into it.  You might look at a header file and say, 
well how hard could it be to just grab a few definitions out of there?   I'll 
just write a few regexs or come up with some simple hack for recognizing 
function definitions or something.   Yes, you can do that, but you're 
immediately going to find that whatever approach you take starts to break down 
into horrible corner cases.   Swig started out like this and quickly turned 
into a quagmire of esoteric bug reports.  All sorts of problems with 
preprocessor macros, typedefs, missing headers, and other things.  For awhile, 
I would get these bug reports that would go something like "I had this C++ 
class inside a namespace with an abstract method taking a typedef'd const 
reference to this smart pointer ... and Swig broke."   Hell, I can't even 
understand the bug report, let alone know how to fix it.  Almost all of these bugs 
were due to the fact that Swig started out as a hack and didn't really have any 
kind of solid conceptual foundation for how it should be put together.

If you flash forward a bit, from about 2001-2004 there was a very serious push 
to fix these kinds of issues.  Although it was not a complete rewrite of Swig, 
there were a huge number of changes to how it worked during this time.  Swig 
grew a fully compatible C++ preprocessor that fully supported macros.  A 
complete C++ type system was implemented including support for namespaces, 
templates, and even such things as template partial specialization.  Swig 
evolved into a multi-pass compiler that was doing all sorts of global analysis 
of the interface.   Just to give you an idea, Swig would do things such as 
automatically detect/wrap C++ smart pointers.  It could wrap overloaded C++ 
methods/functions.  Also, if you had a C++ class with virtual methods, it would 
only make one Python wrapper function and then reuse it across all wrapped 
subclasses.

Under the covers of all of this, the implementation basically evolved into a 
sophisticated macro preprocessor coupled with a pattern matching engine built 
on top of the C++ type system.   For example, you could write patterns that 
matched specific C++ types (the much hated "typemap" feature) and you could 
write patterns that matched entire C++ declarations.  This whole pattern 
matching approach had a huge power if you knew what you were doing.  For 
example, I had a graduate student working on adding "contracts" to 
Swig--something that was being funded by an NSF grant.  It was cool and 
mind-boggling all at once.

In hindsight however, I think the complexity of Swig has exceeded anyone's 
ability to fully understand it (including my own).  For example, to even make 
sense of what's happening, you have to have a pretty solid grasp of the C/C++ 
type system (easier said than done).   Couple that with all sorts of crazy 
pattern matching, low-level code fragments, and a ton of macro definitions, 
your head will literally explode if you try to figure out what's happening.   
So far as I know, recent ver

Re: [Python-Dev] LZMA compression support in 3.3

2011-08-29 Thread Barry Warsaw
On Aug 27, 2011, at 10:36 PM, Nadeem Vawda wrote:

>I talked to Antoine about this on IRC; he didn't seem to think a PEP would be
>necessary. But a summary of the discussion on the tracker issue might still
>be a useful thing to have, given how long it's gotten.

I agree with Antoine - no PEP should be necessary.  A well reviewed and tested
module should do it.

-Barry


Re: [Python-Dev] Software Transactional Memory for Python

2011-08-29 Thread Antoine Pitrou
On Sun, 28 Aug 2011 09:43:33 -0700
Guido van Rossum  wrote:
> 
> This sounds like a very interesting idea to pursue, even if it's late,
> and even if it's experimental, and even if it's possible to cause
> deadlocks (no news there). I propose that we offer a C API in Python
> 3.3 as well as an extension module that offers the proposed decorator.
> The C API could then be used to implement alternative APIs purely as
> extension modules (e.g. would a deadlock-detecting API be possible?).

We could offer the C API without shipping an extension module ourselves.
I don't think we should provide (and maintain!) a Python API that helps
users put themselves in all kind of nasty situations. There is enough
misunderstanding around the GIL and multithreading already.

Regards

Antoine.




Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-29 Thread Antoine Pitrou
On Mon, 29 Aug 2011 12:43:24 +0900
"Stephen J. Turnbull"  wrote:
> 
> Since when can s[0] represent a code point outside the BMP, for s a
> Unicode string in a narrow build?
> 
> Remember, the UCS-2/narrow vs. UCS-4/wide distinction is *not* about
> what Python supports vs. the outside world.  It's about what the str/
> unicode type is an array of.

Why would that be?

Antoine.




Re: [Python-Dev] Should we move to replace re with regex?

2011-08-29 Thread Ezio Melotti
On Sun, Aug 28, 2011 at 7:28 AM, Guido van Rossum  wrote:

>
> Are you volunteering? (Even if you don't want to be the only
> maintainer, it still sounds like you'd be a good co-maintainer of the
> regex module.)
>

My name is listed in the experts index for 're' [0], and that should make me
already "co-maintainer" for the module.


> [...]
>
> >   4) add documentation for the module and the (public) functions in
> > Doc/library (this should be done anyway).
>
> Does regex have a significany public C interface? (_sre.c doesn't.)
> Does it have a Python-level interface beyond what re.py offers (apart
> from the obvious new flags and new regex syntax/semantics)?
>

I don't think it does.
Explaining the new syntax/semantics is useful for developers (e.g. what \p
and \X are supposed to match), but also for users, so it's fine to have this
documented in Doc/library/re.rst (and I don't think it's necessary to
duplicate it in the README/PEP/Wiki).


>
> > This will ensure that the general quality of the code is good, and when
> > someone actually has to work on the code, there's enough documentation to
> > make it possible.
>
> That sounds like a good description of a process that could lead to
> acceptance of regex as a re replacement.
>
>
So if we want to get this done I think we need Matthew for 1) (unless
someone else wants to do it and have him review the result).
If making a diff with the current re is doable and makes sense, we can use
the rietveld instance on the bug tracker to make the review for 2).  The
same could be done with a diff that replaces the whole module though.
3) will follow after 2), and 4) is not difficult and can be done when we
actually replace re (it's probably enough to reorganize the page on PyPI
a bit and convert it to rst).

Best Regards,
Ezio Melotti

[0]: http://docs.python.org/devguide/experts.html#stdlib


Re: [Python-Dev] Ctypes and the stdlib

2011-08-29 Thread Paul Moore
On 29 August 2011 10:39, Stefan Behnel  wrote:
> In the CPython backend, the header files are normally #included by the
> generated C code, so they are used at C compilation time.
>
> Cython has its own view on the header files in separate declaration files
> (.pxd). Basically looks like this:
>
>    # file "mymath.pxd"
>    cdef extern from "aheader.h":
>        double PI
>        double E
>        double abs(double x)
>
> These declaration files usually only contain the parts of a header file that
> are used in the user code, either manually copied over or extracted by
> scripts (that's what I was referring to in my reply to Terry). The complete
> 'real' content of the header file is then used by the C compiler at C
> compilation time.
>
> The user code employs a "cimport" statement to import the declarations at
> Cython compilation time, e.g.
>
>    # file "mymodule.pyx"
>    cimport mymath
>    print mymath.PI + mymath.E
>
> would result in C code that #includes "aheader.h", adds the C constants "PI"
> and "E", converts the result to a Python float object and prints it out
> using the normal CPython machinery.

One thing that would make it easier for me to understand the role of
Cython in this context would be to see a simple example of the type of
"thin wrapper" we're talking about here. The above code is nearly
this, but the pyx file executes "real code".

For example, how do I simply expose pi and abs from math.h? Based on
the above, I tried a pyx file containing just the code

cdef extern from "math.h":
    double pi
    double abs(double x)

but the resulting module exported no symbols. What am I doing wrong?
Could you show a working example of writing such a wrapper?
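
(My own best guess, after re-reading the docs: cdef extern blocks only
declare C-level names, so nothing is exported unless it is explicitly
re-exposed via a def function or a module-level assignment -- and math.h has
no plain "pi" symbol anyway, only the M_PI macro, while the double version
of abs() is fabs(). So perhaps something like:

    # mymath.pyx -- untested guess
    cdef extern from "math.h":
        double M_PI
        double fabs(double x)

    PI = M_PI          # module-level assignment makes it visible from Python

    def abs(double x):
        return fabs(x)

-- but I'd appreciate confirmation that this is the intended pattern.)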

This is probably a bit off-topic, but it seems to me that whenever
Cython comes up in these discussions, the implications of
Cython-as-an-implementation-of-Python obscure the idea of simply using
Cython as a means of writing thin library wrappers.

Just to clarify - the above code (if it works) seems to me like a nice
simple means of writing wrappers. Something involving this in a pxd
file, plus a pyx file with a whole load of dummy

def abs(x):
    return cimported_module.abs(x)

definitions, seems ok, but annoyingly clumsy. (Particularly for big APIs).

I've kept python-dev in this response, on the assumption that others
on the list might be glad of seeing a concrete example of using Cython
to build wrapper code. But anything further should probably be taken
off-list...

Thanks,
Paul.

PS This would also probably be a useful addition to the Cython wiki
and/or the manual. I searched both and found very little other than a
page on wrapping C++ classes (which is not very helpful for simple C
global functions and constants).


Re: [Python-Dev] Ctypes and the stdlib

2011-08-29 Thread Stefan Behnel

Guido van Rossum, 29.08.2011 04:27:

On Sun, Aug 28, 2011 at 11:23 AM, Stefan Behnel wrote:

Terry Reedy, 28.08.2011 06:58:

I think it needs a SWIG-like
companion script that can write at least first-pass ctypes code from the .h
header files. Or maybe it could/should use header info at runtime (with the
.h bundled with a module).


From my experience, this is a "nice to have" more than a requirement. It has
been requested for Cython a couple of times, especially by new users, and
there are a couple of scripts out there that do this to some extent. But the
usual problem is that Cython users (and, similarly, ctypes users) do not
want a 1:1 mapping of a library API to a Python API (there's SWIG for that),
and you can't easily get more than a trivial mapping out of a script. But,
yes, a one-shot generator for the necessary declarations would at least help
in cases where the API to be wrapped is somewhat large.


Hm, the main use that was proposed here for ctypes is to wrap existing
libraries (not to create nicer APIs, that can be done in pure Python
on top of this).


The same applies to Cython, obviously. The main advantage of Cython over 
ctypes for this is that the Python-level wrapper code is also compiled into 
C, so whenever the need for a thicker wrapper arises in some part of the 
API, you don't lose any performance in intermediate layers.




In general, an existing library cannot be called
without access to its .h files -- there are probably struct and
constant definitions, platform-specific #ifdefs and #defines, and
other things in there that affect the linker-level calling conventions
for the functions in the library. (Just like Python's own .h files --
e.g. the extensive renaming of the Unicode APIs depending on
narrow/wide build) How does Cython deal with these?


In the CPython backend, the header files are normally #included by the 
generated C code, so they are used at C compilation time.


Cython has its own view on the header files in separate declaration files 
(.pxd). Basically looks like this:


    # file "mymath.pxd"
    cdef extern from "aheader.h":
        double PI
        double E
        double abs(double x)

These declaration files usually only contain the parts of a header file 
that are used in the user code, either manually copied over or extracted by 
scripts (that's what I was referring to in my reply to Terry). The complete 
'real' content of the header file is then used by the C compiler at C 
compilation time.


The user code employs a "cimport" statement to import the declarations at 
Cython compilation time, e.g.


    # file "mymodule.pyx"
    cimport mymath
    print mymath.PI + mymath.E

would result in C code that #includes "aheader.h", adds the C constants 
"PI" and "E", converts the result to a Python float object and prints it 
out using the normal CPython machinery.


This means that declarations can be reused across modules, just like with 
header files. In fact, Cython actually ships with a couple of common 
declaration files, e.g. for parts of libc, NumPy or CPython's C-API.
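
For instance, assuming the shipped libc declarations cover these names, the
math example above needs no hand-written .pxd at all:

    # file "usemath.pyx"
    from libc.math cimport M_PI, fabs
    print M_PI, fabs(-1.5)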


I don't know that much about the IronPython backend, but from what I heard, 
it uses basically the same build-time mechanisms and generates a thin C++
wrapper and a corresponding CLI part as a glue layer.


The ctypes backend for PyPy works differently in that it generates a Python 
module from the .pxd files that contains the declarations as ctypes code. 
Then, the user code imports that normally at Python runtime. Obviously, 
this means that there are cases where the Cython-level declarations and 
thus the generated ctypes code will not match the ABI for a given target 
platform. So, in the worst case, there is a need to manually adapt the 
ctypes declarations in the Python module that was generated from the .pxd. 
Not worse than the current situation, though, and the rest of the Cython 
wrapper will compile into plain Python code that simply imports the 
declarations from the .pxd modules. But there's certainly room for 
improvements here.
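
To illustrate with the mymath.pxd above (a purely hypothetical rendering --
the real generator output will differ):

    # roughly what a generated ctypes module could look like
    import ctypes, ctypes.util

    _lib = ctypes.CDLL(ctypes.util.find_library("m"))  # library name is a guess

    abs = _lib.abs
    abs.argtypes = [ctypes.c_double]
    abs.restype = ctypes.c_double

    # PI and E cannot be generated at all if they are C macros: ctypes
    # only sees symbols that actually exist in the shared library.

This also shows the ABI pitfall: if the real abs() in that library actually
takes an int, the declarations above are exactly what would need fixing by
hand.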


Stefan



Re: [Python-Dev] Software Transactional Memory for Python

2011-08-29 Thread Armin Rigo
Hi Guido,

On Sun, Aug 28, 2011 at 6:43 PM, Guido van Rossum  wrote:
> This sounds like a very interesting idea to pursue, even if it's late,
> and even if it's experimental, and even if it's possible to cause
> deadlocks (no news there). I propose that we offer a C API in Python
> 3.3 as well as an extension module that offers the proposed decorator.

Very good idea.  http://bugs.python.org/issue12850

The extension module, called 'stm' for now, is designed as an
independent 3rd-party extension module.  It should at this point not
be included in the stdlib; for one thing, it needs some more testing
than my quick one-page hacks, and we need to seriously look at the
deadlock issues mentioned here.  But the patch to ceval.c above looks
rather straightforward to me and could, if no subtle issue is found,
be included in the standard CPython.
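
To give an idea, using it would look more or less like this (my sketch only
-- the exact names may still change, see the issue for the real API):

    import stm

    @stm.atomic
    def transfer(frm, to, amount):
        # runs as a single transaction: other threads never observe
        # the state between the two updates
        frm.balance -= amount
        to.balance += amount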


A bientôt,

Armin.


Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Victor Stinner

Le 28/08/2011 23:06, "Martin v. Löwis" a écrit :

Am 28.08.2011 22:01, schrieb Antoine Pitrou:



- the iobench results are between 2% acceleration (seek operations),
   16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and
   37% for large sized reads (154 MB/s vs. 235 MB/s). The speed
   difference is probably in the UTF-8 decoder; I have already
   restored the "runs of ASCII" optimization and am out of ideas for
   further speedups. Again, having to scan the UTF-8 string twice
   is probably one cause of slowdown.


I don't think it's the UTF-8 decoder because I see an even larger
slowdown with simpler encodings (e.g. "-E latin1" or "-E utf-16le").


Those haven't been ported to the new API, yet. Consider, for example,
d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
is a 25% speedup for PEP 393.


If I understand correctly, the performance now depends strongly on the 
characters used? A pure ASCII string is faster than a string with characters 
in the ISO-8859-1 charset? Is it also true for BMP characters vs non-BMP 
characters?


Do these benchmark tools use only ASCII characters, or also some 
ISO-8859-1 characters? Or, better, different Unicode ranges in different 
tests?
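
Something like this quick (untested) check would already show whether pure
ASCII data takes a faster path than Latin-1 data:

    import timeit
    for sample in (b'abc', b'ab\xe9'):
        t = timeit.timeit("data.decode('latin-1')",
                          "data = %r * 10000" % sample, number=10000)
        print(sample, t)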


Victor


Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Victor Stinner

Le 29/08/2011 11:03, Dirkjan Ochtman a écrit :

On Sun, Aug 28, 2011 at 21:47, "Martin v. Löwis"  wrote:

  result strings. In PEP 393, a buffer must be scanned for the
  highest code point, which means that each byte must be inspected
  twice (a second time when the copying occurs).


This may be a silly question: are there things in place to optimize
this for the case where two strings are combined? E.g. highest
character in combined string is max(highest character in either of the
strings).


The "double-scan" issue is only for codec decoders.

If you combine two Unicode objects (a+b), you already know the highest 
code point and the kind of each string.
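
In pure-Python terms, the bookkeeping amounts to no more than this (a
sketch, of course -- the real code manipulates the C structs directly):

    def concat_maxchar(maxchar_a, maxchar_b):
        # the highest code point of a+b -- and hence its 1/2/4 bytes-per-
        # character storage kind -- follows without rescanning either buffer
        return max(maxchar_a, maxchar_b)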


Victor


Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Dirkjan Ochtman
On Sun, Aug 28, 2011 at 21:47, "Martin v. Löwis"  wrote:
>  result strings. In PEP 393, a buffer must be scanned for the
>  highest code point, which means that each byte must be inspected
>  twice (a second time when the copying occurs).

This may be a silly question: are there things in place to optimize
this for the case where two strings are combined? E.g. highest
character in combined string is max(highest character in either of the
strings).

Also, this PEP makes me wonder if there should be a way to distinguish
between language PEPs and (CPython) implementation PEPs, by adding a
tag or using the PEP number ranges somehow.

Cheers,

Dirkjan


Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)

2011-08-29 Thread M.-A. Lemburg
Guido van Rossum wrote:
> On Sun, Aug 28, 2011 at 11:23 AM, Stefan Behnel  wrote:
>> Hi,
>>
>> sorry for hooking in here with my usual Cython bias and promotion. When the
>> question comes up what a good FFI for Python should look like, it's an
>> obvious reaction from my part to throw Cython into the game.
>>
>> Terry Reedy, 28.08.2011 06:58:
>>>
>>> Dan, I once had the more or less the same opinion/question as you with
>>> regard to ctypes, but I now see at least 3 problems.
>>>
>>> 1) It seems hard to write it correctly. There are currently 47 open ctypes
>>> issues, with 9 being feature requests, leaving 38 behavior-related issues.
>>> Tom Heller has not been able to work on it since the beginning of 2010 and
>>> has formally withdrawn as maintainer. No one else that I know of has taken
>>> his place.
>>
>> Cython has an active set of developers and a rather large and growing user
>> base.
>>
>> It certainly has lots of open issues in its bug tracker, but most of them
>> are there because we *know* where the development needs to go, not so much
>> because we don't know how to get there. After all, the semantics of Python
>> and C/C++, between which Cython sits, are pretty much established.
>>
>> Cython compiles to C code for CPython, (hopefully soon [1]) to Python+ctypes
>> for PyPy and (mostly [2]) C++/CLI code for IronPython, which boils down to
>> the same build time and runtime kind of dependencies that the supported
>> Python runtimes have anyway. It does not add dependencies on any external
>> libraries by itself, such as the libffi in CPython's ctypes implementation.
>>
>> For the CPython backend, the generated code is very portable and is
>> self-contained when compiled against the CPython runtime (plus, obviously,
>> libraries that the user code explicitly uses). It generates efficient code
>> for all existing CPython versions starting with Python 2.4, with several
>> optimisations also for recent CPython versions (including the upcoming 3.3).
>>
>>
>>> 2) It is not trivial to use it correctly.
>>
>> Cython is basically Python, so Python developers with some C or C++
>> knowledge tend to get along with it quickly.
>>
>> I can't say yet how easy it is (or will be) to write code that is portable
>> across independent Python implementations, but given that that field is
>> still young, there's certainly a lot that can be done to aid this.
> 
> Cython does sound attractive for cross-Python-implementation use. This
> is exciting.
> 
>>> I think it needs a SWIG-like
>>> companion script that can write at least first-pass ctypes code from the .h
>>> header files. Or maybe it could/should use header info at runtime (with the
>>> .h bundled with a module).
>>
>> From my experience, this is a "nice to have" more than a requirement. It has
>> been requested for Cython a couple of times, especially by new users, and
>> there are a couple of scripts out there that do this to some extent. But the
>> usual problem is that Cython users (and, similarly, ctypes users) do not
>> want a 1:1 mapping of a library API to a Python API (there's SWIG for that),
>> and you can't easily get more than a trivial mapping out of a script. But,
>> yes, a one-shot generator for the necessary declarations would at least help
>> in cases where the API to be wrapped is somewhat large.
> 
> Hm, the main use that was proposed here for ctypes is to wrap existing
> libraries (not to create nicer APIs, that can be done in pure Python
> on top of this). In general, an existing library cannot be called
> without access to its .h files -- there are probably struct and
> constant definitions, platform-specific #ifdefs and #defines, and
> other things in there that affect the linker-level calling conventions
> for the functions in the library. (Just like Python's own .h files --
> e.g. the extensive renaming of the Unicode APIs depending on
> narrow/wide build) How does Cython deal with these? I wonder if for
> this particular purpose SWIG isn't the better match. (If SWIG weren't
> universally hated, even by its original author. :-)

SIP is an alternative to SWIG:

 http://www.riverbankcomputing.com/software/sip/intro
 http://pypi.python.org/pypi/SIP

and there are a few others as well:

 http://wiki.python.org/moin/IntegratingPythonWithOtherLanguages

>>> 3) It seems to be slower than compiled C extension wrappers. That, at
>>> least, was the discovery of someone who re-wrote pygame using ctypes. (The
>>> hope was that using ctypes would aid porting to 3.x, but the time penalty
>>> was apparently too much for time-critical code.)
>>
>> Cython code can be as fast as C code, and in some cases, especially when
>> developer time is limited, even faster than hand written C extensions. It
>> allows for a straightforward optimisation path from regular Python code
>> down to the speed of C, and trivial interaction with C code itself, if the
>> need arises.
>>
>> Stefan
>>
>>
>> [1] The PyPy port of Cython is currently being written a