Re: [Python-Dev] More compact dictionaries with faster iteration

2012-12-09 Thread Christian Heimes
Hello Raymond

On 10.12.2012 02:44, Raymond Hettinger wrote:
> Another benefit is that resizing is faster and 
> touches fewer pieces of memory.  Currently, every
> hash/key/value entry is moved or copied during a
> resize.  In the new layout, only the indices are
> updated.  For the most part, the hash/key/value entries
> never move (except for an occasional swap to fill a
> hole left by a deletion).
> 
> With the reduced memory footprint, we can also expect
> better cache utilization.

On the other hand, every lookup and collision check needs an additional
multiplication, an addition, and a pointer dereference:

  entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx

It's hard to predict how the extra CPU instructions are going to affect
performance on different CPUs and workloads. My gut tells me that
your proposal is going to perform better on modern CPUs with lots of L1
and L2 cache in the average-case scenario. A worst-case scenario with
lots of collisions might be measurably slower for large dicts on a small
CPU cache.
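
(Purely as a sketch of that indirection, not CPython's actual probing
code -- the real loop also mixes in a perturb term and handles dummy
entries, and the names here are illustrative:)

    # Current layout: probe the sparse entry table directly.
    def lookup_current(table, h, key, mask):
        i = h & mask
        while True:
            entry = table[i]                    # one load per probe
            if entry is None or (entry[0] == h and entry[1] == key):
                return entry
            i = (i * 5 + 1) & mask              # simplified probe sequence

    # Proposed layout: go through the index table first.
    def lookup_proposed(indices, entries, h, key, mask):
        i = h & mask
        while True:
            idx = indices[i]                    # the extra load ...
            entry = entries[idx] if idx is not None else None  # ... then the entry
            if entry is None or (entry[0] == h and entry[1] == key):
                return entry
            i = (i * 5 + 1) & mask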

But this is pure speculation and my gut could be terribly wrong. :) I'm
+1 on giving it a shot.

Good luck!
Christian








Re: [Python-Dev] More compact dictionaries with faster iteration

2012-12-09 Thread Gregory P. Smith
On Sun, Dec 9, 2012 at 5:44 PM, Raymond Hettinger <
raymond.hettin...@gmail.com> wrote:

> The current memory layout for dictionaries is
> unnecessarily inefficient.  It has a sparse table of
> 24-byte entries containing the hash value, key pointer,
> and value pointer.
>
> Instead, the 24-byte entries should be stored in a
> dense table referenced by a sparse table of indices.
>
> [snip -- full proposal below]
>
> Also, note that in many applications, the size
> of the data dominates the size of the container (i.e.
> the weight of a bucket of water is mostly the water,
> not the bucket).

+1  I like it.  The plethora of 64-bit NULL values we cart around in our
internals bothers me; this gets rid of some.

The worst case of ~30% less memory consumed for the bucket (ignoring the
water) is still decent.  For a program with a lot of instances of small-
to medium-sized objects, I'd expect a measurable win.

I'm interested in seeing performance and memory footprint numbers from a C
implementation of this.

-gps


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-09 Thread Andrew McNabb
On Fri, Dec 07, 2012 at 05:02:26PM -0500, PJ Eby wrote:
> If the packages have files in conflict, they won't be both installed.
> If they don't have files in conflict, there's nothing important to be
> informed of.  If one is installing pexpect-u, then one does not need
> to discover that it is a successor of pexpect.

In the specific case of pexpect and pexpect-u, the files don't actually
collide.  The pexpect package installs a "pexpect.py" module, while
pexpect-u installs a "pexpect/" package directory.  The two still conflict
(both claim the import name "pexpect"), just not in an easily detectable
sense.
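
A quick way to see the clash (a hypothetical layout, assuming both
distributions landed in the same site-packages):

    # site-packages/pexpect.py           <- installed by pexpect
    # site-packages/pexpect/__init__.py  <- installed by pexpect-u
    # No single file path is written twice, yet both bind the import
    # name "pexpect"; which one wins depends on the import system.
    import pexpect
    print(pexpect.__file__)   # shows which of the two candidates won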

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868


Re: [Python-Dev] More compact dictionaries with faster iteration

2012-12-09 Thread Raymond Hettinger

On Dec 9, 2012, at 10:03 PM, MRAB  wrote:

> What happens when a key is deleted? Will that leave an empty slot in
> the entry array?

Yes.  See the __delitem__() method in the pure Python implementation
at http://code.activestate.com/recipes/578375

> If so, will the array be compacted if the proportion
> of entries grows beyond a certain limit?

Yes.  Compaction happens during resizing, which is triggered when
the dict becomes two-thirds full (including dummy entries).
See the __setitem__() method in the pure Python implementation.

> Adding a key would involve simply appending to the entry array (filling
> the last empty slot), but if there could be empty slots in other
> places, would it look for the first?

Yes.  See the _next_open_index() helper method in the pure Python
implementation.

> Actually, as I think about it, you could form a linked list (using the
> 'hash' field) of those empty slots that aren't part of the final
> contiguous block and fill those preferentially.

That's the plan.  See the comment above the keylist.index(UNUSED)
line in the _next_open_index() method in the pure Python implementation.
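
For anyone who doesn't want to dig through the recipe, a rough sketch of
that free-list trick (the names below are invented, not taken from the
recipe): holes are chained through the otherwise-unused hash field.

    FREE_END = -1   # sentinel: end of the free-slot chain

    class DenseEntries:
        def __init__(self):
            self.hashes, self.keys, self.values = [], [], []
            self.free_head = FREE_END      # index of the first hole

        def reserve(self):
            """Return an open slot index, reusing interior holes first."""
            if self.free_head != FREE_END:
                idx = self.free_head
                self.free_head = self.hashes[idx]  # next hole in the chain
                return idx
            self.hashes.append(None)
            self.keys.append(None)
            self.values.append(None)
            return len(self.hashes) - 1

        def release(self, idx):
            """Turn slot idx into a hole and push it onto the free list."""
            self.keys[idx] = self.values[idx] = None
            self.hashes[idx] = self.free_head
            self.free_head = idx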


Raymond



Re: [Python-Dev] More compact dictionaries with faster iteration

2012-12-09 Thread MRAB

On 2012-12-10 01:44, Raymond Hettinger wrote:

> The current memory layout for dictionaries is
> unnecessarily inefficient.  It has a sparse table of
> 24-byte entries containing the hash value, key pointer,
> and value pointer.
>
> Instead, the 24-byte entries should be stored in a
> dense table referenced by a sparse table of indices.
>
> [snip -- full proposal below]

What happens when a key is deleted? Will that leave an empty slot in
the entry array? If so, will the array be compacted if the proportion
of entries grows beyond a certain limit?

Adding a key would involve simply appending to the entry array (filling
the last empty slot), but if there could be empty slots in other
places, would it look for the first?

Actually, as I think about it, you could form a linked list (using the
'hash' field) of those empty slots that aren't part of the final
contiguous block and fill those preferentially.


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-09 Thread PJ Eby
On Sun, Dec 9, 2012 at 8:48 PM, Stephen J. Turnbull  wrote:
> PJ Eby writes:
>  > This is a good example of what I meant about clear thinking on
>  > concrete use cases, vs. simply copying fields from distro tools.  In
>  > the distro world, these kinds of fields reflect the *results* of
>  > research and decision-making about compatibility.  Whereas, in our
>  > "upstream" world, the purpose of the fields is to provide downstream
>  > repackagers and integrators with the source materials for such
>  > research.
>
> I agree with the meaning of the above paragraph, but would like to
> dissociate myself from the comparison implied by the expression "clear
> thinking".

What comparison is that?

By "clear", I mean "free of prior assumptions".   The assumptions that
made the discussion difficult weren't just about the use cases
themselves, but about the environments, tools, organizations,
concepts, etc. surrounding those use cases.  Indeed, even the
assumption of what should *qualify* as a "use case" was a stumbling
block on occasion.  ;-)

And by "thinking", I mean, "considering alternatives and
consequences", as distinct from debating the merits of a specific
position.

Put together, the phrase "clear thinking on concrete use cases" means
(at least to me), "dropping all preconceptions of the existing design
and starting over from square one, to ask how best the problem may be
solved, using specific examples as a guide rather than using
generalities."  Generalities not rooted in concrete examples have a
way of leading to non-terminating discussions.  ;-)

Starting over a discussion in this fashion isn't easy, but the results
are usually worth it.  I appreciate Nick and Daniel's patience in
particular.


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-09 Thread Stephen J. Turnbull
PJ Eby writes:

 > That being said, I don't object to having the ability for either of
 > them to do so: the utility of the field is *much* enhanced once its
 > connection to installation tools is gone, since a wider variety of
 > issues can be described without inconveniencing users.

+1 to "describing".  A metadata format should not specify tool
behavior, and should use behavior-neutral nomenclature.  Rather, use
cases that seem probable or perhaps wrong-headed should inform the
design.  Nevertheless, actual decisions about behavior should be left
to the tool authors.

 > This is a good example of what I meant about clear thinking on
 > concrete use cases, vs. simply copying fields from distro tools.  In
 > the distro world, these kinds of fields reflect the *results* of
 > research and decision-making about compatibility.  Whereas, in our
 > "upstream" world, the purpose of the fields is to provide downstream
 > repackagers and integrators with the source materials for such
 > research.

I agree with the meaning of the above paragraph, but would like to
dissociate myself from the comparison implied by the expression "clear
thinking".  AFAICS, it's different assumptions about use cases that
drive the difference in prescriptions here.


[Python-Dev] More compact dictionaries with faster iteration

2012-12-09 Thread Raymond Hettinger
The current memory layout for dictionaries is
unnecessarily inefficient.  It has a sparse table of
24-byte entries containing the hash value, key pointer,
and value pointer.

Instead, the 24-byte entries should be stored in a
dense table referenced by a sparse table of indices.

For example, the dictionary:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

indices =  [None, 1, None, None, None, 0, None, 2]
entries =  [[-9092791511155847987, 'timmy', 'red'],
            [-8522787127447073495, 'barry', 'green'],
            [-6480567542315338377, 'guido', 'blue']]

Only the data layout needs to change.  The hash table
algorithms would stay the same.  All of the current
optimizations would be kept, including key-sharing
dicts and custom lookup functions for string-only
dicts.  There is no change to the hash functions, the
table search order, or collision statistics.

The memory savings are significant (from 30% to 95%
compression depending on how full the table is).
Small dicts (size 0, 1, or 2) get the most benefit.

For a sparse table of size t with n entries, the sizes are:

curr_size = 24 * t
new_size = 24 * n + sizeof(index) * t

In the above timmy/barry/guido example, the current
size is 192 bytes (eight 24-byte entries) and the new
size is 80 bytes (three 24-byte entries plus eight
1-byte indices).  That gives 58% compression.

Note, the sizeof(index) can be as small as a single
byte for small dicts, two bytes for bigger dicts, and
up to sizeof(Py_ssize_t) for huge dicts.
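
(Checking that arithmetic in Python -- the exact index-width cutoffs
below are illustrative, assuming the smallest width that can address
the entry table:)

    def table_sizes(n, t):
        index_size = 1 if t <= 256 else 2 if t <= 65536 else 8
        return 24 * t, 24 * n + index_size * t

    curr, new = table_sizes(n=3, t=8)   # the timmy/barry/guido dict
    print(curr, new)                    # 192 80 -> 58% compression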

In addition to space savings, the new memory layout
makes iteration faster.  Currently, keys(), values(), and
items() loop over the sparse table, skipping over free
slots in the hash table.  Now, keys/values/items can
loop directly over the dense table, using fewer memory
accesses.
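
(In pure-Python terms, iteration becomes a plain scan of the dense
lists -- a sketch, not the recipe's exact code or attribute names:)

    def iter_items(keylist, valuelist):
        # Aside from the occasional hole left by a deletion, every row
        # is live, so this touches far less memory than walking the
        # mostly-empty sparse table.
        for key, value in zip(keylist, valuelist):
            if key is not None:        # skip a deletion hole
                yield key, value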

Another benefit is that resizing is faster and 
touches fewer pieces of memory.  Currently, every
hash/key/value entry is moved or copied during a
resize.  In the new layout, only the indices are
updated.  For the most part, the hash/key/value entries
never move (except for an occasional swap to fill a
hole left by a deletion).
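
(A sketch of such a resize under the proposed layout, ignoring
deleted-entry holes for brevity -- only the small index table is
rebuilt, while the dense hash/key/value rows stay put:)

    def rebuild_indices(entry_hashes, new_size):
        indices = [None] * new_size
        mask = new_size - 1
        for pos, h in enumerate(entry_hashes):   # dense scan, no copying
            i = h & mask
            while indices[i] is not None:        # find a free index slot
                i = (i * 5 + 1) & mask           # simplified probe sequence
            indices[i] = pos
        return indices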

With the reduced memory footprint, we can also expect
better cache utilization.

For those wanting to experiment with the design,
there is a pure Python proof-of-concept here:

    http://code.activestate.com/recipes/578375

YMMV: Keep in mind that the above size statistics assume a
build with 64-bit Py_ssize_t and 64-bit pointers.  The
space savings percentages are a bit different on other
builds.  Also, note that in many applications, the size
of the data dominates the size of the container (i.e.
the weight of a bucket of water is mostly the water,
not the bucket).


Raymond


Re: [Python-Dev] hg annotate is broken on hg.python.org

2012-12-09 Thread Chris Jerdonek
On Sun, Dec 9, 2012 at 3:30 PM, anatoly techtonik  wrote:
> Just to let you know that annotate in hgweb is broken for Python sources.
>
> http://hg.python.org/cpython/annotate/692be1f9fa1d/Lib/distutils/tests/test_register.py

Maybe I'm missing something, but what's broken about it?  Also, in my
experience it's okay to file issues about hg.python.org on the main
tracker if you suspect something isn't right or think something should
be improved.

--Chris


[Python-Dev] hg annotate is broken on hg.python.org

2012-12-09 Thread anatoly techtonik
Just to let you know that annotate in hgweb is broken for Python sources.

http://hg.python.org/cpython/annotate/692be1f9fa1d/Lib/distutils/tests/test_register.py
-- 
anatoly t.


Re: [Python-Dev] Do more at compile time; less at runtime

2012-12-09 Thread MRAB

On 2012-12-09 22:22, Mark Shannon wrote:

> Hi all,
>
> The current CPython bytecode interpreter is rather more complex than it
> needs to be. A number of bytecodes could be eliminated and a few more
> simplified by moving the work involved in handling compound statements
> (loops, try-blocks, etc) from the interpreter to the compiler.
>
> [snip -- full proposal below]
>
> Currently, handling the return is complex and involves "pseudo
> exceptions", but if part3 were duplicated by the compiler, then the
> RETURN bytecode could just perform a simple return.
[snip]
Is it necessary to duplicate part3? Is it possible to put it into a
subroutine if it's long? (And I do mean a simple cheap subroutine.)


Re: [Python-Dev] Do more at compile time; less at runtime

2012-12-09 Thread Nick Coghlan
Interesting idea; the main challenge is to keep stepping through the code
with pdb sane, and to be clear on what differences in behaviour will be
visible to the runtime execution hooks.

--
Sent from my phone, thus the relative brevity :)


Re: [Python-Dev] Do more at compile time; less at runtime

2012-12-09 Thread Guido van Rossum
Sounds good to me. No PEP needed; just a tracker item, tests, review, etc.

--Guido van Rossum (sent from Android phone)
On Dec 9, 2012 2:24 PM, "Mark Shannon"  wrote:

> Hi all,
>
> The current CPython bytecode interpreter is rather more complex than it
> needs to be. A number of bytecodes could be eliminated and a few more
> simplified by moving the work involved in handling compound statements
> (loops, try-blocks, etc) from the interpreter to the compiler.
>
> [snip -- full proposal below]
>
> So,
> Good idea? Bad idea?
> Should I write a PEP or is the bug tracker sufficient?
>
> Cheers,
> Mark.


[Python-Dev] Do more at compile time; less at runtime

2012-12-09 Thread Mark Shannon

Hi all,

The current CPython bytecode interpreter is rather more complex than it 
needs to be. A number of bytecodes could be eliminated and a few more 
simplified by moving the work involved in handling compound statements 
(loops, try-blocks, etc) from the interpreter to the compiler.


The simplest example of this is the while loop...

while cond:
    body

This is currently compiled as

start:
    if not cond goto end
    body
    goto start
end:

but it could be compiled as

    goto test
start:
    body
test:
    if cond goto start

which eliminates one instruction per iteration.
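
(You can watch the current shape with the dis module; the opcodes named
in the comment are what CPython emits today for this function:)

    import dis

    def spin(n):
        while n:
            n -= 1

    dis.dis(spin)
    # Currently this shows SETUP_LOOP, a POP_JUMP_IF_FALSE test at the
    # top of the loop, and an unconditional JUMP_ABSOLUTE back edge --
    # that back edge is the instruction the bottom-tested form removes.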

A more complex example is a return in a try-finally block.

try:
    part1
    if cond:
        return X
    part2
finally:
    part3

Currently, handling the return is complex and involves "pseudo 
exceptions", but if part3 were duplicated by the compiler, then the 
RETURN bytecode could just perform a simple return.

The code above would be compiled thus...

    PUSH_BLOCK try
    part1
    if not cond goto endif
    push X
    POP_BLOCK
    part3   <<< duplicated
    RETURN_VALUE
endif:
    part2
    POP_BLOCK
    part3   <<< duplicated
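
(At source level the duplication amounts to the following transformation
-- a sketch for the non-exception paths only; part1/part2/part3 stand in
for arbitrary statements, and the exceptional paths would still need
unwind machinery:)

    def part1(): pass
    def part2(): pass
    def part3(): pass

    def original(cond, X):
        try:
            part1()
            if cond:
                return X
            part2()
        finally:
            part3()

    def transformed(cond, X):
        part1()
        if cond:
            result = X     # evaluate the return value first
            part3()        # <<< duplicated finally body
            return result
        part2()
        part3()            # <<< duplicated on the fall-through path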

The changes I am proposing are:

Allow negative line deltas in the lnotab array (bytecode deltas would 
remain non-negative)

Remove the SETUP_LOOP, BREAK and CONTINUE bytecodes
Simplify the RETURN bytecode
Eliminate "pseudo exceptions" from the interpreter
Simplify (or perhaps eliminate) SETUP_TRY, END_FINALLY, END_WITH.
Reverse the sense of the FOR_ITER bytecode (i.e. jump on not-exhausted)


The net effect of these changes would be:
Reduced code size and reduced code complexity.
A small (1-5%)? increase in speed, due to the simplification of the
bytecodes and a very small change in the number of bytecodes executed.
A small change in the static size of the bytecodes (-2% to +2%)?

Although this is quite an intrusive change, I think it is worthwhile as
it simplifies ceval.c considerably.
The interpreter has become rather convoluted and any simplification has 
to be a good thing.


I've already implemented negative line deltas and the transformed while 
loop: https://bitbucket.org/markshannon/cpython-lnotab-signed

I'm currently working on the block unwinding.

So,
Good idea? Bad idea?
Should I write a PEP or is the bug tracker sufficient?

Cheers,
Mark.









Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-09 Thread PJ Eby
On Sun, Dec 9, 2012 at 12:54 AM, Nick Coghlan  wrote:
> On Sun, Dec 9, 2012 at 6:18 AM, PJ Eby  wrote:
>>
>> On Sat, Dec 8, 2012 at 5:06 AM, Nick Coghlan  wrote:
>> > On Sat, Dec 8, 2012 at 4:46 PM, PJ Eby  wrote:
>> >>
>> >> So if package A includes a "Conflicts: B" declaration, I recommend the
>> >> following:
>> >>
>> >> * An attempt to install A with B already present refuses to install A
>> >> without a warning and confirmation
>> >> * An attempt to install B informs the user of the conflict, and
>> >> optionally offers to uninstall A
>> >>
>> >> In this way, any collateral damage to B is avoided, while still making
>> >> the intended "lack of support" declaration clear.
>> >>
>> >> How does that sound?
>> >
>> >
>> > No, that's not the way it works. A conflict is always symmetric, no
>> > matter
>> > who declares it.
>>
>> But that *precisely contradicts* what you said in your previous email:
>>
>> > It's to allow a project to say
>> > *they don't support* installing in parallel with another package.
>>
>> Just because A doesn't support being installed next to B, doesn't mean
>> B doesn't support being installed next to A.  B might work just fine
>> with A installed, and even be explicitly supported by the author of B.
>>  Why should the author of A get to decide what happens to B?  Just
>> because I trust A about A, doesn't mean I should have to trust them
>> about B.
>
>
> If I'm installing both A *and* B, I want to know if *either* project doesn't
> support that configuration. The order in which they get installed should
> *not* have any impact on my finding out that I am using one of my
> dependencies in an unsupported way that may cause me unanticipated problems
> further down the line.

This is probably moot now, but I didn't propose that installation
order matter -- in both scenarios I described, you end up with a
warning and A not installed, regardless of whether A or B were
installed first.

> The author of A *doesn't* get to decide what happens to B, *I* do.

The reason I said "(de facto)" is that the default behavior of
whatever the next big installation tool turns out to be is what most
users would get by default.

> They're
> merely providing a heads up that they believe there are problems when using
> their project in conjunction with B. My options will be:
> - use them both anyway (e.g. perhaps after doing some research, I may find
> out the conflict relates solely to a feature of B that I'm not using, so I
> simply update my project documentation to say "do not use feature X from
> project B, as it conflicts with dependency A")
> - choose to continue using A, find another solution for B
> - choose to continue using B, find another solution for A
>
> As a concrete example, there are projects out there that are known not to
> work with gevent's socket monkeypatching, but people don't know that until
> they try it and it blows up in their face.

Here's the question, though: who's going to maintain that list?

I can see gevent wanting to have a compatibility chart page in their
docs, but it seems unlikely they'd want to block installation of
non-gevent-compatible projects or vice versa.  Similarly, I can't see
why any of those other projects would want to block installation of
gevent, or vice versa.

That being said, I don't object to having the ability for either of
them to do so: the utility of the field is *much* enhanced once its
connection to installation tools is gone, since a wider variety of
issues can be described without inconveniencing users.


> I now agree that *enforcing* a conflicts field at install time in a Python
> installer doesn't make any sense, since the nature of Python means it will
> often be easy to sidestep any such issues once you're aware of their
> existence (e.g. by avoiding gevent's monkeypatching features and using
> threads to interact with the uncooperative synchronous library, or by
> splitting your application into multiple processes, some using gevent and
> others synchronous sockets). I also believe that *any* Conflicts declaration
> *should* be backed up with an explicit explanation and rationale for that
> conflict declaration in the project documentation.

Beyond that, I think a reference URL should be included *in the field
itself*, e.g. to a bug report, support ticket, or other page that
documents the incompatibility and will be updated as the situation
changes.  The actual usefulness of the field to anyone "downstream"
seems greatly reduced if they have to go hunting for the information
explaining the compatibility issue(s).
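
(Something like the following, purely hypothetical shape -- no such
syntax exists in PEP 426 today; it's only meant to show what carrying
the reference URL in the field itself might look like:)

    "conflicts": [
        {"name": "gevent",
         "reference": "https://example.org/FooProject/issues/17"}
    ]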

This is a good example of what I meant about clear thinking on
concrete use cases, vs. simply copying fields from distro tools.  In
the distro world, these kinds of fields reflect the *results* of
research and decision-making about compatibility.  Whereas, in our
"upstream" world, the purpose of the fields is to provide downstream
repackagers and integrators with the source materials for such
research.


> So, I still

Re: [Python-Dev] Proposing "Argument Clinic", a new way of specifying arguments to builtins for CPython

2012-12-09 Thread Georg Brandl
On 04.12.2012 20:35, Antoine Pitrou wrote:
> On Tue, 04 Dec 2012 11:04:09 -0800
> Larry Hastings  wrote:
>> 
>> Along these lines, I've been contemplating proposing that Clinic 
>> specifically understand "path" arguments, distinctly from other string 
>> arguments, as they are both common and rarely handled correctly.  My 
>> main fear is that I probably don't understand all their complexities 
>> either ;-)
>> 
>> Anyway, this is certainly something we can consider *improving* for 
>> Python 3.4.  But for now I'm trying to make Clinic an indistinguishable 
>> drop-in replacement.
>> 
> [...]
>> 
>> Naturally I agree Clinic needs more polishing.  But the problem you fear 
>> is already solved.  Clinic allows precisely expressing any existing 
>> PyArg_ "format unit"** through a combination of the type of the 
>> parameter and its "flags".
> 
> Very nice then! Your work is promising, and I hope we'll see a version
> of it some day in Python 3.4 (or 3.4+k).

Looks good to me too, and as someone who once tried to go the "preprocessor
macro" route, this is much saner.

One small thing: may I propose making the "special comments" a little more
self-descriptive? Yes, "argument clinic" is a nice name for the whole thing,
but if you encounter it in a C file it tells you nothing about what
happens there.

cheers,
Georg



Re: [Python-Dev] Emacs users: hg-tools-grep

2012-12-09 Thread Georg Brandl
On 08.12.2012 22:51, Barry Warsaw wrote:
> Hark fellow Emacsers.  All you unenlightened heathens can stop reading now.
> 
> A few years ago, my colleague Jono Lange wrote probably the best little chunk
> of Emacs lisp ever.  `M-x bzr-tools-grep` lets you easily search a Bazaar
> repository for a case-sensitive string, providing you with a nice *grep*
> buffer which you can scroll through.  When you find a code sample you want to
> look at, C-c C-c visits the file and plops you right at the matching line.
> You *only* grep through files under version control, so you get to ignore
> generated files, and compilation artifacts, etc.
> 
> Of course, this doesn't help you for working on the Python code base, because
> Mercurial.  I finally whipped up this straight up rip of Jono's code to work
> with hg.  I'm actually embarrassed to put a copyright on this thing, and would
> happily just donate it to Jono, drop it in Python's Misc directory, or slip it
> like a lump of coal into the xmas stocking of whoever wants to "maintain" it
> for the next 20 years.
> 
> But anyway, it's already proven enormously helpful to me, so here it is.

Thanks, I'll definitely use this!

Georg

