[issue8685] set(range(100000)).difference(set()) is slow

2010-11-30 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Raymond, unless you object, I'd like to commit this before beta1.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-11-30 Thread Philip Jenvey

Changes by Philip Jenvey pjen...@underboss.org:


--
nosy: +pjenvey

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-11-30 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

Thx.

--
assignee: rhettinger - pitrou
resolution:  - accepted

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-11-30 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Modified patch committed in r86905. Thanks!

--
resolution: accepted - fixed
stage: patch review - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-09-01 Thread Jean-Paul Calderone

Jean-Paul Calderone inva...@example.invalid added the comment:

 I'll be looking at it shortly.  Py3.2 is still aways from release so there is 
 no hurry.

I would consider reviewing and possibly apply this change, but I don't want to 
invade anyone's territory.

--
nosy: +exarkun

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-09-01 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 I would consider reviewing and possibly apply this change, but I don't 
 want to invade anyone's territory.

I don't think there would be any invasion. I think the patch is simple enough, 
and seems to provide a nice benefit.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-09-01 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

Please leave this for me.
Thank you.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-09-01 Thread Jean-Paul Calderone

Changes by Jean-Paul Calderone inva...@example.invalid:


--
nosy:  -exarkun

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-09-01 Thread Daniel Stutzbach

Changes by Daniel Stutzbach dan...@stutzbachenterprises.com:


--
nosy: +stutzbach

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-08-09 Thread Andrew Bennetts

Andrew Bennetts s...@users.sourceforge.net added the comment:

On 2010-05-17 rhettinger wrote:
 Will look at this when I get back to the U.S.

Ping!  This patch (set-difference-speedup-2.diff) has been sitting around for a 
fair few weeks now.  It's a small patch, so it should be relatively easy to 
review.  It makes a significant improvement to speed and memory in one case 
(which we have encountered and worked around in bzr), and has no significant 
downside to any other cases.

Thanks :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-08-09 Thread Florent Xicluna

Changes by Florent Xicluna florent.xicl...@gmail.com:


--
nosy: +flox

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-08-09 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

Andrew,

This issue is somewhat similar to issue8425.   I may be reading too much into 
the priority field, but it looks like Raymond would like to review #8425 
first.  You can help by commenting on how the two issues relate to each other.  
I believe the two are complementary, but I did not attempt to apply both 
patches.

(The patch still applies with little fuzz.)

--
stage:  - patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-08-09 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

I'll be looking at it shortly.  Py3.2 is still aways from release so there is 
no hurry.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-08-09 Thread Andrew Bennetts

Andrew Bennetts s...@users.sourceforge.net added the comment:

Alexander: yes, they are complementary.  My patch improves set.difference, 
which always creates a new set.  issue8425 on the other hand improves in-place 
difference (via the -= operator or set.difference_update).  Looking at the two 
patches, my patch will not improve in-place difference, and issue8425's patch 
will not improve set.difference.  So they are complementary.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-17 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

Will look at this when I get back to the U.S.

--
priority: normal - low

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-16 Thread Andrew Bennetts

Andrew Bennetts s...@users.sourceforge.net added the comment:

Antoine: Thanks for the updated benchmark results!  I should have done that 
myself earlier.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-15 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

The current patch gives much smaller benefits than the originally posted 
benchmarks, although they are still substantial:

$ ./python -m timeit -s a = set(range(10)); sd = a.difference; b = 
set(range(1000)) sd(b)
- before: 5.56 msec per loop
- after: 3.18 msec per loop

$ ./python -m timeit -s a = set(range(100)); sd = a.difference; b = 
set(range(10)) sd(b)
- before: 67.9 msec per loop
- after: 41.8 msec per loop

--
versions:  -Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-12 Thread Andrew Bennetts

Changes by Andrew Bennetts s...@users.sourceforge.net:


Added file: http://bugs.python.org/file17306/set-mem.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-12 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 1. In constrained memory environments, creating a temporary internal
 copy of a large set may cause the difference operation to fail that
 would otherwise succeed.

It's a space/time tradeoff. There's nothing wrong about that.
(do note that hash tables themselves take much more space than the equivalent 
list)

 2. The break-even point between extra lookups and a copy is likely to
 be different on different systems or even on the same system under
 different loads.

So what? It's just a matter of choosing reasonable settings. There are other 
optimization heuristics in the interpreter.

The optimization here looks ok to me.

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-12 Thread Andrew Bennetts

Andrew Bennetts s...@users.sourceforge.net added the comment:

Regarding memory, good question... but this patch turns out to be an 
improvement there too.

This optimisation only applies when len(x)  len(y) * 4.  So the minimum size 
of the result is a set with 3/4 of the elems of x (and possibly would be a full 
copy of x anyway).

So if you like this optimisation is simply taking advantage of the fact we're 
going to be copying almost all of these elements anyway.  We could make it less 
aggressive, but large sets are tuned to be between 1/2 and 1/3 empty internally 
anyway, so 1/4 overhead seems reasonable.

Also, because this code immediately makes the result set be about the right 
size, rather than growing it one element at a time, the memory consumption is 
actually *better*.  I'll attach a script that demonstrates this; for me it 
shows that large_set.difference(small_set) [where large_set has 4M elems, 
small_set has 100] peaks at 50MB memory consumption without my patch, but only 
18MB with.  (after discounting the memory required for large_set itself, etc.)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-11 Thread Andrew Bennetts

New submission from Andrew Bennetts s...@users.sourceforge.net:

set.difference(s), when s is also a set, basically does::

res = set()
for elem in self:
if elem not in other:
res.add(elem)

This is wasteful when len(self) is much greater than len(other):

$ python -m timeit -s s = set(range(10)); sd = s.difference; empty = 
set() sd(empty)
100 loops, best of 3: 12.8 msec per loop
$ python -m timeit -s s = set(range(10)); sd = s.difference; empty = set() 
sd(empty)
100 loops, best of 3: 1.18 usec per loop

Here's a patch that compares the lengths of self and other before that loop, 
and if len(self) is greater, swaps them.  The new timeit results are:

$ python -m timeit -s s = set(range(10)); sd = s.difference; empty = 
set() sd(empty)
100 loops, best of 3: 0.289 usec per loop
$ python -m timeit -s s = set(range(10)); sd = s.difference; empty = set() 
sd(empty)
100 loops, best of 3: 0.294 usec per loop

--
components: Interpreter Core
files: set-difference-speedup.diff
keywords: patch
messages: 105489
nosy: spiv
priority: normal
severity: normal
status: open
title: set(range(10)).difference(set()) is slow
type: performance
versions: Python 2.7
Added file: http://bugs.python.org/file17292/set-difference-speedup.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-11 Thread Andrew Bennetts

Andrew Bennetts s...@users.sourceforge.net added the comment:

Oops, obvious bug in this patch. set('abc') - set('bcd') != set('bcd') - 
set('abc').  I'll see if I can make a more sensible improvement.

See also http://bugs.python.org/issue8425.  Thanks dickinsm on #python-dev.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-11 Thread Mark Dickinson

Changes by Mark Dickinson dicki...@gmail.com:


--
assignee:  - rhettinger
nosy: +rhettinger
versions: +Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-11 Thread Andrew Bennetts

Andrew Bennetts s...@users.sourceforge.net added the comment:

Ok, this time test_set* passes :)

Currently if you have large set and small set the code will do len(large) 
lookups in the small set.  When large is  than small, it is cheaper to copy 
large and do len(small) lookups in large.  On my laptop a size difference of 4 
times is a clear winner for copy+difference_update over the status quo, even 
for sets of millions of entries.  For more similarly sized sets (even only 
factor of 2 size difference) the cost of allocating a large set that is likely 
to be shrunk significantly is greater than the benefit.  So my patch only 
switches behaviour for len(x)/4  len(y).

This patch is complementary to the patch in issue8425, I think.

--
Added file: http://bugs.python.org/file17293/set-difference-speedup-2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-11 Thread Alexander Belopolsky

Changes by Alexander Belopolsky belopol...@users.sourceforge.net:


--
nosy: +belopolsky

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8685] set(range(100000)).difference(set()) is slow

2010-05-11 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

I have two problems with this proposal:

1. In constrained memory environments, creating a temporary internal copy of a 
large set may cause the difference operation to fail that would otherwise 
succeed.

2. The break-even point between extra lookups and a copy is likely to be 
different on different systems or even on the same system under different loads.

Programs that suffer from poor large_set.difference(small_set) performance can 
be rewritten as large_set_copy = large_set.copy(); 
large_set_copy.difference_updste(small_set) or even simply as 
large_set.difference_updste(small_set) if program logic allows it.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8685
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com