[issue46705] Memory optimization for set.issubset

2022-02-13 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

It will be even more efficient after applying a set.intersection() optimization 
in issue43464.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-11 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

See also issue18032.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-10 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> Would not testing len(self.difference(other)) == 0 
> be more efficient?

Yes, I think so.

--
Added file: https://bugs.python.org/file50620/instrument_issubset.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-10 Thread Jack Nguyen


Change by Jack Nguyen :


--
keywords: +patch
pull_requests: +29432
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/31267

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-10 Thread Jack Nguyen


Jack Nguyen  added the comment:

As you say, which implementation performs better likely depends on the nature 
of the sets. I would suspect that using set.difference won't be substantially 
faster than using set.intersection in the best case, but it would be much 
slower if len(self) is much larger than len(self.intersection(other)) e.g. 
set(range(1_000_000)).issubset(range(10)). Overall I think that using 
set.intersection will have more well-rounded performance.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-10 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Would not testing len(self.difference(other)) == 0 be more efficient? Making a 
copy of a set and removing elements one by one may be faster than add elements 
one by one, because we only need to allocate a single chunk of memory for a set.

It depends on relative values of len(self), len(other) and len(set(other)). For 
example, the following code may be optimal in some cases:

tmp = set()
it = iter(other)
for item in it:
# if item in self: ?
tmp.add(item)
if len(tmp) >= len(self):
self = self.difference(tmp, it)
if not self:
return True
self.difference_update(other)
return not self
else:
return False  # len(self) > len(set(other))

The disadvantage of such optimizations is that they make the code much more 
bigger. The current code is small and simple, and good enough in most cases.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-09 Thread Raymond Hettinger


Change by Raymond Hettinger :


Added file: https://bugs.python.org/file50615/instrument_issubset.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-09 Thread Raymond Hettinger


Change by Raymond Hettinger :


Removed file: https://bugs.python.org/file50614/instrument_issubset.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-09 Thread Raymond Hettinger


Change by Raymond Hettinger :


Removed file: https://bugs.python.org/file50613/instrument_issubset.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-09 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

I've run a few more experiments and this looks like a net win more often than 
not.  Go ahead and submit a PR so we can evaluate it further.

--
Added file: https://bugs.python.org/file50614/instrument_issubset.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-09 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

We care more about the running speed than the memory usage.  Since sets only 
store pointers to data, they are typically smaller than the data they refer to.

I've attached some instrumentation code for running experiments to verify the 
workload under various scenarios.

--
Added file: https://bugs.python.org/file50613/instrument_issubset.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-09 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
versions:  -Python 3.10, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-09 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
assignee:  -> rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-09 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
nosy: +rhettinger, serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46705] Memory optimization for set.issubset

2022-02-09 Thread Jack Nguyen


New submission from Jack Nguyen :

I noticed that the set.issubset cpython implementation casts its iterable 
argument to a set. In some cases, casting the whole iterable to a set is 
unnecessary (see https://bugs.python.org/issue18032). Although the latter 
suggestion is to perform early termination, my suggestion is to use the 
intersection instead.

# PyAnySet_Check coming from the cpython source code.
def issubset(self, other):
# Intersection suggestion:
if not PyAnySet_Check(other):
return len(self.intersection(other)) == len(self)
# Usual implementation for sets.
else:
return ...

The main advantage that this implementation has is its memory performance, 
using only O(min(len(self), len(other))) memory, since it never stores elements 
it does not need.

I'm assuming that set construction costs O(n) set.__contains__ calls. This 
implementation uses len(other) calls to self.__contains__ and tmp.__contains__, 
where tmp = set(other). The current implementation uses len(self) + len(other) 
calls to tmp.__contains__.

Thus, I suspect the current implementation only has a chance at running 
noticeably faster when len(self) << len(other), where it performs fewer calls 
to set.__contains__. This is, however, also where the proposed implementation 
has significantly superior memory performance.

--
components: Interpreter Core
messages: 412966
nosy: panda1200
priority: normal
severity: normal
status: open
title: Memory optimization for set.issubset
type: performance
versions: Python 3.10, Python 3.11, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com