[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Chris Angelico
On Tue, 7 Jun 2022 at 10:26, Steven D'Aprano  wrote:
>
> Why don't you use the version from the itertools recipes?
>
>
> ```
> from itertools import tee, filterfalse
> def partition(pred, iterable):
> "Use a predicate to partition entries into false entries and true entries"
> # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
> t1, t2 = tee(iterable)
> return filterfalse(pred, t1), filter(pred, t2)
> ```
>
>

Calls the predicate twice for every element, so it's only good for a
guaranteed-deterministic predicate. You may want to read the thread
before responding.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NS6ORDJ4STDGOJID5B5NQXZ4MVC5QORQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Steven D'Aprano
Why don't you use the version from the itertools recipes?


```
from itertools import tee, filterfalse
def partition(pred, iterable):
"Use a predicate to partition entries into false entries and true entries"
# partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
```


-- 
Steve
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TOQKBCZNXZRH5BVFPPFOPJTVPA3KOZ54/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Steven D'Aprano
On Mon, Jun 06, 2022 at 06:17:32PM +0200, Benedict Verhegghe wrote:

> There still is something wrong. I get the second list twice:
> 
> odd, even = partition(lambda i: i % 2, range(20))
> print(list(odd))
> [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
> print(list(even))
> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Confirmed. But if you replace range(20) with `iter(range(20))`, it 
works correctly.

The plot thickens. The duplicated list is not from the second list 
created, but the second list *evaluated*. So if you run:

odd, even = partition(...)

as before, but evaluate *even* first and odd second:

print(list(even))
print(list(odd))

it is odd that is doubled, not even.


-- 
Steve
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OJIA6HIH6UJYE7X7LU7W46QI2X6UPTK6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Add .except() and .only(), and .values_at(). instance methods to dict

2022-06-06 Thread Christopher Barker
This idea reminds me of numpy fancy indexing: that is, you can pass
multiple indexes in at once. (See bumpy docs for details). I have no idea
how that might translate to ducts (tulles are perfectly reasonable keys
already) but I know I like it in numpy :-)

That being said, I can’t remember a use case where I would have used it
with dicts.

-CHB



On Mon, Jun 6, 2022 at 11:04 AM Chris Angelico  wrote:

> On Mon, 6 Jun 2022 at 18:02, Stephen J. Turnbull
>  wrote:
> >
> > David Mertz, Ph.D. writes:
> >
> >  > These are all far too easy to do with comprehensions to merit new
> methods
> >  > or stdlib functions.
> >
> > +1
> >
> >  > E.g., we might provide additional set-like operators for dicts.
> >
> >  > >>> m - {'a'}  # Would rightval be a set or dict though?!
> >  > >>> m & {'a', 'b'}  # Same question, but set feels better, I think
> >
> > Why not "either"?  Of course if you allow dicts, the question in both
> > cases becomes whether the presence test is on keys or items.  I think
> > keys would be more frequently useful.
> >
>
> Or even: Any iterable that yields keys? And then the presence test
> would be defined entirely by keys (which makes the most sense). You
> could subtract another dict, or a set, list, or anything.
>
> (You could even subtract the string "ab" to remove keys "a" and "b",
> for what it's worth.)
>
> Of note: https://peps.python.org/pep-0584/#what-about-the-full-set-api
>
> """This PEP does not take a position on whether dicts should support
> the full collection of set operators, and would prefer to leave that
> for a later PEP (one of the authors is interested in drafting such a
> PEP)."""
>
> So if you (or anyone) wants to move forward with this, I would
> recommend (a) reading the notes in PEP 584, (b) reaching out to all
> three authors of it to find out their current sentiment, and (c)
> designing definitive semantics for the operators in question. I'd be
> happy to help with the mechanics of getting a PEP written.
>
> ChrisA
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/ETCU6O7QGERUFQND5XRBLGHV5OT4CNT3/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZYQO7PE2SFTUC5YZB55GPS2HFKOOU7WG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Chris Angelico
On Tue, 7 Jun 2022 at 01:58, David Mertz, Ph.D.  wrote:
> In [2]: def isEven(n):
>...: return n/2 == n//2
>...:
>

Huh. That's a very strange way to write that predicate.

>>> isEven(2)
True
>>> isEven(20001)
True
>>> isEven(20002)
False
>>> isEven(20003)
False
>>> isEven(20004)
True

Why not just "return n % 2 == 0" ?

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HDNXSZI24RIMJDHWOMPIWQXN4EZUA4Y3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Benedict Verhegghe




Op 6/06/2022 om 18:07 schreef Mathew Elman:

oops, you're right, I have ended up with an unnecessary "not" in there, should 
be:

from collections import deque


def partition(pred, iterable):
 results = deque([]), deque([])

 def gen_split(only):
 for thing in iterable:
 if results[only]:
 yield results[only].popleft()
 results[pred(thing)].append(thing)
 for thing in results[only]:
 yield thing

 return gen_split(True), gen_split(False)


There still is something wrong. I get the second list twice:

odd, even = partition(lambda i: i % 2, range(20))
print(list(odd))
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(list(even))
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JYGSSXPN6FLGHVJPUM4XTPH2EABZZMLS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Mathew Elman
oops, you're right, I have ended up with an unnecessary "not" in there, should 
be:

from collections import deque


def partition(pred, iterable):
results = deque([]), deque([])

def gen_split(only):
for thing in iterable:
if results[only]:
yield results[only].popleft()
results[pred(thing)].append(thing)
for thing in results[only]:
yield thing

return gen_split(True), gen_split(False)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QURICBHMB3D2WG2VWEBJ56Q6ZZHKBUJX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread David Mertz, Ph.D.
This is really nice, but I think you have a negation wrong in there
somewhere:

In [1]: from collections import deque
   ...:
   ...: def partition(pred, iterable):
   ...: results = deque([]), deque([])
   ...:
   ...: def gen_split(only):
   ...: for thing in iterable:
   ...: if results[only]:
   ...: yield results[only].popleft()
   ...: results[not pred(thing)].append(thing)
   ...: for thing in results[only]:
   ...: yield thing
   ...:
   ...: return gen_split(True), gen_split(False)
   ...:

In [2]: def isEven(n):
   ...: return n/2 == n//2
   ...:

In [3]: from itertools import count

In [4]: evens, odds = partition(isEven, count())

In [5]: next(evens)
Out[5]: 1

In [6]: next(evens)
Out[6]: 3

In [7]: next(evens)
Out[7]: 5

In [8]: next(odds)
Out[8]: 0

In [9]: next(odds)
Out[9]: 2

In [10]: next(odds)
Out[10]: 4

On Mon, Jun 6, 2022 at 10:32 AM Mathew Elman  wrote:

> from collections import deque
>
> def partition(pred, iterable):
> results = deque([]), deque([])
>
> def gen_split(only):
> for thing in iterable:
> if results[only]:
> yield results[only].popleft()
> results[not pred(thing)].append(thing)
> for thing in results[only]:
> yield thing
>
> return gen_split(True), gen_split(False)
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/EUAYPXMZM4DOGTAAUURNOMWTDU65K6UY/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HWKJMLZSQVZQERYAKYZUAMK2YGIFUPIX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread John Carter
Thanks for all the comments. 
 I did consider using sets but order was important and when partitioning a list 
of general strings, not file names, there could be identical  and important 
strings (strings at different positions could mean different things, especially 
when generating test and training sets). 
I also tried doing it twice (generating lists) and it took approximately twice 
as long though generating two iterables could defer the partitioning till it 
was needed, i.e. lazy evaluation.

I’ve added some of the solutions to by timing test. Identified by capital 
letters at the end.

Reference list lengths 144004  855996
Number of tests cases 100
Example data ['xDy7AWbXau', 'TXlzsZV3Ra', 'YJh8uD9ovK', 'aRJ2U7nWs8', 
'geu.vHlogu']
FNmatch0.671875 268756  0 2687560
WCmatch1.562500 264939  0 2649390
IWCmatch   1.281250  0  0  01
Easy   0.093750 144004 855996 1001
Re 0.328125 855996 144004 1000
Positive   0.281250 144004  0 1440041
Negative   0.328125  0 855996 8559961
UppeeCase  0.375000 268756  0 2687560
Null   0.00  0  0  01
Both   0.328125 144004 855996 1001
Partition  1.171875 855996 144004 1000
IBoth  0.328125 855996 144004 1000
MRAB   0.437500 144004 855996 1001
METZ   0.50 144004 855996 1001
CA 0.343750 144004 855996 1001
SJB0.328125 144004 855996 1001

I checked for order and interestingly all the set solutions preserve it. I 
think this is because there are no duplicates in the test data. Order is only 
checked correctly if thee are the same number of elements in the test and 
reference lists

I also tried more_itertoolls.partition. Nearly 4 times slower.
John
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6WFSIXUF67KU5QXOODCBG7RDDG2KSYSK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Mathew Elman
from collections import deque

def partition(pred, iterable):
results = deque([]), deque([])

def gen_split(only):
for thing in iterable:
if results[only]:
yield results[only].popleft()
results[not pred(thing)].append(thing)
for thing in results[only]:
yield thing

return gen_split(True), gen_split(False)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EUAYPXMZM4DOGTAAUURNOMWTDU65K6UY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread David Mertz, Ph.D.
As Chris Angelico notes, that's only good for finite iterables.

The recipe in the itertools docs is unbounded, but it's also unbounded in
the cache needed if you get long strings of things the do (or don't)
confirm the predicate.

On Mon, Jun 6, 2022 at 7:06 AM Stephen J. Turnbull <
stephenjturnb...@gmail.com> wrote:

> Christopher Barker writes:
>
>  > How do you divide a sequence into two sequences cleanly?
>
> I don't know how to do it in a one liner, but
>
> yes = []
> no = []
> for elem in seq:
> (yes if test(elem) else no).append(elem)
>
> seems pretty clean to me.
>
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/BBVBDT7KJTTOD5VQYIHBCQBU5QV32ZUB/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4RIL6JUJXJDF3G6HWVU4DD33TKPEZYCI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Stephen J. Turnbull
Christopher Barker writes:

 > How do you divide a sequence into two sequences cleanly?

I don't know how to do it in a one liner, but

yes = []
no = []
for elem in seq:
(yes if test(elem) else no).append(elem)

seems pretty clean to me.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BBVBDT7KJTTOD5VQYIHBCQBU5QV32ZUB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Chris Angelico
On Mon, 6 Jun 2022 at 18:22, Paul Moore  wrote:
>
> There’s an itertools recipe, “partition”.
>

Yes, there is (and plenty of equivalents in other languages). But it's
still not what I'd call "particularly easy". What that recipe does is
filter the same iterable with the same predicate twice, so it depends
on the predicate being repeatable. You can't use it, for instance, to
split a data set into training and test sets, even though conceptually
this should work:

training, test = partition(lambda x: random.random() < 0.9, dataset)

To truly perform a one-pass partitioning operation, you'd have to do
something like this:

def partition(pred, iter):
results = [], []
for thing in iter:
results[not pred(thing)].append(thing)
return results

Obviously that works only on finite iterables, so it's not appropriate
for an itertools recipe, but it does guarantee that the predicate is
called once per item and each item ends up in precisely one list.

It would be kinda nice to be able to do something like this with a
comprehension, since there's a fair bit of overhead to calling a
function (predicate functions have to be called for each element, but
a comprehension wraps everything up into one function). But since it's
not, the above is good enough for the cases where it's needed - not
particularly easy, but not all that hard either.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BUFIMKKIJOPJQQ4WT2VLNXTTLAPQGCIN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Add .except() and .only(), and .values_at(). instance methods to dict

2022-06-06 Thread Chris Angelico
On Mon, 6 Jun 2022 at 18:02, Stephen J. Turnbull
 wrote:
>
> David Mertz, Ph.D. writes:
>
>  > These are all far too easy to do with comprehensions to merit new methods
>  > or stdlib functions.
>
> +1
>
>  > E.g., we might provide additional set-like operators for dicts.
>
>  > >>> m - {'a'}  # Would rightval be a set or dict though?!
>  > >>> m & {'a', 'b'}  # Same question, but set feels better, I think
>
> Why not "either"?  Of course if you allow dicts, the question in both
> cases becomes whether the presence test is on keys or items.  I think
> keys would be more frequently useful.
>

Or even: Any iterable that yields keys? And then the presence test
would be defined entirely by keys (which makes the most sense). You
could subtract another dict, or a set, list, or anything.

(You could even subtract the string "ab" to remove keys "a" and "b",
for what it's worth.)

Of note: https://peps.python.org/pep-0584/#what-about-the-full-set-api

"""This PEP does not take a position on whether dicts should support
the full collection of set operators, and would prefer to leave that
for a later PEP (one of the authors is interested in drafting such a
PEP)."""

So if you (or anyone) wants to move forward with this, I would
recommend (a) reading the notes in PEP 584, (b) reaching out to all
three authors of it to find out their current sentiment, and (c)
designing definitive semantics for the operators in question. I'd be
happy to help with the mechanics of getting a PEP written.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ETCU6O7QGERUFQND5XRBLGHV5OT4CNT3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Efficiency of set difference

2022-06-06 Thread Serhiy Storchaka

06.06.22 10:51, Stephen J. Turnbull пише:

Couldn't it be linear on min(len(left), len(right))?  Ie,

 if len(left) < len(right):
 for elem in left:
 if elem in right:
left.discard(elem)
 else:
 for elem in right:
 left.discard(elem)

The upper loop is probably only about half as fast as the lower (two
hashes needed), so maybe the length test could be tuned.

The whole idea is probably not worth it, I can't think of practical
use cases at a scale where it would matter.


It is close to the current implementation:

 if len(left) < len(right)//8:
 right = left & right
 for elem in left:
 if elem in right:
left.discard(elem)

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OGOMGS4353C34REIWQWWUNPF6LAX655C/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Paul Moore
There’s an itertools recipe, “partition”.

On Mon, 6 Jun 2022 at 08:28, Chris Angelico  wrote:

> On Mon, 6 Jun 2022 at 16:42, Christopher Barker 
> wrote:
> >
> > I think y’all have addressed the OP’s problem — at least the performance
> issues.
> >
> > But I think this brings up a pattern that I don’t have a nifty way to
> address:
> >
> > How do you divide a sequence into two sequences cleanly?
> >
> > The filter pattern: selectively remove items from a sequence, returning
> a new sequence. There are a few ways to do that in Python.
> >
> > But what if you need both the remaining items and the removed ones? Easy
> enough to write a loop that populates two lists, but is there a nifty
> one-liner?
> >
> > Is this a known functional pattern?
> >
>
> I'd call that a "partitioning" task, and you're right, it's not
> particularly easy in Python.
>
> ChrisA
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/NEY4JKQMDTMS6PN3I6VK6FCICRLCG43P/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7TFCBUT6UD67UVYNUUTKUP26A5FGSUDO/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Add .except() and .only(), and .values_at(). instance methods to dict

2022-06-06 Thread Stephen J. Turnbull
David Mertz, Ph.D. writes:

 > These are all far too easy to do with comprehensions to merit new methods
 > or stdlib functions.

+1

 > E.g., we might provide additional set-like operators for dicts.

 > >>> m - {'a'}  # Would rightval be a set or dict though?!
 > >>> m & {'a', 'b'}  # Same question, but set feels better, I think

Why not "either"?  Of course if you allow dicts, the question in both
cases becomes whether the presence test is on keys or items.  I think
keys would be more frequently useful.

Steve

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/B5JREJ6C3QBF6WMKG3FW6YYUDPFGGH46/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Efficiency of set difference

2022-06-06 Thread Stephen J. Turnbull
David Mertz, Ph.D. writes:

 > This is exactly the problem solved by set difference. E.g.
 > `{a, b, c} - {a, c}`.
 > 
 > This operation is linear on the size of the removed set.

Couldn't it be linear on min(len(left), len(right))?  Ie,

if len(left) < len(right):
for elem in left:
if elem in right:
   left.discard(elem)
else:
for elem in right:
left.discard(elem)

The upper loop is probably only about half as fast as the lower (two
hashes needed), so maybe the length test could be tuned.

The whole idea is probably not worth it, I can't think of practical
use cases at a scale where it would matter.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MOZZOH4ZMRKJ6X2BHY2RYMU7IHMJEFOF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Chris Angelico
On Mon, 6 Jun 2022 at 16:42, Christopher Barker  wrote:
>
> I think y’all have addressed the OP’s problem — at least the performance 
> issues.
>
> But I think this brings up a pattern that I don’t have a nifty way to address:
>
> How do you divide a sequence into two sequences cleanly?
>
> The filter pattern: selectively remove items from a sequence, returning a new 
> sequence. There are a few ways to do that in Python.
>
> But what if you need both the remaining items and the removed ones? Easy 
> enough to write a loop that populates two lists, but is there a nifty 
> one-liner?
>
> Is this a known functional pattern?
>

I'd call that a "partitioning" task, and you're right, it's not
particularly easy in Python.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NEY4JKQMDTMS6PN3I6VK6FCICRLCG43P/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Christopher Barker
I think y’all have addressed the OP’s problem — at least the performance
issues.

But I think this brings up a pattern that I don’t have a nifty way to
address:

How do you divide a sequence into two sequences cleanly?

The filter pattern: selectively remove items from a sequence, returning a
new sequence. There are a few ways to do that in Python.

But what if you need both the remaining items and the removed ones? Easy
enough to write a loop that populates two lists, but is there a nifty
one-liner?

Is this a known functional pattern?

-CHB




On Sun, Jun 5, 2022 at 7:39 PM Eric V. Smith via Python-ideas <
python-ideas@python.org> wrote:

>
> On 6/5/2022 1:22 PM, Benedict Verhegghe wrote:
> > Op 5/06/2022 om 18:47 schreef David Mertz, Ph.D.:
> >> Sure, that's nice enough code and has the same big-O complexity. I
> >> suspect set difference is a little faster (by a constant multiple)
> >> because it hits C code more, but I haven't benchmarked.
> >>
> >> The OP said the elements were from fnmatch though, which explicitly
> >> does not promise order. So really it's just whether you like your
> >> code or this better aesthetically:
> >>
> >>  list(set(b) - set(a))
> >>
> > I benchmarked it and indeed the list difference is a little faster.
> > >>> timeit.timeit('s=set(b); [x for x in a if x not in s]', setup='a =
> > list(range(1)); b = list(range(1000))', number=1000)
> > 0.6156670850032242
> > >>> timeit.timeit('list(set(a)-set(b))', setup='a =
> > list(range(1)); b = list(range(1000))', number=1000)
> > 0.43649216600169893
> >
> > And what's more, it seems to also preserve order. I guess because a
> > set is implemented like a dict, and will preserve order of you only
> > remove some elements from the set.
> >
> A set doesn't guarantee order at all, and indeed it does not preserve
> creation order:
>
>  >>> s = set([10, 20, 1])
>  >>> s
> {1, 10, 20}
>
> You're seeing an accidental byproduct of the implementation.
>
> Eric
>
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/WZFC3WAOMTQ2UQIMGBJ2VTBBOE5XUWBR/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/P2OTU27MUOV2SY2TUVFCQ47R4KBCMPFF/
Code of Conduct: http://python.org/psf/codeofconduct/