Thanks for all the comments.
I did consider using sets, but order was important, and when partitioning a
list of general strings (not file names) there could be identical strings that
are all significant: strings at different positions could mean different
things, especially when generating test and training sets.
I also tried doing it twice (generating two lists), and it took approximately
twice as long, though generating two iterables instead could defer the
partitioning until it was needed, i.e. lazy evaluation.
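
To make the comparison concrete, here is a minimal sketch of the two eager approaches: one pass that fills both lists, and two passes that each build one list. The function names, the sample data and the "*.txt" pattern are made up for illustration and are not the code from my timing test. Both versions keep the original order and keep duplicates.

```python
from fnmatch import fnmatchcase


def partition_once(pred, items):
    """Split items into (matching, non-matching) in a single pass."""
    matched, unmatched = [], []
    for item in items:
        # Append to one list or the other; order and duplicates are kept.
        (matched if pred(item) else unmatched).append(item)
    return matched, unmatched


def partition_twice(pred, items):
    """Same result, but the predicate runs twice for every item."""
    return [x for x in items if pred(x)], [x for x in items if not pred(x)]


def is_txt(name):
    # Hypothetical predicate: case-sensitive wildcard match.
    return fnmatchcase(name, "*.txt")


names = ["a.txt", "b.py", "a.txt", "c.txt", "b.py"]
print(partition_once(is_txt, names))
print(partition_twice(is_txt, names))
```

The two-pass version evaluates the predicate twice per element, which is consistent with it taking roughly twice as long when the predicate dominates the cost.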

I’ve added some of the solutions to my timing test. They are the entries
identified by capital letters at the end of the results.

Reference list lengths 144004  855996
Number of test cases 1000000
Example data ['xDy7AWbXau', 'TXlzsZV3Ra', 'YJh8uD9ovK', 'aRJ2U7nWs8', 
'geu.vHlogu']
FNmatch        0.671875 268756      0 268756        0
WCmatch        1.562500 264939      0 264939        0
IWCmatch       1.281250      0      0      0        1
Easy           0.093750 144004 855996 1000000        1
Re             0.328125 855996 144004 1000000        0
Positive       0.281250 144004      0 144004        1
Negative       0.328125      0 855996 855996        1
UpperCase      0.375000 268756      0 268756        0
Null           0.000000      0      0      0        1
Both           0.328125 144004 855996 1000000        1
Partition      1.171875 855996 144004 1000000        0
IBoth          0.328125 855996 144004 1000000        0
MRAB           0.437500 144004 855996 1000000        1
METZ           0.500000 144004 855996 1000000        1
CA             0.343750 144004 855996 1000000        1
SJB            0.328125 144004 855996 1000000        1

I checked for order and, interestingly, all the set solutions preserve it. I
think this is because there are no duplicates in the test data. Note that
order is only checked correctly if there are the same number of elements in
the test and reference lists.
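
One way to check order without needing equal lengths is to test whether a result is an in-order subsequence of the original input. This is only a sketch of that idea, not the check used for the numbers above; the name is_subsequence is hypothetical.

```python
def is_subsequence(result, reference):
    """Return True if result appears within reference in the same order."""
    it = iter(reference)
    # "item in it" advances the iterator until it finds item, so each
    # later item must be found after the position of the previous one.
    return all(item in it for item in result)


reference = ["a", "b", "a", "c"]
print(is_subsequence(["a", "a", "c"], reference))  # order kept, duplicates kept
print(is_subsequence(["c", "a"], reference))       # order broken
```

Because the iterator is consumed as it goes, a duplicate in the result has to be matched by a second occurrence in the reference, so dropped or reordered duplicates are detected as well.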

I also tried more_itertools.partition. Nearly 4 times slower.
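
For reference, a lazy partition can be sketched along the lines of the partition recipe in the itertools documentation, using tee. This is an illustration of the technique, not necessarily how more_itertools implements it; lazy_partition, counting_pred and the sample data are made-up names.

```python
from itertools import filterfalse, tee


def lazy_partition(pred, iterable):
    """Return two lazy iterators: (items failing pred, items passing pred)."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)


calls = {"n": 0}


def counting_pred(s):
    # Hypothetical predicate that also counts how often it is called.
    calls["n"] += 1
    return s.startswith("a")


data = ["apple", "bob", "avocado", "apple"]
non_matching, matching = lazy_partition(counting_pred, data)

# Nothing has been evaluated yet; the work happens on consumption.
matching_list = list(matching)
non_matching_list = list(non_matching)
print(matching_list, non_matching_list, calls["n"])
```

In this sketch the predicate runs once per element for each of the two iterators, and tee has to buffer items until both iterators have passed them, which may account for some of the extra time when both halves are fully consumed.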
John