Thanks for all the comments. I did consider using sets, but order was important, and when partitioning a list of general strings rather than file names, there could be identical strings that still matter (strings at different positions could mean different things, especially when generating test and training sets). I also tried doing it in two passes (generating two lists), and it took approximately twice as long, though generating two iterables instead could defer the partitioning until it was needed, i.e. lazy evaluation.
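Here is a minimal sketch of the two strategies, assuming a simple string predicate. `partition_lists` makes a single pass into two lists, which preserves order and duplicates. `partition_lazy` returns two iterators built with `itertools.tee`, so no work is done until they are consumed. The function names and the `has_dot` predicate are hypothetical and are not taken from the original timing code.

```python
from itertools import filterfalse, tee
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def partition_lists(pred: Callable[[T], bool], items: Iterable[T]) -> Tuple[List[T], List[T]]:
    """One pass: return (matching, non-matching) lists, keeping order and duplicates."""
    matching: List[T] = []
    rest: List[T] = []
    for item in items:
        # The predicate is evaluated exactly once per element.
        (matching if pred(item) else rest).append(item)
    return matching, rest


def partition_lazy(pred: Callable[[T], bool], items: Iterable[T]) -> Tuple[Iterator[T], Iterator[T]]:
    """Lazy: return (matching, non-matching) iterators; nothing runs until consumed.

    If both iterators are consumed, pred runs twice per element, and tee buffers
    any elements that one iterator has seen but the other has not.
    """
    for_match, for_rest = tee(items)
    return filter(pred, for_match), filterfalse(pred, for_rest)


def has_dot(s: str) -> bool:
    # Hypothetical predicate used only for illustration.
    return "." in s


data = ["a.txt", "b", "a.txt", "c.py", "b"]
print(partition_lists(has_dot, data))  # (['a.txt', 'a.txt', 'c.py'], ['b', 'b'])
matching, rest = partition_lazy(has_dot, data)
print(list(matching), list(rest))  # ['a.txt', 'a.txt', 'c.py'] ['b', 'b']
```

The lazy version defers the work but pays for it with a second predicate call per element. Note also that `more_itertools.partition` returns the non-matching items first, whereas both sketches return the matching items first.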
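The concern about order and identical strings can be checked mechanically. The idea is to compare each candidate's full output, not just its lengths, with reference lists built by plain list comprehensions. The sketch below assumes an `fnmatch` pattern of `"*.*"` and a small sample containing a repeated string. The pattern, the sample, and the names `reference`, `set_membership`, `set_difference` and `check` are all hypothetical and are not the original test code.

```python
import fnmatch
import time
from typing import Callable, List, Tuple

Split = Tuple[List[str], List[str]]
PATTERN = "*.*"  # hypothetical pattern; the post does not say which one was used


def reference(names: List[str]) -> Split:
    """Plain list comprehensions: the ground truth for order and duplicates."""
    yes = [s for s in names if fnmatch.fnmatch(s, PATTERN)]
    no = [s for s in names if not fnmatch.fnmatch(s, PATTERN)]
    return yes, no


def set_membership(names: List[str]) -> Split:
    """A set used only for membership tests: order and duplicates are kept."""
    matched = fnmatch.filter(names, PATTERN)
    matched_set = set(matched)
    return matched, [s for s in names if s not in matched_set]


def set_difference(names: List[str]) -> Split:
    """Set difference: duplicates are dropped and the result order is arbitrary."""
    matched = fnmatch.filter(names, PATTERN)
    return matched, list(set(names) - set(matched))


def check(label: str, func: Callable[[List[str]], Split], names: List[str]) -> bool:
    """Time func and compare its full output lists with the reference lists."""
    ref_yes, ref_no = reference(names)
    start = time.perf_counter()
    yes, no = func(names)
    elapsed = time.perf_counter() - start
    ok = yes == ref_yes and no == ref_no
    print(f"{label:15} {elapsed:.6f} {len(yes):8} {len(no):8} {ok}")
    return ok


# Small hypothetical sample in which "a" appears twice.
sample = ["b.x", "a", "c.y", "a"]
check("set_membership", set_membership, sample)  # returns True
check("set_difference", set_difference, sample)  # returns False: one "a" is lost
```

Using a set only for membership tests is safe even with repeated strings, because identical strings always give the same match result. A set difference is not safe: with unique test data the lengths still agree, but duplicates are lost as soon as they appear.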
I’ve added some of the solutions to my timing test. They are the entries with all-capital names at the end of the table.

Reference list lengths: 144004 855996
Number of test cases: 1000000
Example data: ['xDy7AWbXau', 'TXlzsZV3Ra', 'YJh8uD9ovK', 'aRJ2U7nWs8', 'geu.vHlogu']

Name           Time   List 1   List 2    Total Order
FNmatch    0.671875   268756        0   268756     0
WCmatch    1.562500   264939        0   264939     0
IWCmatch   1.281250        0        0        0     1
Easy       0.093750   144004   855996  1000000     1
Re         0.328125   855996   144004  1000000     0
Positive   0.281250   144004        0   144004     1
Negative   0.328125        0   855996   855996     1
UppeeCase  0.375000   268756        0   268756     0
Null       0.000000        0        0        0     1
Both       0.328125   144004   855996  1000000     1
Partition  1.171875   855996   144004  1000000     0
IBoth      0.328125   855996   144004  1000000     0
MRAB       0.437500   144004   855996  1000000     1
METZ       0.500000   144004   855996  1000000     1
CA         0.343750   144004   855996  1000000     1
SJB        0.328125   144004   855996  1000000     1

I checked for order, and interestingly all the set solutions preserve it. I think this is because there are no duplicates in the test data. Order is only checked correctly if there are the same number of elements in the test and reference lists.

I also tried more_itertools.partition. It was nearly 4 times slower.

John
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/6WFSIXUF67KU5QXOODCBG7RDDG2KSYSK/
Code of Conduct: http://python.org/psf/codeofconduct/