On 2016-02-08 00:05, Paulo da Silva wrote: > Às 22:17 de 07-02-2016, Tim Chase escreveu: >> all_files = list(generate_MyFile_objects()) >> interesting = [ >> (my_file1, my_file2) >> for i, my_file1 >> in enumerate(all_files, 1) >> for my_file2 >> in all_files[i:] >> if my_file1 == my_file2 >> ] > > "my_file1 == my_file2" can be implemented into MyFile class taking > advantage of caching sizes (if different files are different), > hashes or even content (for small files) or file headers (first n > bytes). However this seems to have a problem: > all_files: a b c d e ... > If a==b then comparing b with c,d,e is useless.
Depends on what the OP wants to have happen if more than one input file is equal. I.e., a == b == c. Does one just want "a has duplicates" (and optionally "and here's one of them"), or does one want "a == b", "a == c" and "b == c" in the output? > Another solution I thought of, could be defining some methods (I > still don't know which ones) in MyFile so that I could use sets > intersection. Would this one be a faster solution? Adding __hash__ would allow for the set operations, but would require (as ChrisA points out) knowing how to create a hash function that encompasses the information you want to compare. -tkc -- https://mail.python.org/mailman/listinfo/python-list