Andrew Barnert wrote:

> On Sep 23, 2019, at 15:32, Richard Higginbotham higgi...@gmail.com wrote:
>
> > > Considering your use case however, I wonder if you would not be
> > > better going with the iterator approach (as Andrew has been hinting
> > > already for some time in his posts).
> > >
> > > Richard M.
> >
> > They will most likely have good performance on data sets that are
> > close, but very poor on those that aren't balanced.
>
> The code you posted doesn't do any galloping, it just advances the index
> one at a time. The fact that you could do approximate binary searches to
> optimize it doesn't help if you never do those searches. So it'll be
> exactly as poor on unbalanced inputs as my iterator code.

Yes, it probably will, but we won't be able to add binary searches using
iterators. In which case, why use iterators... It's a pretty basic
optimization step.
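To be concrete, the optimization I mean is something like this (an
illustrative sketch, not my posted code; the name and details are invented
here): with lists you can bisect into the remaining slice of B instead of
stepping one index at a time.

    from bisect import bisect_left

    def list_relcomp_gallop(a, b):
        # Relative complement "a - b" for two sorted lists. Instead of
        # stepping through b one element at a time, jump j forward with a
        # binary search over the remaining slice b[j:n].
        out = []
        j, n = 0, len(b)
        for x in a:
            # Find the first position in b[j:n] holding a value >= x.
            j = bisect_left(b, x, j, n)
            if j == n or b[j] != x:
                out.append(x)  # x does not occur in b
        return out

That's a two-line change when you have indices; with a pure iterator there
is nothing to bisect into.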
> After all, they're exactly the same algorithm (except for trivial
> differences in how they handle one input or the other ending early); the
> only difference is that one implementation calls next and remembers the
> value, the other increases an index and indexes a list.

I'm not bragging about my code. I think it's neat, but I wrote it 20 years
ago, after graduating from college with a chemical engineering degree. It
was my second Python program, and something like my third or fourth program
outside of re-implementing the C library (as is common with students).

> Of course if you always have lists already in memory, the added overhead
> of using an iterator over the list is cost for no benefit. But that's a
> fixed multiplier, it's not different for balanced vs. unbalanced inputs.
> And of course if the iterator ever allows you to avoid building an
> otherwise-unnecessary list, it's a win, but that win is also a flat cost
> that depends only on the allocation time for N elements, which also
> doesn't vary based on balanced vs. unbalanced inputs.
>
> Anyway, if your use case really is a gigantic directory from a filesystem
> that guarantees sorted order and it supports iterdir, iterators should
> help quite a bit.

In the spirit of camaraderie, I'll try to follow Steven D'Aprano's
community advice and be a little more blunt. I'll skip the insults, though;
I wrote them, but I took them out. Honestly, I could have done this at any
time, I just didn't want to hurt people's feelings.

I already converted it to C a long time ago and did many, many tests I
don't want to repeat. I know how it scales. We don't even have to convert
to C; let's just use PyPy. For grins we'll change to an integer hash.

For A not in B: trying A 1000, B 5000
best relcomp_set test time (msec): 0.009
best relcomp_set with set init test time (msec): 0.130
best relcomp_set with set init intersect test time (msec): 0.115
best list_relcomp test time (msec): 0.012
best sort test time (msec): 0.006

For A not in B: trying A 10000, B 50000
best relcomp_set test time (msec): 0.117
best relcomp_set with set init test time (msec): 2.448
best relcomp_set with set init intersect test time (msec): 2.649
best list_relcomp test time (msec): 0.077
best sort test time (msec): 0.048

For A not in B: trying A 1000000, B 5000000
best relcomp_set test time (msec): 82.131
best relcomp_set with set init test time (msec): 880.753
best relcomp_set with set init intersect test time (msec): 886.761
best list_relcomp test time (msec): 7.902
best sort test time (msec): 5.826

For A not in B: trying A 5000000, B 10000000
best relcomp_set test time (msec): 557.368
best relcomp_set with set init test time (msec): 3228.475
best relcomp_set with set init intersect test time (msec): 3725.047
best list_relcomp test time (msec): 25.626
best sort test time (msec): 13.632

Huh? At about 1000 elements they are equal, but after that my second Python
program is 10x faster. Oops.
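For anyone who wants to reproduce the shape of these numbers, the harness
is roughly the following. This is a reconstruction, not my actual script:
the function bodies are my best guess at what the labels above measure, and
the "set init" row presumably also pays for building the set from the list.

    import random
    import timeit

    def relcomp_set(a, bset):
        # "relcomp_set": membership tests against a prebuilt set.
        return [x for x in a if x not in bset]

    def list_relcomp(a, b):
        # "list_relcomp": one merge-style pass over two sorted lists.
        out = []
        j, n = 0, len(b)
        for x in a:
            while j < n and b[j] < x:
                j += 1
            if j == n or b[j] != x:
                out.append(x)
        return out

    for na, nb in [(1000, 5000), (10000, 50000), (1000000, 5000000)]:
        a = sorted(random.sample(range(10 * nb), na))
        b = sorted(random.sample(range(10 * nb), nb))
        bset = set(b)
        print("For A not in B: trying A %d, B %d" % (na, nb))
        for label, fn in [
            ("relcomp_set", lambda: relcomp_set(a, bset)),
            ("relcomp_set with set init", lambda: relcomp_set(a, set(b))),
            ("list_relcomp", lambda: list_relcomp(a, b)),
        ]:
            best = min(timeit.repeat(fn, number=1, repeat=5))
            print("best %s test time (msec): %.3f" % (label, best * 1000.0))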
Let's try more iterations "with a real benchmark program". I won't repeat
mine, though; we'll just take the first number that comes by.

For A not in B: trying A 1000, B 5000
best relcomp_set test time (msec): 0.009
best relcomp_set with set init test time (msec): 0.116
best relcomp_set with set init intersect test time (msec): 0.118
best list_relcomp test time (msec): 0.015
best sort test time (msec): 0.006

For A not in B: trying A 10000, B 50000
best relcomp_set test time (msec): 0.115
best relcomp_set with set init test time (msec): 2.729
best relcomp_set with set init intersect test time (msec): 2.780
best list_relcomp test time (msec): 0.111
best sort test time (msec): 0.049

You don't want to see data sets larger than that. OK, a little sarcasm, but
you earned it. All you had to do was not be jerks.