Anne Archibald <peridot.faceted <at> gmail.com> writes:

> This was discussed on one of the mailing lists several months ago. It
> turns out that there is no simple way to efficiently choose without
> replacement in numpy/scipy.

That reassures me that I'm not missing something obvious! I'm pretty new with
numpy (I've lurked here for a number of years, but never had a real-life need
to use numpy until now).

> I posted a hack that does this somewhat
> efficiently (if SAMPLESIZE>M/2, choose the first SAMPLESIZE of a
> permutation; if SAMPLESIZE<M/2, choose with replacement and redraw any
> duplicates) but it's not vectorized across many sample sets. Is your
> problem large M or large N? what is SAMPLESIZE/M?

It's actually large SAMPLESIZE. As an example, I'm simulating repeated deals
of poker hands from a deck of cards: M=52, N=5, SAMPLESIZE=1000000.

For now, Robert's approach will work, but it will start blowing up when I want
100 million samples - I don't have the memory to hold all the data (4 bytes for
an int * N=5 * 100000000 = 2GB plus change). So I'll need to allocate (say)
1 million at a time in a loop and accumulate my results. That's when 70-second
costs to allocate start to hurt. (After all, this is just the setup - I've got
my actual calculations to do as well!!!)

I'll stick with Robert's approach for now, and see if I can knock up something
using Cython once I really need the speed.

Paul

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
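
For illustration only, here is a rough sketch of the "allocate a chunk at a
time in a loop and accumulate" idea described above. It is not necessarily the
approach referred to in this thread: it uses one common vectorized trick
(argsort of random keys, keeping the first N columns) to draw N=5 distinct
cards from M=52 per row. The function names, the chunk size, the flush
statistic and the card encoding (index // 13 = suit) are all assumptions made
up for the example.

```python
import numpy as np


def deal_hands(rng, n_hands, deck_size=52, hand_size=5):
    """Return an (n_hands, hand_size) array; each row holds distinct card indices."""
    # Sorting a row of i.i.d. uniform keys gives a uniformly random permutation,
    # so its first hand_size positions are a sample without replacement.
    keys = rng.random((n_hands, deck_size))
    return np.argsort(keys, axis=1)[:, :hand_size].astype(np.int8)


def count_flushes(total_hands, chunk_size=100_000, seed=0):
    """Count hands whose five cards share a suit, one chunk at a time."""
    rng = np.random.default_rng(seed)
    flushes = 0
    remaining = total_hands
    while remaining > 0:
        n = min(chunk_size, remaining)
        hands = deal_hands(rng, n)
        suits = hands // 13  # assumed encoding: cards 0-12 are one suit, etc.
        # A row is a flush if every suit equals the suit of its first card.
        flushes += int(np.sum((suits == suits[:, :1]).all(axis=1)))
        remaining -= n
    return flushes
```

Only one chunk of hands is alive at any moment, so peak memory depends on
chunk_size rather than on the total number of samples. The cost is the
temporary (chunk_size, 52) key array, which is why the chunk is kept fairly
small here.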