Hello,
NumPy provides efficient, vectorized methods for drawing random samples from
an array with replacement. However, there is no vectorized way to draw many
independent samples *without replacement*. With `replace=False`,
`Generator.choice` makes every element of its whole output unique, so each
independent sample needs its own call inside a Python loop. To address this
limitation, I wrote the function below, which draws all samples at once.
For small sample sizes it is approximately 30x faster than a basic Python
loop (and gives a roughly 2x improvement using numba).
Could this functionality, or something similar, be integrated into NumPy?
See also this issue: <https://github.com/numpy/numpy/issues/28084>.
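For reference, here is a sketch of what the "basic Python loop" baseline might look like. The post does not show the code that was timed, so this is an assumption: one `Generator.choice(..., replace=False)` call per sample, stacked into one array. The helper name `loop_choice_without_replacement` is hypothetical. The last lines show why a single `choice` call cannot replace the loop.

```python
import numpy as np


def loop_choice_without_replacement(array, sample_size, n_iterations, rng=None):
    # Hypothetical baseline, not the code from the post: one
    # Generator.choice call per sample. replace=False guarantees
    # uniqueness only within each call, i.e. within each sample.
    rng = np.random.default_rng() if rng is None else rng
    array = np.asarray(array)
    return np.stack([
        rng.choice(array, size=sample_size, replace=False)
        for _ in range(n_iterations)
    ])


rng = np.random.default_rng()
samples = loop_choice_without_replacement(np.arange(5), 3, 10, rng)
print(samples.shape)  # (10, 3)

# A single call cannot replace the loop: with replace=False all 30 values
# in a (10, 3) output would have to be distinct, which is impossible for a
# population of 5, so NumPy raises ValueError.
try:
    rng.choice(5, size=(10, 3), replace=False)
except ValueError as exc:
    print("ValueError:", exc)
```

Because `choice` samples along the first axis of an ndarray, the same loop also handles the 2-D case, returning shape `(n_iterations, sample_size) + array.shape[1:]`.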
Kind regards,
Mark
import numpy as np


def random_choice_without_replacement(array, sample_size, n_iterations):
    """
    Generate random samples from a given array without replacement.

    Each of the ``n_iterations`` samples is drawn independently; within a
    single sample no element of ``array`` appears more than once.

    Parameters
    ----------
    array : array-like
        Array from which to draw the random samples. Sampling is done
        along the first axis.
    sample_size : int
        Number of elements to draw without replacement per iteration.
    n_iterations : int
        Number of independent samples to generate.

    Returns
    -------
    random_samples : ndarray
        The generated random samples, with shape
        ``(n_iterations, sample_size) + array.shape[1:]``.

    Raises
    ------
    ValueError
        If sample_size is greater than the population size.

    Examples
    --------
    Generate 10 random samples of size 3 from np.arange(5) without
    replacement.

    >>> array = np.arange(5)
    >>> random_choice_without_replacement(array, 3, 10)
    array([[4, 0, 1],
           [1, 4, 0],
           [1, 3, 2],
           [0, 1, 3],
           [1, 0, 2],
           [3, 2, 4],
           [0, 3, 1],
           [1, 3, 4],
           [3, 1, 4],
           [0, 1, 3]])  # random

    Generate 4 random samples of size 3 from a 2-D array without
    replacement (rows are sampled).

    >>> array = np.arange(10).reshape(5, 2)
    >>> random_choice_without_replacement(array, 3, 4)
    array([[[0, 1],
            [8, 9],
            [4, 5]],

           [[2, 3],
            [8, 9],
            [0, 1]],

           [[0, 1],
            [2, 3],
            [8, 9]],

           [[4, 5],
            [2, 3],
            [8, 9]]])  # random
    """
    # Accept any array-like (e.g. a list) so that fancy indexing works below.
    array = np.asarray(array)
    n = len(array)
    if sample_size > n:
        raise ValueError(
            f"sample_size ({sample_size}) is greater than the "
            f"population size ({n})."
        )

    # One row of candidate indices per iteration.
    indices = np.tile(np.arange(n), (n_iterations, 1))
    random_samples = np.empty((n_iterations, sample_size), dtype=int)
    rng = np.random.default_rng()

    # Partial Fisher-Yates shuffle applied to all rows at once: at step i,
    # positions 0..int_max of every row still hold unused indices.
    for i, int_max in zip(range(sample_size),
                          reversed(range(n - sample_size, n))):
        # Pick one unused position per row.
        random_indices = rng.integers(0, int_max + 1, size=(n_iterations, 1))
        # Record the index stored at that position.
        random_samples[:, i] = np.take_along_axis(
            indices, random_indices, axis=-1)[:, 0]
        # Overwrite the chosen slot with the last unused index, so the
        # chosen index cannot be drawn again in this row.
        np.put_along_axis(
            indices, random_indices, indices[:, int_max:int_max + 1], axis=-1)

    return array[random_samples]
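Each pass of the loop in the function is one step of a partial Fisher-Yates shuffle, applied to every row simultaneously. The single-row sketch below uses a hypothetical helper, `partial_fisher_yates`, written only for illustration. It performs the same "pick a slot, record it, move the last unused index into it" step with scalar Python, which may make the `take_along_axis`/`put_along_axis` calls easier to follow.

```python
import numpy as np


def partial_fisher_yates(n, sample_size, rng=None):
    # Hypothetical helper: draw sample_size distinct indices from range(n).
    rng = np.random.default_rng() if rng is None else rng
    if sample_size > n:
        raise ValueError("sample_size is greater than the population size")
    indices = list(range(n))
    sample = []
    # int_max runs n-1, n-2, ..., n-sample_size, as in the vectorized loop.
    for int_max in range(n - 1, n - sample_size - 1, -1):
        j = int(rng.integers(0, int_max + 1))  # unused slot in 0..int_max
        sample.append(indices[j])              # record the chosen index
        indices[j] = indices[int_max]          # refill slot with last unused index
    return sample


print(partial_fisher_yates(5, 3, np.random.default_rng()))
```

Each step costs O(1) per row, so the vectorized version does O(sample_size) NumPy calls regardless of `n_iterations`, plus the O(n_iterations * n) setup for `indices`.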
NumPy-Discussion mailing list: https://mail.python.org/mailman3/lists/numpy-discussion.python.org/