Hello,
Thank you for your feedback. I attached a patch, that addresses most of your
points.
I'll look into it. It would help if the patch could include a version
number at the end.
Should the exchange be skipped when i == k?
The additional branch is actually slower (on my machine, tested with an 10M
element int array) for 1D-arrays, which i assume is the most common case.
Even with a 2D-array with a subarray size of 1M there is no difference in
execution time. i == k should be relatively rare for large arrays, given a
good prng.
Ok, slower is bad.
The random and array shuffling functions share a common state. I'm
wondering whether it should ot should not be so. It seems ok, but then ISTM
that the documentation suggests implicitely that setseed applies to
random() only, which is not the case anymore after the patch.
I really like the idea of a prng state owned by the user, that is used by all
user facing random functions, but see that their might be concerns about this
change. I would update the setseed() documentation, if this proposal is
accepted. Do you think my patch should already include the documentation
update?
Duno. I'm still wondering what it should do. I'm pretty sure that the
documentation should be clear about a shared seed, if any. I do not think
that departing from the standard is a good thing, either.
If more samples are required than the number of elements, it does not error
out. I'm wondering whether it should.
The second argument to array_sample() is treated like a LIMIT, rather than
the actual number of elements. I updated the documentation. My use-case is:
take max random items from an array of unknown size.
Hmmm. ISTM that the use case of wanting "this many" stuff would make more
sense because it is strictier so less likely to result in unforseen
consequences. On principle I do not like not knowing the output size.
If someone wants a limit, they can easily "LEAST(#1 dim size, other
limit)" to get it, it is easy enough with a strict function.
Also, the sampling should not return its result in order when the number of
elements required is the full array, ISTM that it should be shuffled there
as well.
You are the second person asking for this. It's done. I am thinking about
ditching array_sample() and replace it with a second signature for
array_shuffle():
array_shuffle(array anyarray) -> anyarray
array_shuffle(array anyarray, limit int) -> anyarray
What do you think?
I think that shuffle means shuffle, not partial shuffle, so a different
name seems better to me.
--
Fabien.