Hello,

Thank you for your feedback. I attached a patch, that addresses most of your points.

I'll look into it. It would help if the patch could include a version number at the end.

Should the exchange be skipped when i == k?

The additional branch is actually slower (on my machine, tested with an 10M element int array) for 1D-arrays, which i assume is the most common case. Even with a 2D-array with a subarray size of 1M there is no difference in execution time. i == k should be relatively rare for large arrays, given a good prng.

Ok, slower is bad.

The random and array shuffling functions share a common state. I'm wondering whether it should ot should not be so. It seems ok, but then ISTM that the documentation suggests implicitely that setseed applies to random() only, which is not the case anymore after the patch.

I really like the idea of a prng state owned by the user, that is used by all user facing random functions, but see that their might be concerns about this change. I would update the setseed() documentation, if this proposal is accepted. Do you think my patch should already include the documentation update?

Duno. I'm still wondering what it should do. I'm pretty sure that the documentation should be clear about a shared seed, if any. I do not think that departing from the standard is a good thing, either.

If more samples are required than the number of elements, it does not error out. I'm wondering whether it should.

The second argument to array_sample() is treated like a LIMIT, rather than the actual number of elements. I updated the documentation. My use-case is: take max random items from an array of unknown size.

Hmmm. ISTM that the use case of wanting "this many" stuff would make more sense because it is strictier so less likely to result in unforseen consequences. On principle I do not like not knowing the output size.

If someone wants a limit, they can easily "LEAST(#1 dim size, other limit)" to get it, it is easy enough with a strict function.

Also, the sampling should not return its result in order when the number of elements required is the full array, ISTM that it should be shuffled there as well.

You are the second person asking for this. It's done. I am thinking about ditching array_sample() and replace it with a second signature for array_shuffle():

 array_shuffle(array anyarray) -> anyarray
 array_shuffle(array anyarray, limit int) -> anyarray

What do you think?

I think that shuffle means shuffle, not partial shuffle, so a different name seems better to me.

--
Fabien.


Reply via email to