On Tue, 19 Jul 2022 at 21:21, Martin Kalcher <martin.kalc...@aboutsource.net> wrote: > > Here is a patch with dimension aware array_shuffle() and array_sample(). >
+1 for this feature, and this way of handling multi-dimensional arrays. > If you think array_flatten() is desirable, i can take a look at it. That's not something I've ever wanted -- personally, I rarely use multi-dimensional arrays in Postgres. A couple of quick comments on the current patch: It's important to mark these new functions as VOLATILE, not IMMUTABLE, otherwise they won't work as expected in queries. See https://www.postgresql.org/docs/current/xfunc-volatility.html It would be better to use pg_prng_uint64_range() rather than rand() to pick elements. Partly, that's because it uses a higher quality PRNG, with a larger internal state, and it ensures that the results are unbiased across the range. But more importantly, it interoperates with setseed(), allowing predictable sequences of "random" numbers to be generated -- something that's useful in writing repeatable regression tests. Assuming these new functions are made to interoperate with setseed(), which I think they should be, then they also need to be marked as PARALLEL RESTRICTED, rather than PARALLEL SAFE. See https://www.postgresql.org/docs/current/parallel-safety.html, which explains why setseed() and random() are parallel restricted. In my experience, the requirement for sampling with replacement is about as common as the requirement for sampling without replacement, so it seems odd to provide one but not the other. Of course, we can always add a with-replacement function later, and give it a different name. But maybe array_sample() could be used for both, with a "with_replacement" boolean parameter? Regards, Dean