Joseph Rushton Wakeling wrote in message (digitalmars.D:172997):
> In other words, either your input range needs hasLength!R == true or
> you need to manually specify the total number of items when calling
> randomSample:
>
> But what if the total number of items is not known in advance? E.g. if you
> are reading a file, line by line, or reading records from a tape; you may
> know the total is finite, but you don't know what it actually _is_.
[snip]
> ... but doing something similar within RandomSample doesn't seem so easy.
> Why? Because the static if()s that you'd require within the struct would
> not depend just on whether hasLength!R == true, but also on whether you'd
> passed a size_t total to the constructor.

Why not use takeExactly? It is the standard way to select from a subset of
the original range. I wouldn't even have provided the three-argument
overload, since the user can call takeExactly when necessary (the
documentation could recommend it, in case the user doesn't know about it).

    struct RandomSample(R) if (isInputRange!R && hasLength!R)
    {
        ... // always use r.length, never total/available
    }

    auto randomSample(R)(R r, size_t n, size_t total) if (isInputRange!R)
    {
        // takeExactly may return a type other than R, so let the
        // two-argument overload deduce the range type
        return randomSample(takeExactly(r, total), n);
    }

    struct RandomSample(R) if (isInputRange!R && !hasLength!R)
    {
        ... // always reservoir random sample
    }

There is no more issue here.

> I also think it would be a good idea for the reservoir sampling technique to
> emit a warning when in debug mode, to prompt the user to be _sure_ that they
> can't specify the total number of points to sample from. Is there a
> recommended method of doing something like this?

I don't think having a library pollute the compiler's warnings is
recommended.

> Alternatively, would people prefer to entirely separate the known-total and
> unknown-total sampling methods entirely, so the choice is always manual?

RandomSample is a lazy range. RandomReservoirSample is not, and has a
completely different implementation. IMHO, that is a fundamental difference
which justifies having a separate function with a different name.

> Finally, if hasLength!R == false, is there any way of guaranteeing that the
> input range is still going to be ultimately finite? There could be some very
> nasty worst-case behaviour in the case of infinite ranges.

isInfinite!Range. However, a range that is not statically infinite can still
return false from empty forever, either because the implementer of the range
forgot to make empty an enum, or because the user hit a corner case (e.g.
repeat(1).until(2)). But that's a general problem, one that would make most
eager algorithms loop forever, starting with array and copy...

--
Christophe
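
P.S. To make the lazy/eager distinction concrete, here is a minimal sketch of the two cases. It is only an illustration under assumptions: reservoirSample is a hypothetical name (not a Phobos function), it uses the textbook "Algorithm R" rather than whatever a real RandomReservoirSample would do, and it returns an array instead of a range.

```d
import std.algorithm : filter;
import std.array : array;
import std.random : Random, randomSample, uniform;
import std.range : ElementType, hasLength, iota, isInfinite, isInputRange,
    takeExactly;

/// Hypothetical helper, not part of Phobos: eagerly draws up to n elements
/// from a finite input range of unknown length (reservoir sampling,
/// Algorithm R). The result does not preserve the source ordering.
ElementType!R[] reservoirSample(R, Rng)(R r, size_t n, ref Rng rng)
    if (isInputRange!R && !isInfinite!R)
{
    ElementType!R[] reservoir;
    if (n == 0)
        return reservoir;

    size_t seen = 0; // number of elements consumed so far
    foreach (e; r)
    {
        ++seen;
        if (reservoir.length < n)
        {
            // the first n elements fill the reservoir
            reservoir ~= e;
        }
        else
        {
            // keep the new element with probability n / seen
            immutable j = uniform(size_t(0), seen, rng);
            if (j < n)
                reservoir[j] = e;
        }
    }
    return reservoir;
}

unittest
{
    // Known total: bound the source with takeExactly, then sample lazily.
    auto bounded = takeExactly(iota(100), 50);
    assert(randomSample(bounded, 5).array.length == 5);

    // Unknown total: a filtered range has no length (think lines of a file).
    auto rng = Random(42);
    auto unknown = iota(1000).filter!(x => x % 3 == 0);
    static assert(!hasLength!(typeof(unknown)));
    assert(reservoirSample(unknown, 5, rng).length == 5);
}
```

This should build and run its checks with something like `dmd -unittest -main -run sketch.d`. Note that the isInfinite constraint only rejects ranges that are statically infinite; the corner cases mentioned above would still make the loop run forever.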