Josh Berkus <[email protected]> writes:
> > Only if your sample is random and independent. The existing mechanism tries
> > fairly hard to ensure that every record has an equal chance of being
> > selected. If you read the entire block and not appropriate samples then
> > you'll introduce systematic sampling errors. For example, if you read an
> > entire block you'll be biasing towards smaller records.
>
> Did you read any of the papers on block-based sampling? These sorts of issues
> are specifically addressed in the algorithms.

We *currently* use a block-based sampling algorithm that addresses this issue
by taking care to select rows within the selected blocks in an unbiased way.
You were proposing reading *all* the records from the selected blocks, which
throws away that feature.
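
To make the distinction concrete, here is a rough Python sketch of the general
two-stage idea: pick a random subset of blocks, then draw rows from only those
blocks with a reservoir, so that every row in the selected blocks has the same
chance of ending up in the sample no matter how many rows its block holds. This
is an illustration only, not the actual ANALYZE code; the function name, the
list-of-lists table layout, and the parameters are all made up for the example.

```python
import random


def two_stage_sample(blocks, n_blocks, n_rows, rng):
    """Hypothetical sketch: sample rows via randomly chosen blocks.

    blocks   -- list of blocks, each block a list of rows (assumed layout)
    n_blocks -- how many blocks to read
    n_rows   -- target number of rows in the sample
    rng      -- a random.Random instance
    """
    # Stage 1: choose which blocks to read, uniformly without replacement.
    chosen = rng.sample(range(len(blocks)), min(n_blocks, len(blocks)))

    # Stage 2: reservoir sampling (Algorithm R) over the rows of the chosen
    # blocks. Each row seen so far is kept with probability n_rows / seen,
    # independent of how densely its block is packed.
    reservoir = []
    seen = 0
    for b in sorted(chosen):  # read in physical order
        for row in blocks[b]:
            seen += 1
            if len(reservoir) < n_rows:
                reservoir.append(row)
            else:
                j = rng.randrange(seen)
                if j < n_rows:
                    reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    # Toy table: even-numbered blocks hold 50 small rows, odd ones 5 big rows.
    table = [[(b, i) for i in range(5 if b % 2 else 50)] for b in range(100)]
    sample = two_stage_sample(table, 30, 100, random.Random(0))
    print(len(sample), "rows sampled from", len({b for b, _ in sample}), "blocks")
```

Reading *all* rows from the chosen blocks would amount to dropping stage 2 and
returning every row visited, which is the step the proposal removes.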
--
greg