DeLa wrote:

> Well, I suppose that when the sample is too large, almost every
> relation will prove to be "significant", and a lot of
> pseudo-relations will turn up. It becomes difficult to detect
> intervening variables or to neutralise them, because there will
> be many candidates - if not all of the variables will be
> intervening in some way or another. You will need a great deal
> of insight into the subject matter to distinguish absurdities
> from relevant relations, which makes it difficult for 'the
> reader' to follow your analysis and to separate the purely
> statistical steps from the interpretation.

IMHO significance tests against a point null hypothesis are usually
pretty questionable, unless the aim is to show that, for the sample
sizes available, the parameter is indistinguishable from the point
value.

As the sample size goes to infinity, almost any point null hypothesis
becomes significant, because no point null is exactly true and enough
data will detect even a trivial departure from it. Perhaps this forces
you to think about what you are really trying to do. Compare against a
range within which the difference is negligible in the problem domain
(an equivalence test)? Perform model selection (AIC, BIC, etc.)?
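
To make that concrete, here is a minimal sketch in Python (assuming
numpy and scipy, with an arbitrary "negligible" bound of 0.1 chosen
purely for illustration). The true mean differs from the point null
by a trivial 0.02; the p-value collapses as n grows, while a
confidence interval checked against the negligibility range tells
the more useful story:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Point null mu = 0; the true mean differs by a trivial amount.
    mu_true, sigma, delta = 0.02, 1.0, 0.1   # delta: negligible in this domain

    for n in (100, 10_000, 1_000_000):
        x = rng.normal(mu_true, sigma, size=n)
        t, p = stats.ttest_1samp(x, popmean=0.0)
        lo, hi = stats.t.interval(0.95, df=n - 1,
                                  loc=x.mean(), scale=stats.sem(x))
        inside = (-delta < lo) and (hi < delta)   # CI within the "don't care" range
        print(f"n={n:>9}  p={p:9.3g}  CI=({lo:+.4f}, {hi:+.4f})  negligible={inside}")

At n = 100 nothing is significant; at n = 1,000,000 the point null is
rejected overwhelmingly, yet the interval sits comfortably inside the
negligibility range, which is the answer that actually matters.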

> Also, at a certain point, if the sample contains a majority of
> the population, the chances that the non-responding group is
> "significantly" different from the respondents increase.

Good point if you are sampling a finite population and non-response
is an option. OTOH isn't it common to try to select a representative
sample and follow up the non-respondents?

> Therefore, apart from economic arguments (an increase in size
> costs a lot but no longer adds value), there are also scientific
> arguments for putting a limit on sample sizes.

Pragmatic rather than scientific? If there is a real difference between
respondents and non-respondents, maybe it needs investigation.
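
One common check (hypothetical numbers, again assuming Python with
numpy and scipy) is a "wave" comparison: treat people who only
answered after a reminder as a rough stand-in for the remaining
non-respondents and compare the waves on key variables:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Hypothetical survey: first-wave respondents vs. the follow-up wave
    # (initial non-respondents who answered after a reminder).
    first_wave = rng.normal(52.0, 10.0, size=600)
    follow_up = rng.normal(48.0, 10.0, size=200)

    t, p = stats.ttest_ind(first_wave, follow_up, equal_var=False)  # Welch
    print(f"mean difference = {first_wave.mean() - follow_up.mean():.2f}, "
          f"p = {p:.3g}")

If the waves differ noticeably, the people who never answered probably
differ as well, and enlarging a sample of the same design will not fix
that.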

> In general, overly large samples might distract you from all the
> other elements you need to keep an eye on when judging whether a
> survey has been conducted in a serious way.
> E.g. I prefer a nicely stratified sample of 800 persons with a
> response rate of, say, 80% in a smart design to a poorly designed
> sample of 2500 persons with a response rate of 37%.

I agree completely. I took your original query to suggest that a small
sample could be in some way preferable to a huge sample, given equal
quality and a negligible cost difference.

If you are conducting a survey of some kind, the quality quickly
becomes the dominant factor. High-quality data gives a good
estimate with believable error bounds. Poor data can easily give
you the wrong answer with tight (but untrue) error bounds.
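
A toy illustration of that last point (all numbers hypothetical,
assuming Python with numpy and scipy): a small random sample versus
a huge sample in which the chance of responding rises with the very
trait being measured:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    # Hypothetical population of 1,000,000 with a trait of interest.
    N = 1_000_000
    trait = rng.normal(50.0, 10.0, size=N)
    true_mean = trait.mean()

    def ci95(s):
        return stats.t.interval(0.95, df=len(s) - 1,
                                loc=s.mean(), scale=stats.sem(s))

    # Small but well designed: a simple random sample of 800.
    good = rng.choice(trait, size=800, replace=False)

    # Huge but biased: the probability of responding grows with the trait.
    respond = rng.random(N) < stats.norm.cdf((trait - 50.0) / 10.0)
    bad = rng.choice(trait[respond], size=100_000, replace=False)

    for name, s in (("random, n=800", good), ("biased, n=100000", bad)):
        lo, hi = ci95(s)
        print(f"{name:>17}: CI=({lo:.2f}, {hi:.2f}), "
              f"truth={true_mean:.2f}, covered={lo <= true_mean <= hi}")

The biased sample produces a beautifully narrow interval centred
several points away from the truth; the honest 800 covers it.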

Peter

Off topic, but many years ago a UK survey of the use of the rail
network resulted in many stations being closed. It was subsequently
revealed that some of the estimates of commuter traffic were based
on samples of weekend rail usage.


