In article <8am7d1$hqj$[EMAIL PROTECTED]>,
[EMAIL PROTECTED] says...
>
> I think I made the formulation too wordy in my previous
> post.
>
> Let me try this simple question:
>
> When one wishes to do a (multi)linear regression on a set of
> observed data, and one is in the (unusual) position of possessing
> a set of sample standard deviations (with varying degrees of freedom)
> at each value of the "explanatory" variable, how does one
> determine whether one ought or ought not to solve the weighted
> least squares problem using those sample standard deviations?
>
> What is the usual decision test for "heteroscedasticity" *before* one
> solves the regression system? What do people do in practice?
>
Most social scientists don't worry very much about the assumptions of OLS
regression. They note that OLS is fairly robust: the coefficient estimates
remain unbiased even when the errors are heteroscedastic, although the
usual standard errors are then no longer correct. The exceptions are
multilevel models and time series data, where the assumption of
uncorrelated error terms is violated. But these require special
programs, not weighted least squares.
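Here is a minimal sketch of the two fits being compared, written in Python with NumPy. It uses simulated data that is not from this thread. The first fit is OLS. The second is weighted least squares with weights 1/s_i^2, built from a standard deviation s_i attached to each observation, which is the situation in the question. The error SD is assumed to grow with x. In that case both slope estimates centre on the true value, which is the sense in which OLS stays unbiased, while WLS is the more efficient of the two. The function names `ols` and `wls` and the data are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: minimise sum_i (y_i - x_i'b)^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def wls(X, y, s):
    """Weighted least squares with weights w_i = 1 / s_i^2, where s_i is the
    standard deviation attached to observation i (assumed known, or estimated
    from replicates at that x as in the question)."""
    w = 1.0 / np.asarray(s, dtype=float) ** 2
    sw = np.sqrt(w)
    # Multiplying row i of X and y by sqrt(w_i) turns WLS into plain OLS.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical simulated data: true line y = 2 + 3x, error SD grows with x.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 200)
s = 0.5 * x
y = 2.0 + 3.0 * x + rng.normal(0.0, s)
X = np.column_stack([np.ones_like(x), x])

b_ols = ols(X, y)
b_wls = wls(X, y, s)
print("OLS (intercept, slope):", b_ols)
print("WLS (intercept, slope):", b_wls)
```

When the questioner's s_i are themselves estimates with few degrees of freedom, the weights carry sampling noise too. So the efficiency gain from WLS is smaller than in this idealised sketch.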
There is also some debate about using weights for stratified sampling and/or
to correct for sampling bias. Weighting gives correct point estimates, but
ordinary regression programs then report incorrect standard errors. One
solution is to include the design variables in the model instead of
weighting. Stata and WesVar are two programs that take the sampling
weights into account when calculating standard errors of estimates. Still,
a quite common approach is to use weights for descriptive statistics but
not in multivariate models.
Weights can also be used for dependent variables that inherently violate
the assumption of homoscedasticity, e.g. a dichotomous dependent variable.
I recently did a weighted least squares analysis for a co-worker to
replicate an analysis in another paper. The weight was
groupn*pct*(1-pct), where groupn was the number of cases per group and
pct was the proportion with a positive response within each group. But
this basically amounts to a poor approximation of a logit model. Programs
like GLIM that use iteratively reweighted least squares use pct*(1-pct)
as the weight when estimating the model, but there pct is the predicted
probability from the previous iteration.
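Below is a minimal sketch, in Python with NumPy, of iteratively reweighted least squares for a logit model on grouped data. It assumes the dependent variable in the one-shot WLS analysis was the empirical logit log(pct/(1-pct)); the post does not say so. The first pass uses the observed pct, so it reproduces that one-shot fit with weights groupn*pct*(1-pct). Later passes replace pct in the weights with the predicted probability from the previous iteration, as described above, and converge to the maximum-likelihood logit estimates. The weight is multiplied by groupn because the data are grouped. The function name `irls_grouped_logit` and the data are hypothetical.

```python
import numpy as np

def irls_grouped_logit(X, groupn, positives, max_iter=50, tol=1e-10):
    """Logit model for grouped binary data, fitted by IRLS.

    X         : design matrix, one row per group (include a column of ones)
    groupn    : number of cases in each group
    positives : number of positive responses in each group
    Returns (beta after the first pass, beta at convergence).
    """
    groupn = np.asarray(groupn, dtype=float)
    pct = np.asarray(positives, dtype=float) / groupn
    if np.any((pct <= 0.0) | (pct >= 1.0)):
        raise ValueError("empirical logit needs 0 < pct < 1 in every group")

    p = pct.copy()               # first pass: observed proportions
    eta = np.log(p / (1.0 - p))  # empirical logit
    history = []
    for _ in range(max_iter):
        w = groupn * p * (1.0 - p)               # binomial working weights
        z = eta + (pct - p) / (p * (1.0 - p))    # working response
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
        history.append(beta)
        if len(history) > 1 and np.max(np.abs(history[-1] - history[-2])) < tol:
            break
        eta = X @ beta                           # update linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))           # predicted probabilities
    return history[0], history[-1]

# Hypothetical grouped data: five groups of 50 cases each.
x = np.arange(5.0)
X = np.column_stack([np.ones_like(x), x])
groupn = np.full(5, 50.0)
positives = np.array([5.0, 12.0, 25.0, 37.0, 45.0])

b_one_step, b_mle = irls_grouped_logit(X, groupn, positives)
print("one-step WLS:", b_one_step)
print("logit MLE:   ", b_mle)
```

The gap between the two printed coefficient vectors shows how far the one-shot WLS fit, the "poor approximation", is from the logit model for a given dataset.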
As for a test for heteroscedasticity, Stata has a "hettest" command, which
performs a Cook-Weisberg test and produces a chi-square statistic. Cook and
Weisberg wrote a book in 1982, "Residuals and Influence in Regression". I've
never used it, though.
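Here is a minimal sketch, in Python with NumPy, of the score test usually attributed to Cook and Weisberg (and, independently, to Breusch and Pagan). The steps are:

1. Fit OLS.
2. Scale the squared residuals by their mean.
3. Regress the scaled squared residuals on the fitted values.
4. Refer half the explained sum of squares to a chi-square distribution with 1 degree of freedom.

This is meant to resemble what Stata's hettest reports by default, but it is an independent illustration, not Stata's code. The function name `cook_weisberg_test` and the simulated data are hypothetical. Note that the test uses OLS residuals, so in practice it is applied after a first OLS fit rather than strictly before any regression is solved.

```python
import math
import numpy as np

def cook_weisberg_test(X, y):
    """Score test for variance depending on the fitted values (1 df).
    X must include a column of ones. Returns (chi-square statistic, p-value)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    e = y - fitted
    n = len(y)
    sigma2 = e @ e / n                   # ML estimate of the error variance
    g = e ** 2 / sigma2                  # scaled squared residuals, mean 1
    Z = np.column_stack([np.ones(n), fitted])
    gamma, *_ = np.linalg.lstsq(Z, g, rcond=None)
    ess = np.sum((Z @ gamma - g.mean()) ** 2)   # explained sum of squares
    stat = float(ess / 2.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
    return stat, p_value

# Hypothetical simulated data: one heteroscedastic, one homoscedastic.
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 200)
X = np.column_stack([np.ones_like(x), x])
y_het = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)
y_hom = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=x.size)

print("error SD grows with x:", cook_weisberg_test(X, y_het))
print("constant error SD:    ", cook_weisberg_test(X, y_hom))
```

For a chi-square with 1 df, P(X > c) = erfc(sqrt(c/2)). That identity lets the sketch compute the p-value with only the standard library.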
Hope this helps,
John Hendrickx
===========================================================================
This list is open to everyone. Occasionally, less thoughtful
people send inappropriate messages. Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.
For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===========================================================================