First, I would like to know if my original post, The Mob Howls Genius, is
missing from the list. It is no longer listed on my computer. Censorship?
Did I say something that offensive?


Radford, you ask some good questions. I respond below in context:

>
> Presumably, you are intending it to be used to analyse observational
> data - for a well-designed experiment, what causes what isn't an
> issue.  So how do you arrange for the distribution (which you don't
> control), to be uniform?

The uniformity of the data is only to promote the development of causal
manifolds. A causal manifold is simply all the combinations of the causes,
as illustrated by the means of the effects in the cells of a factorial
design. CR works by contrasting correlations between causes in the extremes
versus midranges of y.  When causes are adequately sampled, all the relevant
levels of the causes will be combined. If they are not, the corners of the
cross tabulation will be empty, since there will be so few cases in which
extremes of the causes are combined. Generate two normally distributed
random variables, create their sum and crosstabulate the sum by the two
normal variables. You will find empty corners in the cross tabulation.
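
Just to make that exercise concrete, here is a rough sketch in Python
(numpy/pandas). The variable names and the +/- 2 cutoffs are my own choices
for illustration, not anything prescribed by CR:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 1000

    # Two independent, normally distributed "causes" and their sum, the "effect".
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = x1 + x2

    # Cut each cause into low / mid / high levels, as in a factorial layout.
    labels = ["low", "mid", "high"]
    g1 = pd.Series(pd.cut(x1, [-np.inf, -2.0, 2.0, np.inf], labels=labels), name="x1_level")
    g2 = pd.Series(pd.cut(x2, [-np.inf, -2.0, 2.0, np.inf], labels=labels), name="x2_level")

    # Cell counts: the corner cells, where both causes are extreme, come out
    # empty or nearly empty, because joint extremes of uncorrelated normal
    # variables are rare.
    print(pd.crosstab(g1, g2, dropna=False))

    # Mean of the effect y in each cell -- the cell means of the factorial layout.
    print(pd.crosstab(g1, g2, values=y, aggfunc="mean", dropna=False))

With the y means filled in per cell, the table is the causal manifold described
above; the empty corner cells are exactly the problem the sampling strategy has
to solve.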

If the pairing of extreme levels of the causes is excluded by the sampling
strategy, then comparisons made across the extremes and midranges of y will be
invalid. So keep in mind, the point is to get those representative cases where
extremes of the causes are combined. If such combinations occur, then CR can be
done on the means of the ys in the cross tabulation, whether or not the values
are strictly uniform. Means based on unequal cell sizes are riskier, but, as in
ANOVA designs, they can be used, so long as at least some combinations in the
corner cells exist and estimates of the y means per cell are possible.

With large surveys it is often possible to develop subsamples that fill out the
crosstabulations. If not, then the extremes of the causes need to be trimmed
until we get a cross tabulation whose corners contain representative
combinations of the causes. To trim, just remove the data above and below some
designated z-score on the cause, say z = +/- 1.7, for example. Surveys in
general could be designed to collect such data, though the scientist might need
to seek out those individuals who can fill out the extremes. We did this in our
laboratory personality research on a regular basis when I was in graduate
school at the University of Florida. Such sampling was extremely common, and
the journals did not seem to mind. One simply has to be more explicit and
purposeful in collecting samples. In secondary analyses we also used large
samples, as with the High School and Beyond data, and had little difficulty
filling out the corners of factorial samples.
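
For what it is worth, the trimming step can be sketched the same way (again
only an illustration; the z = +/- 1.7 trim and the inner +/- 1 cutoffs are
arbitrary choices here, and the data are simulated):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 2000

    # Simulated raw scores on two causes and their effect.
    x1 = rng.normal(50, 10, size=n)
    x2 = rng.normal(100, 15, size=n)
    y = x1 + x2
    df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})

    # Standardise the causes so the trim can be stated as a z-score.
    df["z1"] = (df.x1 - df.x1.mean()) / df.x1.std()
    df["z2"] = (df.x2 - df.x2.mean()) / df.x2.std()

    # Trim: drop cases beyond the designated z-score on either cause.
    z = 1.7
    trimmed = df[(df.z1.abs() <= z) & (df.z2.abs() <= z)]

    # Re-cut the trimmed causes into low / mid / high within the retained range.
    labels = ["low", "mid", "high"]
    g1 = pd.cut(trimmed.z1, [-z, -1.0, 1.0, z], labels=labels)
    g2 = pd.cut(trimmed.z2, [-z, -1.0, 1.0, z], labels=labels)

    # After trimming, the new corner cells hold enough cases to estimate
    # the mean of y per cell.
    print(pd.crosstab(g1, g2, dropna=False))
    print(pd.crosstab(g1, g2, values=trimmed.y, aggfunc="mean", dropna=False))

The inner cutoffs at +/- 1 are arbitrary; the point is only that, after
trimming, the retained extremes are well enough populated to be crossed.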




>Similarly, how do you arrange for the causes
> to be uncorrelated in this distribution?

This point really strikes at a deeper issue. If the measures are so designed
that they are necessarily correlated, then they are bad measures. To repeat an
example I described earlier: we designed a survey to measure the abuse that
occurs in cults. We factor analyzed a large survey of former cult members'
descriptions of their groups. Four meaningful and reliable factors showed up.
A number of items loaded on more than one factor. I removed all the items that
loaded on more than one factor, in order to unconfound the items. We refactored
and deleted items until we had 28 items that loaded on only one factor, seven
per factor. These items had "simple structure," and they combined to create
uncorrelated measures. It is not that these subtypes of abuse do not tend to
correlate across large samples containing cults and noncults. But within the
cults, the subtypes can occur in any combination and on their own. These
subscales were thus designed to be orthogonal yet combinable. The overall scale
worked very well in differentiating cults from noncultic groups, and we could
define profiles within the cultic ranges.
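
A rough sketch of that pruning loop, in Python with scikit-learn's
FactorAnalysis (the data below are random numbers standing in for the survey
responses, and the four factors, .30 loading threshold, and varimax rotation
are simply choices made for the illustration):

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    # A respondents-by-items matrix stands in for the survey data
    # (random numbers here, purely to make the sketch runnable).
    rng = np.random.default_rng(2)
    responses = rng.normal(size=(400, 40))

    def cross_loading_items(data, n_factors=4, threshold=0.30):
        """Indices of items whose |loading| exceeds the threshold on more than one factor."""
        fa = FactorAnalysis(n_components=n_factors, rotation="varimax").fit(data)
        loadings = fa.components_.T          # items x factors
        hits = (np.abs(loadings) > threshold).sum(axis=1)
        return np.where(hits > 1)[0]

    # Drop cross-loading items and refactor until only simple-structure items remain.
    keep = np.arange(responses.shape[1])
    bad = cross_loading_items(responses[:, keep])
    while bad.size > 0:
        keep = np.delete(keep, bad)
        bad = cross_loading_items(responses[:, keep])

    print("items retained:", keep)

In the cult survey, the analogous pruning left us with the 28 items, seven per
factor, mentioned above.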

Similarly, if the causes are correlated in a sample, then find cases that fill
out a factorial, if you can. If not, then factor analyse, and cross the factor
scores, trimming if necessary to obtain pure measures that are
conjugated/crossed. The factor analysis will not necessarily create causal
variables, but it will tend to collapse variables into common variances that
can be identified in the usual manner. It is a good policy for the scientist
who uses such factors always to ask whether orthogonal factorial designs could
conceivably exist between variables loading on the same factor, even if such
combinations are unlikely. The logical possibility would suggest the variables
are different from one another and could be measured without being confounded,
perhaps with considerable effort. But then, the rewards would be significant.
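
As a sketch of that route (again illustrative only; the six made-up measures,
two factors, and +/- 1 cutoffs are assumptions of the example, not of the
method):

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import FactorAnalysis

    # Six correlated "cause" measures built from two latent variables plus noise,
    # purely to have something to factor.
    rng = np.random.default_rng(3)
    latent = rng.normal(size=(500, 2))
    noise = 0.5 * rng.normal(size=(500, 6))
    X = np.hstack([latent[:, [0]] + noise[:, :3], latent[:, [1]] + noise[:, 3:]])

    # Collapse the correlated measures into (near-)orthogonal factor scores.
    fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
    scores = fa.transform(X)
    scores = (scores - scores.mean(axis=0)) / scores.std(axis=0)   # standardise

    # Cross the factor scores as if they were the causes in a factorial layout,
    # trimming afterwards if the corner cells are too thin.
    labels = ["low", "mid", "high"]
    f1 = pd.Series(pd.cut(scores[:, 0], [-np.inf, -1, 1, np.inf], labels=labels), name="factor1")
    f2 = pd.Series(pd.cut(scores[:, 1], [-np.inf, -1, 1, np.inf], labels=labels), name="factor2")
    print(pd.crosstab(f1, f2, dropna=False))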




>And how do you know that
> there is more than one cause?

If the correlation is less than .95, then there is at least error variance
included with the measure. Perfect correlation is beyond the scope of CR,
because CR works by looking at the changes in the relationships of the causes
across the levels of their mutual effect. If two variables are perfectly
correlated, then numerically they are confounded. No statistic will unconfound
them. Most of the phenomena of interest to correlational researchers, however,
are not perfectly correlated with anything. So the dialectical (two or more
causes) approach is promising. We are looking at the causes of measured
effects. Some of the variance in the measures will likely be measurement error.
This measurement error counts as a causal variable! Thus, we can use regression
to generate residuals and, by trimming, produce causal cross-tabulations that
are factorial. By trimming I mean cutting off the tails of the normal
distributions in order to beef up the number of observations that will be
crossed in the new extremes.
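
Here is a rough sketch of that residual construction (the simulated x and y and
the +/- 1 cutoffs are my own assumptions; the point is only the mechanics):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(4)
    n = 1000

    # Only two variables are observed: a putative cause x and a putative effect y.
    x = rng.normal(size=n)
    y = x + rng.normal(scale=0.8, size=n)   # the unobserved error plays the second cause

    # Regress the putative cause on the effect; the residual stands in for the
    # latent second cause (measurement error and whatever else is unmeasured).
    slope, intercept = np.polyfit(y, x, deg=1)
    fitted = intercept + slope * y
    resid = x - fitted

    # Standardise, cross the putative cause with the residual, trim if needed,
    # and look at the mean of y per cell of the resulting factorial table.
    xz = (x - x.mean()) / x.std()
    rz = (resid - resid.mean()) / resid.std()
    labels = ["low", "mid", "high"]
    gx = pd.Series(pd.cut(xz, [-np.inf, -1, 1, np.inf], labels=labels), name="cause")
    gr = pd.Series(pd.cut(rz, [-np.inf, -1, 1, np.inf], labels=labels), name="residual")
    print(pd.crosstab(gx, gr, dropna=False))
    print(pd.crosstab(gx, gr, values=y, aggfunc="mean", dropna=False))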




> Finally, if you did somehow manage to
> arrange all that, why would you need CR to determine which are the
> causes and which is the effect?

Because many things cannot be experimentally manipulated but can be
systematically sampled, including the astronomical and geological events
Poincare mentions, as well as psychological, sociological, and economic
variables beyond our control.



> It would seem that you just need to
> compute correlations.

No. Traditional correlations do not reveal causation. Nor do SEM methods.


> The uncorrelated variables would be causes, and
> the one that correlates with other variables would be the effect - if
> the assumptions are true.

No. You could generate residuals that are uncorrelated but not causal. Remember
to generate the residuals by predicting the cause from the effect. The residual
and the fitted value should then add back up to the putative cause, and the
fitted value should correlate perfectly with the putative effect. So be sure
you calculate the residuals properly.
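
A quick numerical check of that bookkeeping (simulated data, just to show the
arithmetic):

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.normal(size=500)                      # putative cause
    y = x + rng.normal(scale=0.8, size=500)       # putative effect

    # Predict the cause from the effect.
    slope, intercept = np.polyfit(y, x, deg=1)
    fitted = intercept + slope * y
    resid = x - fitted

    # The fitted value plus the residual reconstructs the cause, and the fitted
    # value, being a linear function of y, correlates perfectly with the effect.
    assert np.allclose(fitted + resid, x)
    assert np.isclose(abs(np.corrcoef(fitted, y)[0, 1]), 1.0)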

In the applied context you may have measured only two variables, so there would
not be a second cause to compare with the one you measured. That is why you
must construct the latent estimate using residuals.

Bill


>
>    Radford Neal
>
> ----------------------------------------------------------------------------
> Radford M. Neal                                           [EMAIL PROTECTED]
> Dept. of Statistics and Dept. of Computer Science         [EMAIL PROTECTED]
> University of Toronto                      http://www.cs.utoronto.ca/~radford
> ----------------------------------------------------------------------------


