First, I would like to know if my original post, "The Mob Howls Genius," is missing from the list. It no longer appears on my computer. Censorship? Did I say something that offensive?
Radford, you ask some good questions. I respond below in context:

> Presumably, you are intending it to be used to analyse observational
> data - for a well-designed experiment, what causes what isn't an
> issue. So how do you arrange for the distribution (which you don't
> control), to be uniform?

The uniformity of the data is only to promote the development of causal manifolds. A causal manifold is simply all the combinations of the causes, as illustrated by the means of the effects in the cells of a factorial design. CR works by contrasting the correlations between the causes in the extremes versus the midranges of y.

When the causes are adequately sampled, all the relevant levels of the causes will be combined. If they are not, the corners of the cross-tabulation will be empty, since there will be few cases in which extremes of the causes are combined. Generate two normally distributed random variables, create their sum, and cross-tabulate the sum by the two normal variables: you will find empty corners in the cross-tabulation (the first sketch below works this exercise). If the pairing of extreme levels of the causes is excluded by the sampling strategy, then comparisons across the extremes and midranges of y will be invalid.

So keep in mind, the point is to get those representative cases where extremes of the causes are combined. If such combinations occur, then CR can be done on the means of the ys in the cross-tabulation, whether or not the values are strictly uniform. Means based on unequal cell sizes are riskier, but, as in ANOVA designs, they are used, so long as at least some combinations in the corner cells exist and estimates of the y means per cell are possible.

With large surveys it is often possible to develop subsamples that fill out cross-tabulations. If not, then the extremes of the causes need to be trimmed until we get a cross-tabulation whose corners contain representative combinations of the causes. To trim, just remove the data above and below some designated cause z-score, say z = +/- 1.7, for example (the second sketch below illustrates this step). Surveys in general could be designed to collect such data, though the scientist might need to seek out the individuals who can fill out the extremes. We did this on a regular basis in our laboratory personality research when I was in graduate school at the University of Florida. Such sampling was extremely common, and the journals did not seem to mind. One simply has to be more explicit and purposeful in collecting samples. In secondary analyses we also used large samples, as with the High School and Beyond data, and had little difficulty filling out the corners of factorial samples.

> Similarly, how do you arrange for the causes
> to be uncorrelated in this distribution?

This point really strikes at a deeper issue. If the measures are so designed that they are necessarily correlated, then they are bad measures. I repeat an example I described earlier, of a survey we designed to measure the abuse that occurs in cults. We factor analyzed a large survey of former cult members' descriptions of their groups. Four meaningful and reliable factors showed up. A number of items loaded on more than one factor, and I removed all of them in order to unconfound the items. We refactored and deleted items until we had 28 items that loaded on only one factor, seven per factor. These items had "simple structure," and they combined to create uncorrelated measures (the third sketch below outlines this pruning loop). It is not that these subtypes of abuse do not tend to correlate across large samples containing cults and noncults. But within the cults, the subtypes can occur in any combination and on their own. These subscales were thus designed to be orthogonal yet combinable. The overall scale worked very well in differentiating cults from noncultic groups, and we could define profiles within the cultic ranges.
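Here is a minimal sketch of the two-normals exercise, assuming Python with numpy and pandas; the sample size, bin edges, and band cutoffs are my illustrative choices:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 5000
    x1 = rng.standard_normal(n)  # first cause
    x2 = rng.standard_normal(n)  # second cause
    y = x1 + x2                  # their mutual effect

    # Cross-tabulate the two causes. The corner cells, where both
    # causes are extreme, come out (nearly) empty, because such
    # pairings are rare unless the sampling deliberately seeks them.
    edges = [-np.inf, -2, -1, 0, 1, 2, np.inf]
    print(pd.crosstab(pd.cut(x1, edges), pd.cut(x2, edges)))

    # The contrast CR inspects: the cause-cause correlation in the
    # midrange of y versus its extremes.
    mid = np.abs(y) < 0.5
    ext = np.abs(y) > 1.5
    print("midrange of y:", np.corrcoef(x1[mid], x2[mid])[0, 1])
    print("extremes of y:", np.corrcoef(x1[ext], x2[ext])[0, 1])

The two conditional correlations come out very different, even though the unconditional correlation of x1 and x2 is near zero.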
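Continuing that sketch, the trimming step; the +/- 1.7 cutoff is the example value mentioned above, and the re-binned edges are again arbitrary:

    # Trim each cause at |z| = 1.7, then re-bin on the trimmed range
    # so the new corner cells hold enough cases to estimate y means.
    keep = (np.abs(x1) < 1.7) & (np.abs(x2) < 1.7)
    x1t, x2t = x1[keep], x2[keep]
    edges_t = [-1.7, -1.0, -0.33, 0.33, 1.0, 1.7]
    print(pd.crosstab(pd.cut(x1t, edges_t), pd.cut(x2t, edges_t)))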
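And for the item-pruning loop, a rough sketch assuming Python with scikit-learn; varimax-rotated FactorAnalysis stands in for the exploratory factoring we used, and the .30 salience cutoff is a common convention, not necessarily the value from our study:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    def prune_to_simple_structure(items, n_factors=4, cutoff=0.30):
        # items: cases x items matrix. Repeatedly drop any item whose
        # loadings are salient on more than one factor, refactoring
        # after each pass, until simple structure is reached.
        keep = np.arange(items.shape[1])
        while keep.size >= n_factors:
            fa = FactorAnalysis(n_components=n_factors, rotation="varimax")
            fa.fit(items[:, keep])
            loadings = fa.components_.T                        # items x factors
            clean = (np.abs(loadings) >= cutoff).sum(axis=1) <= 1
            if clean.all():
                break
            keep = keep[clean]
        return keep  # column indices of the surviving items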
Similarly, if the causes are correlated in a sample, then find cases that fill out a factorial, if such cases exist. If not, then factor analyse, and cross the factor scores, trimming if necessary to obtain pure measures that are conjugated/crossed. The factor analysis will not necessarily create causal variables, but it will tend to collapse variables into common variances that can be identified in the usual manner. It is good policy for the scientist who uses such factors to always ask whether orthogonal factorial designs could conceivably exist between variables loading on the same factor, even if such combinations are unlikely. The logical possibility would suggest that the variables are different from one another and could be measured without being confounded, perhaps with considerable effort. But then the rewards would be significant.

> And how do you know that
> there is more than one cause?

If the correlation is less than .95, then there is at least error variance included with the measure. Perfect correlation is beyond the scope of CR, because CR works by looking at the changes in the relationships of the causes across the levels of their mutual effect. If two variables are perfectly correlated, then numerically they are confounded, and no statistic will unconfound them. Most of the phenomena of interest to correlational researchers, however, are not perfectly correlated with anything. So the dialectical (two-or-more-causes) approach is promising.

We are looking at the causes of measured effects. Some of the variance in the measures will likely be measurement error. This measurement error counts as a causal variable! Thus, we can use regression to generate residuals and, by trimming, produce causal cross-tabulations that are factorial. By trimming I mean cutting off the tails of the normal distributions in order to beef up the number of observations that will be crossed in the new extremes.

> Finally, if you did somehow manage to
> arrange all that, why would you need CR to determine which are the
> causes and which is the effect?

Because many things cannot be experimentally manipulated but can be systematically sampled, including the astronomical and geological events Poincare mentions, as well as psychological, sociological, and economic variables beyond our control.

> It would seem that you just need to
> compute correlations.

No. Traditional correlations do not reveal causation. Nor do SEM methods.

> The uncorrelated variables would be causes, and
> the one that correlates with other variables would be the effect - if
> the assumptions are true.

No. You could generate residuals that are uncorrelated but not causal. Remember to generate the residuals by predicting the cause from the effect. The sum of the putative cause and the residual should add up to a variable that correlates perfectly with the putative effect, so be sure you calculate the residuals properly. In the applied context you may have measured only two variables, so there would be no second cause to compare with the one you measured. That is why you must construct the latent estimate using residuals. A sketch of this construction follows.
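A minimal sketch of the residual construction and its check, numpy assumed; I take residual = predicted minus observed, so that cause + residual equals the fitted value (under the opposite sign convention, subtract instead of add):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5000
    x = rng.standard_normal(n)      # putative cause (measured)
    y = x + rng.standard_normal(n)  # putative effect (measured)

    # Predict the CAUSE from the EFFECT, then form the residual.
    b, a = np.polyfit(y, x, 1)      # slope, intercept
    resid = (a + b * y) - x         # predicted minus observed

    # Check: cause + residual is the fitted value, which correlates
    # perfectly with the effect, as required.
    print(np.corrcoef(x + resid, y)[0, 1])  # ~1.0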
Bill
