On 21 Apr 2000, Kermit Rose wrote:
> The following is my proposed methodology for analyzing survey data for a
> professor in criminology at Florida State University.
>
> I'd like someone experienced in categorical data analysis to review it and
> email me comments, or criticisms, or suggestions.
[ KR: Please post any response to the edstat list as well as to me.
I may not have the leisure to continue this conversation, and others
on the list may have better advice for you in any case. -- DFB. ]
To begin with, you have not told us what the professor in criminology is
trying to find out, and why; without that information, no-one can offer
you (or the professor) useful advice on data analysis.
Your proposed procedure seems unnecessarily cumbersome. So far as I can
tell from all that algebra, you're effectively substituting a whole
bunch of 2x2 tables for a single RxC table (R = number of rows, C =
number of columns) with R>2 and(/or?) C>2. Or, for each of several RxC
tables.
Why do you not first do the obvious contingency table chi-square
to see if there's anything worth following up? (And if I were doing it,
the follow-up(s) would be in the RxC format as well.)
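A minimal sketch of the "obvious" first step, in Python. The counts are made up purely for illustration (they are not your data), and the function name is my own. It computes the usual Pearson chi-square for a single RxC table, assuming every row and column total is positive:

```python
def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for an RxC table of counts.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: 3 levels of a predictor (rows) by 4 response choices (columns).
counts = [
    [20, 15, 10, 5],
    [10, 15, 15, 10],
    [5, 10, 15, 20],
]
chi2, df = pearson_chi_square(counts)
print(f"chi-square = {chi2:.2f} on {df} df")
```

One statistic on (R-1)(C-1) degrees of freedom then tells you whether there is anything to follow up, before any 2x2 tables are carved out of the RxC table.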
Your #1 null hypothesis is that two dichotomized variables are
independent. I don't believe the test statistic you propose -- more
about that later -- but this is the formal null hyp. for the usual
contingency table chi square.
> The dependent variable is a multivalued categorical variable.
So far, so good.
> The model dependent variable is an interaction variable. It is the
> interaction of some subset of the 10 two-valued dummy variables
> representing the dependent variable. The 10 values of the dependent
> variable are choices of preferential treatment for affirmative action.
Here it begins to get sticky. I cannot tell whether you mean the
same thing by "interaction" that I would mean. In particular,
there seems to be no difference between "interaction variable",
in your terms, and "indicator variable", in my terms.
> The model independent variable is also an interaction variable. It is
> the interaction of two-valued dummy variables representing some subset
> of the predicting variables.
< snip, details of standard construction of indicator variables >
> Suppose R is an ordinal level variable with values r1 < r2 < r3 < r4.
> Then R is converted to three variables S1,S2,S3 with
>
> S1 = 1 if R = r1 and S1 = 0 otherwise.
> S2 = 1 if R = r1 or r2 and S2 = 0 otherwise.
> S3 = 1 if R = r1 or r2 or r3 and S3 = 0 otherwise.
Possible, but doesn't seem necessary. Why not leave R as it is instead
of constructing dichotomies?
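To illustrate why the dichotomies add nothing, here is a small Python sketch. It assumes r1..r4 are coded as ranks 1..4, and the function names are mine. The three variables S1, S2, S3 are simply a recoding of R, and R can be recovered from them exactly:

```python
def cumulative_dummies(rank, levels=4):
    """S_k = 1 if R <= r_k, for k = 1 .. levels-1 (the coding quoted above)."""
    return tuple(1 if rank <= k else 0 for k in range(1, levels))


def rank_from_dummies(dummies):
    """Recover the rank of R from (S1, S2, S3)."""
    return len(dummies) + 1 - sum(dummies)


for rank in (1, 2, 3, 4):
    s = cumulative_dummies(rank)
    print(rank, s, rank_from_dummies(s))
```

Since the mapping is one-to-one, whatever information the S's carry is already in R itself.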
< snip, details of argumentation >
> We define parameters t,I,D and N as follows.
>
> t is the number of cases where both the model independent and model dependent
> variable are true.
>
> I is the number of cases where the model independent variable is true.
>
> D is the number of cases where the model Dependent variable is true.
>
> N is the total number of cases.
Somewhat more briefly, and a good deal more clearly for me, you are
defining a 2x2 table thus:
   Independent  |   Dependent variable   |
    variable    |      1          0      |  TOTAL
 ---------------+------------------------+--------
        1       |      t          .      |    I
        0       |      .          .      |    .
 ---------------+------------------------+--------
      TOTAL     |      D          .      |    N
with the values marked "." determined by subtraction.
(You defined a "Col pct" as t/D, which is not a % but a proportion.)
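The subtraction that fills in the cells marked "." can be sketched in Python as follows. The function name and the example counts are hypothetical; only t, I, D and N come from your definitions:

```python
def fill_two_by_two(t, I, D, N):
    """Return [[n11, n10], [n01, n00]].

    Rows are the independent variable (1, 0); columns are the dependent
    variable (1, 0). I is the first row total, D the first column total.
    """
    table = [
        [t, I - t],
        [D - t, N - I - D + t],
    ]
    if any(cell < 0 for row in table for cell in row):
        raise ValueError("t, I, D, N do not describe a valid 2x2 table")
    return table


# Made-up example: t = 30, I = 50, D = 60, N = 100.
print(fill_two_by_two(30, 50, 60, 100))  # [[30, 20], [30, 20]]
```

The check for negative cells is only a sanity test: any four numbers that fail it cannot have come from a real cross-tabulation.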
< snip, tedious definition of above table >
< snip, definitions of covariance, variances, r^2;
which I did not really follow >
> Chisquare of crosstab of model independent variable with model
> dependent variable is
>
> (t - D*I/N)*( N/[D*I] + N/[I*(N-D)] + N/[D*(N-I)] + N/[(N-I)*(N-D)] )
No, I don't think so. Your formula is of the form A*B. B is
nonnegative. A may be positive or negative, so the product is positive
or negative depending on A. Chisquare cannot be negative.
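A quick numerical check, as a Python sketch with made-up counts (the function names are mine). It evaluates the product exactly as written, and compares it with the standard 2x2 chi-square, N(tN - DI)^2 / [I(N-I)D(N-D)]. For what it is worth, squaring the first factor of the quoted formula reproduces the standard value, since each of the four cells deviates from its expected count by the same amount, (t - D*I/N), up to sign:

```python
def product_as_written(t, I, D, N):
    """The quoted formula, A*B, evaluated literally."""
    a = t - D * I / N
    b = (N / (D * I) + N / (I * (N - D))
         + N / (D * (N - I)) + N / ((N - I) * (N - D)))
    return a * b


def squared_first_factor(t, I, D, N):
    """The same formula with the first factor squared, A*A*B."""
    a = t - D * I / N
    return a * product_as_written(t, I, D, N)


def chi_square_2x2(t, I, D, N):
    """Standard Pearson chi-square for a 2x2 table, from t and the margins."""
    return N * (t * N - D * I) ** 2 / (I * (N - I) * D * (N - D))


# Made-up counts in which t falls below its expected value D*I/N = 30.
t, I, D, N = 20, 50, 60, 100
print(product_as_written(t, I, D, N))    # negative
print(squared_first_factor(t, I, D, N))  # agrees with the next line
print(chi_square_2x2(t, I, D, N))
```

Whether a missing square is what happened in your derivation I cannot tell; the sketch shows only that the formula as posted can go negative and the usual statistic cannot.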
> The significance number that is calculated for a statistic is the
> predicted probability that the null hypothesis is true.
I very much doubt it. This certainly does not conform to the standard
statistical definition of "level of significance", which I assume to be
what you want us to understand by "significance number".
> There are two different null hypotheses of interest.
>
> null_1:
> The dependent variable does not depend on independent variable.
Discussed above.
> The significance of null_1 is (D - t)/N
Can't imagine why it would be. In any case, one doesn't speak of
"the significance of an hypothesis" (null or otherwise), one speaks of
the significance of a test, or of a test statistic; and one is thereby
referring to a sampling distribution of the test statistic in question.
You seem to have no sampling distribution in mind.
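To make the contrast concrete, here is a Python sketch with made-up counts (function names mine). It computes a p-value from the sampling distribution of the 2x2 chi-square statistic, using the large-sample chi-square distribution on 1 d.f., and prints it beside the proposed quantity (D - t)/N. For 1 d.f. the upper tail area is erfc(sqrt(x/2)), because the statistic is then the square of a standard normal deviate:

```python
import math


def chi_square_2x2(t, I, D, N):
    """Standard Pearson chi-square for a 2x2 table, from t and the margins."""
    return N * (t * N - D * I) ** 2 / (I * (N - I) * D * (N - D))


def p_value_1df(chi2):
    """Upper-tail area of the chi-square distribution with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Two made-up tables with I = 50, D = 60, N = 100:
# t = 30 is exactly what independence predicts; t = 20 is well away from it.
for t in (30, 20):
    I, D, N = 50, 60, 100
    p = p_value_1df(chi_square_2x2(t, I, D, N))
    print(f"t = {t}: p-value = {p:.5f}, (D - t)/N = {(D - t) / N:.2f}")
```

In the first table the data agree perfectly with independence, so the p-value is 1, yet (D - t)/N is 0.30. In the second the p-value is very small, yet (D - t)/N is 0.40. The proposed quantity is just the proportion of cases in one cell of the table; it does not refer to any sampling distribution.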
> null_2:
>
> There is not a bidirectional relationship between the independent
> variable and the dependent variable.
What do you mean by a "bidirectional relationship"? Whatever you mean,
it cannot be different, so far as I can see, from null_1, for a 2x2
table. You've only got one degree of freedom for detecting whether there
is ANY relationship of any kind; you have zero d.f. for detecting
whether a relationship, once found, is of one kind or another.
> The significance of null_2 is (I+D-2*t)/N
See comments above on "significance of null_1".
-- DFB.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128