Re: categorical data analysis

Donald F. Burrill Sat, 22 Apr 2000 10:36:24 -0700
On 21 Apr 2000, Kermit Rose wrote:

> The following is my proposed methodology for analyzing survey data for a
> professor in criminology at Florida State University.
> 
> I'd like someone experienced in categorical data analysis to review it and
> email me comments, or criticisms, or suggestions.

[ KR:  Please post any response to the edstat list as well as to me.
  I may not have the leisure to continue this conversation, and others 
  on the list may have better advice for you in any case. -- DFB. ]

To begin with, you have not told us what the professor in criminology is 
trying to find out, and why;  without that information, no-one can offer 
you (or the professor) useful advice on data analysis.

Your proposed procedure seems unnecessarily cumbersome.  So far as I can 
tell from all that algebra, you're effectively substituting a whole 
bunch of 2x2 tables for a single RxC table (R = number of rows, C = 
number of columns) with R>2 and(/or?) C>2.  Or, for each of several RxC 
tables.
        Why do you not first do the obvious contingency table chi-square 
to see if there's anything worth following up?  (And if I were doing it, 
the follow-up(s) would be in the RxC format as well.)
        Your #1 null hypothesis is that two dichotomized variables are 
independent.  I don't believe the test statistic you propose -- more 
about that later -- but this is the formal null hyp. for the usual 
contingency table chi square.
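
As a rough illustration of the "obvious" first step suggested above, here is 
a minimal sketch of the usual Pearson chi-square for a single RxC table.  The 
function name and the counts are invented for the example;  they are not the 
survey data, which were not posted.

```python
# Pearson chi-square test of independence for an R x C table of counts.
# The counts below are hypothetical; they are NOT the survey data.

def chi_square_independence(table):
    """Return (chi_square, degrees_of_freedom) for a list-of-rows table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            chi_sq += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df

# Hypothetical 3 x 4 table: 3 respondent groups by 4 response categories.
counts = [
    [20, 15, 10, 5],
    [10, 15, 15, 10],
    [5, 10, 15, 20],
]
chi_sq, df = chi_square_independence(counts)
print(f"chi-square = {chi_sq:.2f} on {df} df")
```

The statistic is a sum of squared terms, so it is never negative, and the 
whole table is tested at once on (R-1)(C-1) degrees of freedom.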

> The dependent variable is a multivalued categorical variable.
        So far, so good.

> The model dependent variable is an interaction variable.  It is the 
> interaction of some subset of the 10 two-valued dummy variables 
> representing the dependent variable.  The 10 values of the dependent 
> variable are choices of preferential treatment for affirmative action. 
        Here it begins to get sticky.  I cannot tell whether you mean the 
        same thing by "interaction" that I would mean.  In particular, 
        there seems to be no difference between "interaction variable", 
        in your terms, and "indicator variable", in my terms.

> The model independent variable is also an interaction variable.  It is
> the interaction of two-valued dummy variables representing some subset
> of the predicting variables.

  <  snip, details of standard construction of indicator variables > 

> Suppose R is an ordinal level variable with values r1 < r2 < r3 < r4.
> Then R is converted to three variables S1,S2,S3 with
> 
> S1 = 1 if R = r1 and S1 = 0 otherwise.
> S2 = 1 if R = r1 or r2 and S2 = 0 otherwise.
> S3 = 1 if R = r1 or r2 or r3 and S3 = 0 otherwise.

Possible, but doesn't seem necessary.  Why not leave R as it is instead 
of constructing dichotomies?  
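
One way to see why the dichotomies may be unnecessary:  the sketch below 
builds S1, S2, S3 exactly as quoted and then recovers R from them, which 
suggests the recoding carries no information that R does not already have.  
Coding r1..r4 as the integers 1..4 is an assumption made for the example, 
as are the function names.

```python
# Cumulative dichotomies S1..S3 for an ordinal R with levels r1 < r2 < r3 < r4.
# Assumption for this sketch: the levels are coded as the integers 1..4.

def cumulative_dummies(r, levels=4):
    """S_k = 1 if R is one of r1..r_k, else 0, for k = 1 .. levels-1."""
    return [1 if r <= k else 0 for k in range(1, levels)]

def recover_r(dummies):
    """Invert the coding: R = (number of levels) - (number of 1s)."""
    return len(dummies) + 1 - sum(dummies)

for r in range(1, 5):
    s = cumulative_dummies(r)
    print(r, s, recover_r(s))
```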

        <  snip, details of argumentation  >

> We define parameters t,I,D and N as follows.
> 
> t is the number of cases where both the model independent and model dependent
> variable are true.
> 
> I is the number of cases where the model independent variable is true.
> 
> D is the number of cases where the model Dependent variable is true.
> 
> N is the total number of cases.

Somewhat more briefly, and a good deal more clearly for me, you are 
defining a 2x2 table thus:

        Independent     |  Dependent variable   |
        variable        |       1       0       |  TOTAL
        ----------------+-----------------------+--------
                1       |       t       .       |   I
                0       |       .       .       |   .
        ----------------+-----------------------+--------
             TOTAL      |       D       .       |   N
 
with the values marked "." determined by subtraction.
 (You defined a "Col pct" as t/D, which is not a % but a proportion.)
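
The subtraction that fills in the "." cells can be sketched as follows.  The 
function name and the numbers are hypothetical, chosen only to make the 
arithmetic visible.

```python
# Fill in the 2x2 table from t, I, D, N by subtraction.
# Rows: independent variable = 1, 0.  Columns: dependent variable = 1, 0.

def two_by_two(t, I, D, N):
    return [
        [t,     I - t],
        [D - t, N - I - D + t],
    ]

# Hypothetical counts, not taken from the survey.
t, I, D, N = 30, 50, 80, 200
table = two_by_two(t, I, D, N)
print(table)

col_proportion = t / D            # a proportion, between 0 and 1
col_percent = 100 * col_proportion  # the same quantity expressed as a %
print(col_proportion, col_percent)
```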

        <  snip, tedious definition of above table  >

        <  snip, definitions of covariance, variances, r^2;
                which I did not really follow   >

>  Chisquare of crosstab of model independent variable with model 
> dependent variable is
> 
>  (t - D*I/N)*( N/[D*I] + N/[I*(N-D)] + N/[D*(N-I)] + N/[(N-I)*(N-D)] )

No, I don't think so.  Your formula is of the form A*B.  B is 
nonnegative.  A may be positive or negative, so the product is positive 
or negative depending on A.  Chisquare cannot be negative.
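
A quick numeric check of this objection, as a sketch with invented counts.  
The first function transcribes the quoted formula;  the second is the 
standard shortcut form of Pearson's chi-square for a 2x2 table, written in 
terms of t, I, D and N.  The function names are mine.  Incidentally, the 
second factor in the quoted formula is the sum of 1/(expected count) over 
the four cells, so squaring the first factor appears to give the usual 
statistic.

```python
# Compare the quoted formula with the standard 2x2 Pearson chi-square.

def proposed_statistic(t, I, D, N):
    """The formula as quoted: A * B, with A = t - D*I/N."""
    a = t - D * I / N
    b = (N / (D * I) + N / (I * (N - D))
         + N / (D * (N - I)) + N / ((N - I) * (N - D)))
    return a * b

def pearson_2x2(t, I, D, N):
    """Standard shortcut: N * (t*N - I*D)^2 / (I * D * (N-I) * (N-D))."""
    return N * (t * N - I * D) ** 2 / (I * D * (N - I) * (N - D))

# Hypothetical counts in which t falls below its expected value D*I/N = 20.
t, I, D, N = 10, 50, 80, 200
a = t - D * I / N
print(proposed_statistic(t, I, D, N))      # negative, since A < 0
print(pearson_2x2(t, I, D, N))             # nonnegative
print(a * proposed_statistic(t, I, D, N))  # A^2 * B, for comparison
```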

> The significance number that is calculated for a statistic is the
> predicted probability that the null hypothesis is true.

I very much doubt it.  This certainly does not conform to the standard 
statistical definition of "level of significance", which I assume to be 
what you want us to understand by "significance number".

> There are two different null hypotheses of interest.
> 
> null_1:
> The dependent variable does not depend on independent variable.  

        Discussed above.

> The significance of null_1 is   (D - t)/N

        Can't imagine why it would be.  In any case, one doesn't speak of 
"the significance of an hypothesis" (null or otherwise), one speaks of 
the significance of a test, or of a test statistic;  and one is thereby 
referring to a sampling distribution of the test statistic in question. 
You seem to have no sampling distribution in mind.
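
To make the role of the sampling distribution concrete:  for a 2x2 table the 
usual large-sample reference distribution for Pearson's chi-square is 
chi-square with 1 d.f., and the significance probability is the upper-tail 
area beyond the observed statistic, computed on the assumption that the null 
hypothesis is true.  A minimal sketch (the function name is mine;  the 
identity used holds for 1 d.f. only):

```python
import math

def chi_square_p_value_1df(x):
    """Upper-tail probability P(X >= x) for X ~ chi-square with 1 d.f.

    With 1 d.f., X is the square of a standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))

# 3.841 is the familiar 5% critical value for 1 d.f.
print(chi_square_p_value_1df(3.841))
```

This is a probability about the test statistic given the null hypothesis, 
not the probability that the null hypothesis is true.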

> null_2:
> 
> There is not a bidirectional relationship between the independent 
> variable and the dependent variable.

What do you mean by a "bidirectional relationship"?  Whatever you mean, 
it cannot be different, so far as I can see, from null_1, for a 2x2 
table.  You've only got one degree of freedom for detecting whether there 
is ANY relationship of any kind;  you have zero d.f. for detecting 
whether a relationship, once found, is of one kind or another.

> The significance of null_2 is (I+D-2*t)/N

        See comments above on "significance of null_1".
                                                        -- DFB.
 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                                 603-535-2597
 184 Nashua Road, Bedford, NH 03110                          603-471-7128  


