Re: Association coefficient

Donald Burrill Mon, 16 Feb 2004 14:29:12 -0800

Erica, you really need to provide more detail, possibly up to and
including an example of a table for which you're trying to generate an
index (coefficient) of association.
  From your rather telegraphic communications so far, I gather that you
have a number of types of animal (categorical variable Y), which may be
found in any of a number of types of algae (categorical variable Z), and
that you seek a measure of association between Y and Z.
  The data you wish to analyze are not frequencies (for which a standard
contingency-table analysis with a chi-square test statistic would be
appropriate), but rather densities.  In your one example, the units of
density are <number of animals observed> per <square centimeter>, which
suggests that your universe of discourse lies on a surface, not within a
three-dimensional space.

  For openers, why do you want a coefficient of association?  What do you
plan to do with it when (if!) you get one?  What do you expect it to
tell you that you can put to use in reporting results?  Supposing
that seeking such a coefficient, or at least some statistical analysis
of data in a two-way table, is a reasonable enterprise:
 1.  How many categories of Y, and of Z, are you working with?
 2.  Are there any category-combinations (Y_j with Z_k) that are for
some reason impossible, so that one or more cells of the Y-by-Z table
are empty?  If so, this will induce complications (but possibly not
insurmountable ones) in any analysis.
 3.  In the datum you reported, the area observed was 100 square cm, and
you state that the area differs between different algae species (that
is, between columns of your table).  What is the range of areas?

Possible approaches:
 (a)  If the areas are not VERY different from cell to cell, you could
standardize all densities to a common area like 100 sq cm.  Whatever
value you pick should be something like a median area, possibly biased
downward (for a conservative test).  Then you would have frequencies in
each cell (6, in the cell you described) and could apply the standard
contingency-table analysis.
 (b)  Using densities, you could carry out a two-way ANOVA.
Within-cell variances are not easy to conceptualize in your context
(which is to say, I haven't figured out how to calculate them), so that
the obvious ANOVA would yield only main effects.  You could then perform
a median polish (Tukey, 1977) and display it, to see if there is any
evidence of interaction between Y and Z (I should think you'd want to
pursue that question as inherent in your problem, but I may be wrong).
 (c)  Perhaps it is possible to convert your numbers of animals to
proportions:  6 out of 40 animals were type Y, p (proportion) = 0.15.
For proportions, within-cell variances are easily calculated (via
p(1-p)/n ), so you can get a proper F test for interaction in the ANOVA.
This does bring up the question of independence among the cells, and of
what your experimental units really are.  Maybe you look at a region
(100 sq cm, e.g.) and count only animals of one type;  but maybe you
look at a region and classify all the animals you observe, so that you
get (say) 6 of type Y1, 3 of type Y2, none of type Y3, and so on, all in
the same region under observation.  This would make a randomized-blocks
approach to an ANOVA appropriate, as may well be the case anyway.
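
  To make the arithmetic in (a)-(c) concrete, here is a minimal sketch in
plain Python.  It is an illustration only, not a worked analysis of your
data:  the function names and the default common area of 100 sq cm are my
own assumptions, and the median polish is the usual
alternate-rows-and-columns sweep, written out by hand rather than taken
from any package.

```python
from statistics import median


def standardize_count(density, common_area=100.0):
    """Approach (a): rescale a density (animals per sq cm) to a count
    on a common area.  The 100 sq cm default is an assumed value."""
    return density * common_area


def proportion_variance(p, n):
    """Approach (c): sampling variance of a proportion, p(1-p)/n."""
    return p * (1.0 - p) / n


def median_polish(table, n_iter=10):
    """Approach (b): median polish of a two-way table (list of rows).

    Returns (overall, row_effects, col_effects, residuals) such that
    table[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j]
    up to rounding error.
    """
    resid = [[float(x) for x in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(n_iter):
        # Sweep each row's median out of the residuals into the row effect.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [x - m for x in resid[i]]
        # Re-centre the column effects, moving their median to the overall term.
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]

        # Sweep each column's median out of the residuals into the column effect.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        # Re-centre the row effects, moving their median to the overall term.
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]

    return overall, row_eff, col_eff, resid
```

  With the one datum reported, standardize_count(0.06) recovers the count
of 6 on 100 sq cm, and proportion_variance(0.15, 40) comes to about 0.0032.
For the median polish, it is the residuals one would display:  if they are
all small relative to the row and column effects, the table is roughly
additive, and any large ones point to the cells where Y and Z may interact.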

 Haven't thought of any others.  You asked about "Jaccard's
coefficient", but I am unfamiliar both with it and with your data, so
have no useful comment to make.

On Sat, 14 Feb 2004, Erica So wrote:

> One of the requirement on conducting Chi-Square test of association is
> 'no more than 20% of the expected values should be less than five',
> however, all of my data are less than 1. Is there any other
> association coefficient allow expected values less than 1 in SPSS?

On Tue, 17 Feb 2004, Erica So elaborated:

> To explain why my data are less than one, let me tell u what I do in my
> experiment. My experiment is a kind a ecological survey, which involves
> counting the number of different animal species in different algae species.
> Since the area of algae species are varied, so I will calculate the density
> instead of number of individuals.
>
> eg 6 animal Y was found in 100cm2 algae Z
>      So, the data (density) that involves in the Chisq = 6/100 = 0.06,
> but not 6 individuals. That's why my data are usually less than 1 due
> to dividing the area.
>
> Someone suggest me to use the Jaccard's coefficient, however, it seems
> that this coefficient is not that common. Can I use the Jaccard's
> coefficient for my case?

 ------------------------------------------------------------
 Donald F. Burrill                              [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110      (603) 626-0816