Erica, you really need to provide more detail, possibly up to and including an example of a table for which you're trying to generate an index (coefficient) of association. From your rather telegraphic communications so far, I gather that you have a number of types of animal (categorical variable Y), which may be found in any of a number of types of algae (categorical variable Z), and that you seek a measure of association between Y and Z. The data you wish to analyze are not frequencies (for which a standard contingency-table analysis with a chi-square test statistic would be appropriate), but rather densities. In your one example, the units of density are <number of animals observed> per <square centimeter>, which suggests that your universe of discourse lies on a surface, not within a three-dimensional space.
For openers, why do you want a coefficient of association? What do you plan to do with it when (if!) you get one? What do you expect it to tell you that you can usefully use in reporting results?

Supposing that seeking such a coefficient, or at least some statistical analysis of data in a two-way table, is a reasonable enterprise:

1. How many categories of Y, and of Z, are you working with?

2. Are there any category-combinations (Y_j with Z_k) that are for some reason impossible, so that one or more cells of the Y-by-Z table are empty? If so, this will induce complications (but possibly not insurmountable ones) in any analysis.

3. In the datum you reported, the area observed was 100 square cm, and you state that the area differs between different algae species (that is, between columns of your table). What is the range of areas?

Possible approaches:

(a) If the areas are not VERY different from cell to cell, you could standardize all densities to a common area like 100 sq cm. Whatever value you pick should be something like a median area, possibly biased downward (for a conservative test). Then you would have frequencies in each cell (6, in the cell you described) and could apply the standard contingency-table analysis.

(b) Using densities, you could carry out a two-way ANOVA. Within-cell variances are not easy to conceptualize in your context (which is to say, I haven't figured out how to calculate them), so that the obvious ANOVA would yield only main effects. You could then perform a median polish (Tukey, 1977) on the table and display the residuals, to see if there is any evidence of interaction between Y and Z (I should think you'd want to pursue that question as inherent in your problem, but I may be wrong).

(c) Perhaps it is possible to convert your numbers of animals to proportions: if 6 out of 40 animals were type Y, then p (proportion) = 0.15. For proportions, within-cell variances are easily calculated (via p(1-p)/n), so you can get a proper F test for interaction in the ANOVA.
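A minimal sketch of approach (a), assuming a small made-up table: the densities, the 3-by-3 layout, the common area of 100 sq cm, and the rounding to whole animals are all illustrative assumptions, not Erica's data. It converts densities to counts at the common area and computes the usual Pearson chi-square statistic by hand.

```python
# Hypothetical densities (animals per sq cm); rows are animal types Y1..Y3,
# columns are algae species Z1..Z3. Every number here is invented.
densities = [
    [0.06, 0.12, 0.20],
    [0.15, 0.10, 0.08],
    [0.09, 0.18, 0.12],
]
COMMON_AREA = 100.0  # assumed common area in sq cm


def standardized_counts(dens, area):
    """Scale each density to the common area and round to whole animals."""
    return [[round(d * area) for d in row] for row in dens]


def chi_square(table):
    """Return (statistic, degrees of freedom, expected counts) for a two-way table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, expected


counts = standardized_counts(densities, COMMON_AREA)
stat, df, expected = chi_square(counts)
print("counts:", counts)
print("chi-square = %.3f on %d df" % (stat, df))
```

The statistic would then be referred to a chi-square distribution with the printed degrees of freedom. The expected counts are returned so that the "no more than 20% below five" rule of thumb can be checked on the standardized table rather than on the raw densities.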
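The median polish mentioned in approach (b) can be sketched as follows. This is one common way of writing the row-and-column sweeps, not necessarily the exact variant in Tukey (1977); the table of densities and the fixed number of iterations are assumptions made for illustration.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Decompose table[i][j] into overall + row effect + column effect + residual."""
    rows, cols = len(table), len(table[0])
    resid = [list(map(float, row)) for row in table]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(n_iter):
        # Row sweep: move each row's median out of the residuals into the row effect.
        for i in range(rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [x - m for x in resid[i]]
        # Re-centre the column effects, pushing their median into the overall term.
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Column sweep: same idea, column by column.
        for j in range(cols):
            m = median(resid[i][j] for i in range(rows))
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        # Re-centre the row effects.
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid


# Hypothetical densities (animals per sq cm): rows Y1..Y3, columns Z1..Z3.
table = [
    [0.06, 0.12, 0.20],
    [0.15, 0.10, 0.08],
    [0.09, 0.18, 0.12],
]
overall, row_eff, col_eff, resid = median_polish(table)
for row in resid:
    print(["%+.3f" % x for x in row])
```

Large residuals left in particular cells, after the row and column effects are removed, are what one would inspect as informal evidence of interaction between Y and Z.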
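The within-cell variance in approach (c) is simple arithmetic; a small sketch using the 6-out-of-40 figure from the text (the function name is a hypothetical helper, and the binomial form p(1-p)/n assumes the animals are classified independently):

```python
from math import sqrt


def proportion_and_variance(count, total):
    """Return the proportion p = count/total and its binomial variance p(1-p)/n."""
    p = count / total
    return p, p * (1 - p) / total


p, var = proportion_and_variance(6, 40)
print("p = %.2f, variance = %.7f, standard error = %.4f" % (p, var, sqrt(var)))
```

These per-cell variances are what would supply the error term for the interaction F test.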
This does bring up the question of independence among the cells, and of what your experimental units really are. Maybe you look at a region (100 sq cm, e.g.) and count only animals of one type; but maybe you look at a region and classify all the animals you observe, so that you get (say) 6 of type Y1, 3 of type Y2, none of type Y3, and so on, all in the same region under observation. This would make a randomized-blocks approach to an ANOVA appropriate, as may well be the case anyway.

Haven't thought of any others.

You asked about "Jaccard's coefficient", but I am unfamiliar both with it and with your data, so have no useful comment to make.

On Sat, 14 Feb 2004, Erica So wrote:

> One of the requirement on conducting Chi-Square test of association is
> 'no more than 20% of the expected values should be less than five',
> however, all of my data are less than 1. Is there any other
> association coefficient allow expected values less than 1 in SPSS?

On Tue, 17 Feb 2004, Erica So elaborated:

> To explain why my data are less than one, let me tell u what I do in my
> experiment. My experiment is a kind a ecological survey, which involves
> counting the number of different animal species in different algae species.
> Since the area of algae species are varied, so I will calculate the density
> instead of number of individuals.
>
> eg 6 animal Y was found in 100cm2 algae Z
> So, the data (density) that involves in the Chisq = 6/100 = 0.06,
> but not 6 individuals. That's why my data are usually less than 1 due
> to dividing the area.
>
> Someone suggest me to use the Jaccard's coefficient, however, it seems
> that this coefficient is not that common. Can I use the Jaccard's
> coefficient for my case?

------------------------------------------------------------
Donald F. Burrill                         [EMAIL PROTECTED]
56 Sebbins Pond Drive, Bedford, NH 03110     (603) 626-0816
