On Wed, 05 Jul 2000 16:30:44 +0200, Martin Brunner
<[EMAIL PROTECTED]> wrote:

> I reviewed articles using structural equation modeling. To estimate the
> rater bias each article was read by another rater. The items were
> closely linked to questions used in the relevant literature.
> 
> My original plan was to conduct a g-study to estimate the
> inter-rater-reliability.
> 
> My facets are:
> 
> - Rater: R with 2 levels
> - Items: I with 21 levels; the items are dichotomous.
> - Articles: A with 43 levels.
> 
> The design was fully crossed without missing data.
> I used the program GT by Pierre Ysewijn.
> 
> My problem is that the items cover very different topics and not a
> homogeneous dimension. Thus I cannot form scales, as my items are
> somewhat independent of each other.

I searched for g-study on Google and came up with just a couple of
relevant hits.  That term, g-study, is not as widely known as you
might expect.  Did you mean "generalizability"?

I think you have been overly impressed by some rhetoric.
If you don't have a dimension or two, why were you collecting data?
Do you really think you have 21 interesting and useful hypotheses?
The alternative is that you do, indeed, have "dimensions."

"Somewhat independent" is not bad -- what size are the correlations?
I don't trust correlations of .20 because I expect Response Bias to be
almost that big.  But with dichotomies, intercorrelations of .45 may
be pretty big ones, for subjective judgements.  (Especially, if you
want a short scale, perhaps you should re-design in order to use items
with 4 or 5 scale points.  Was there a reason to accept the
disadvantage of lower inherent item reliability by using dichotomies?)
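
To give a sense of scale, here is a small simulation in Python (my
own illustration, not your data: the latent r of .70 and the median
split are assumptions) showing how dichotomizing attenuates a
correlation:

    import numpy as np

    rng = np.random.default_rng(1)
    cov = [[1.0, 0.7], [0.7, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T
    x01, y01 = (x > 0).astype(int), (y > 0).astype(int)  # median split

    print(np.corrcoef(x, y)[0, 1])      # ~ .70 on the continuous scores
    print(np.corrcoef(x01, y01)[0, 1])  # ~ .49 after dichotomizing

For a median split of bivariate-normal scores the shrinkage is
phi = (2/pi)*arcsin(r), so a phi of .45 corresponds to a latent
correlation of roughly .65.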

Item by item, there are not a lot of comparisons available:  You can
look at McNemar's test for differences, and some measure of
correlation (phi is just a Pearson's r on the 0/1 codes; kappa is
popular).  With only 43 ratings there is not a lot of power, so you
should report, and be wary of, tendencies that fall short of
"significance."  On the other hand, since there are 21 comparisons,
you should not be overly impressed by single differences, either.
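
A minimal sketch of those per-item statistics, in Python (the array
names and the random placeholder data are assumptions; substitute
your two raters' actual 43 x 21 ratings):

    import numpy as np
    from scipy.stats import binomtest

    rng = np.random.default_rng(0)
    rater1 = rng.integers(0, 2, size=(43, 21))   # placeholder 0/1 ratings
    rater2 = rng.integers(0, 2, size=(43, 21))

    for item in range(rater1.shape[1]):
        a, b = rater1[:, item], rater2[:, item]
        n10 = int(np.sum((a == 1) & (b == 0)))   # rater 1 yes, rater 2 no
        n01 = int(np.sum((a == 0) & (b == 1)))   # rater 1 no, rater 2 yes
        # exact McNemar: binomial test on the discordant pairs
        p = binomtest(n10, n10 + n01, 0.5).pvalue if n10 + n01 else 1.0
        phi = np.corrcoef(a, b)[0, 1]            # phi = Pearson r on 0/1 codes
        po = np.mean(a == b)                     # observed agreement
        pe = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())  # chance
        kappa = (po - pe) / (1 - pe)
        print(f"item {item + 1:2d}: McNemar p = {p:.3f}, "
              f"phi = {phi:.2f}, kappa = {kappa:.2f}")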

For the totals that you should create, you can similarly look at the
paired t-test for differences, and at Pearson's r for the similarity.
For the item set as a whole, report the average inter-item
correlation, or (Cronbach's) alpha.
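
A companion sketch for the totals, again in Python with the same
hypothetical placeholder data (repeated here so the snippet stands
alone):

    import numpy as np
    from scipy.stats import pearsonr, ttest_rel

    rng = np.random.default_rng(0)
    rater1 = rng.integers(0, 2, size=(43, 21))   # placeholder 0/1 ratings
    rater2 = rng.integers(0, 2, size=(43, 21))

    tot1, tot2 = rater1.sum(axis=1), rater2.sum(axis=1)  # total per article

    t, p = ttest_rel(tot1, tot2)   # paired t-test for a rater difference
    r, _ = pearsonr(tot1, tot2)    # similarity of the two raters' totals

    def cronbach_alpha(items):
        # alpha = k/(k-1) * (1 - sum of item variances / variance of total)
        k = items.shape[1]
        return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                              / items.sum(axis=1).var(ddof=1))

    print(f"paired t = {t:.2f} (p = {p:.3f}), r(totals) = {r:.2f}")
    print(f"alpha, rater 1: {cronbach_alpha(rater1):.2f}")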

> I would like to generalize over raters and find an answer to the
> question:
> To what degree are the ratings on the items independent of the rater?
> Are the raters interchangeable?
> 
> Thus could anybody please tell me what the appropriate term for the
> "reliability" estimate is? Which terms have to be in the numerator
> and which in the denominator?

Hope this helps.
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

