Hi Tibor,

Absolutely. The whole idea that acceptability is a special concept is not
based on any evidence. We should simply treat it like we treat all other
psychological measures. It's a rating, no more, no less. We can ask whether
and how much two types of structures seem to differ in acceptability, and
if we do so while randomly sampling the relevant parts of the underlying
population of linguistic outputs, and if we use statistical procedures that
adequately capture (and correct for) violations of independence in our
data, we can conclude that one structure / constraint / whatever is rated
to be "significantly" higher/lower in acceptability than another. We can
also quantify *relative* differences (e.g., we might compare how strongly
superiority violations differ from other types of extraction constraints, or
how different word orders compare in terms of acceptability).

Odds always confuse people, but the issue arises for any type of measure.
Reaction time differences might *seem* intuitively interpretable, but they
are arguably just as arbitrary a scale. The same goes for accuracy, etc.
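To make the scale concrete: under a (binary) logistic model, an odds ratio simply multiplies the baseline odds, and you can map it back onto probabilities. Here is a minimal Python sketch of that arithmetic, applied to the odds ratios from this thread (a deliberate simplification for illustration: it treats "high rating" as a binary outcome rather than modeling the full cumulative Likert structure that ordinal::clmm fits):

```python
# Illustration only: how an odds ratio shifts a baseline probability,
# treating "high rating" as a single binary outcome.

def shifted_probability(p_baseline, odds_ratio):
    """Probability after multiplying the baseline odds by odds_ratio."""
    odds = odds_ratio * p_baseline / (1 - p_baseline)
    return odds / (1 + odds)

# Starting from a 50% chance of a high rating:
for odds_ratio in (1, 2, 4, 1800):
    print(odds_ratio, round(shifted_probability(0.5, odds_ratio), 4))
```

Starting from a 50% baseline, an OR of 2 moves you to about 0.67, an OR of 4 to 0.80, and an OR of 1800 to about 0.9994; that is one way to see why (near-)perfect separation on control items produces such an extreme-looking number.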

Florian

On Tue, Mar 24, 2020 at 11:27 AM Tibor Kiss <ti...@linguistics.rub.de>
wrote:

> Dear Florian,
>
> thanks for your comment, of course 1.800 = 1800 and the real lower bound
> of the odds ratio is 0, but since any x < 1 can be transformed into some
> effect 1/x, I used 1.
>
> I already suspected separation and, following your advice, I have used the
> full control data, which yields an OR of 602, which of course is also very
> different from 2 or 4.
>
> The more general issue is the actual magnitude of an effect measured in
> terms of an odds ratio. An OR of 2 and 4, respectively, sounds great for
> two levels of adverbial interpretation if the third level produced an OR of
> 1, and all levels are significant. If we further assume that author A makes
> the claim that all three levels are just members of the same class, and
> hence should behave alike, then the model seems to refute that claim, but
> this of course depends on how we measure the effects. A possible answer
> might be that concepts like grammatical/ungrammatical are idealizations,
> and that we actually see effects, and that the theoretical linguist A did
> not consider enough data and variation to take into account that his class
> is actually not a class.
>
> Best
>
> Tibor
>
> Am 24.03.2020 um 15:22 schrieb Florian Jaeger <timegu...@gmail.com>:
>
> Dear Tibor,
>
> (I take it that by 1.800 you mean 1800, i.e., German-style formatting?)
>
> Since you excluded everyone who did not perfectly (or near-perfectly)
> separate the control items, the high odds ratio follows. Indeed, if perfect
> separation were required, you would get an infinite odds ratio for the
> control items. So if you want an (at least somewhat) informative upper
> limit, I'd get the control ratio from ALL participants.
>
> More generally, this approach (exclusion based on people using the scale
> the way you think it should be used) can be problematic, as it biases the
> sample that you analyze. From my own experiments (ooof 20 years ago!) I
> also remember that even for control items there can be lots of variation
> across people (you might only see that if you have enough data, as
> variability in odds close to 0 and infinity is harder to detect), and so
> norming based on control items should probably just be seen as a heuristic,
> rather than something absolute.
>
> Finally, 1 is not a real lower bound (ratios can fall below 1), but I
> think you meant it as something to help your readers understand the effect?
>
> Hth,
>
> Florian
>
> On Sat, Mar 21, 2020, 12:13 Tibor Kiss <ti...@linguistics.rub.de> wrote:
>
>> Dear list members,
>>
>> odds ratios are generally accepted as effect sizes for GLMs and GLMMs
>> (including cumulative link MMs). I am contemplating the concept of an upper
>> bound of an odds ratio to get a better feeling for how big an effect size
>> actually is.
>>
>> We are currently carrying out online experiments (jsPsych => JATOS =>
>> Prolific => R) on PP placement and PP interpretation in German clauses. In
>> one of the experiments, which I take as a basis here, we had three
>> different PP interpretations, and two different PP positions (in relation
>> to a fixed object NP), yielding a 3x2 design. The experiment was carried
>> out as a 5-point-Likert-scale experiment, and the analysis makes use of an
>> interaction between the two IVs, as well as the usual suspects
>> (participants, items) as random effects in ordinal::clmm.
>>
>> For the three different interpretations the model provides odds ratios of
>> 1, 2, and 4, depending on the placement. So there was no effect for the
>> first interpretation, the likelihood for the second interpretation to
>> receive a higher rating was doubled in one of the conditions, and the
>> likelihood of a higher rating was quadrupled for the third interpretation in the
>> same condition. (As for significance, four of the five parameters (second,
>> third interpretation, second placement, and the interaction between the
>> first and the third parameter) yielded ***, only the second interaction
>> came in with **.)
>>
>> While it is clear that an odds ratio of 1 means „no effect“, and is thus
>> a lower bound, the magnitude of the other effects is not entirely clear to
>> me. For this reason, I thought that a possible upper bound for an effect
>> size in the setting of an acceptability study could be provided by a model
>> which takes the control items as its input, where the control items come in
>> two different grammaticality states (1 = perfectly acceptable, 2 =
>> unacceptable according to the gospel, and our tainted intuitions), and uses
>> the states to predict the ratings. Since we have used the control items to
>> exclude spammers etc., the remaining participants should all have agreed with
>> the separation provided by the control items. We had 51 participants, each
>> of whom saw 24 control items.
>>
>> I have thus defined another CLMM, where the states, and the two phenomena
>> which we used (one related, one unrelated) form the (interacting) IVs. The
>> resulting model provides an odds ratio for the grammatical vs.
>> ungrammatical state of over 1.800 (somewhat trivial: a perfectly
>> grammatical example makes a higher rating 1.800 times more likely than an
>> offending example)! This seems a high number to me, and in particular, an
>> effect size of 2 or 4 seems to be dwarfed by it. On the other hand, I used
>> the term „separation“ in the last sentence deliberately, so perhaps there
>> is a logical fault in using the control items in this way.
>>
>> If necessary, I can provide the pertinent code, but I hope that the gist of
>> it is clear. I am wondering whether an upper bound (which would be around
>> 1.800 here) is a useful concept, or whether there are other general
>> guidelines (for acceptability studies in particular) which provide some
>> indication about the magnitude of an effect size.
>>
>> Thanks a lot for your consideration and advice, from a completely isolated
>> (but not ill) citizen of Germany.
>>
>> With kind regards
>>
>>
>> Tibor
>>
>> ———————————————————
>> The contents of this e-mail are confidential and intended solely for the
>> addressees named in the e-mail. Forwarding this message (even in part) in
>> any form requires the prior written permission of the sender.
>>
>> Prof. Dr. Tibor Kiss
>> Sprachwissenschaftliches Institut
>> Ruhr-Universität Bochum
>>
>>
>
