Hi everybody, here is another attempt at the question of effect sizes, this time from the perspective of power calculations.
I hope that the following characterization is sufficient. In Germanic languages, part of the VP can be topicalized while other parts are left behind, as illustrated in (1):

(1) a. Ohne einen Lappen gespült hat sie in einer Kaffeepause.
       without a rag rinsed has she during a coffee break
       'She rinsed without a rag during a coffee break.'
    b. In einer Kaffeepause gespült hat sie ohne einen Lappen.
       (same as (1a), only with the order of the temporal and the instrumental PP reversed)

There are some claims in the literature (e.g. Frey and Pittner 1998, though not directed at the specific examples here) that speakers should judge (1a) significantly better than (1b). There is also a more general, almost tacit claim that examples like (1a) are derived from structurally identical verb-final clauses, and moreover that reversing the PPs in the verb-final clause is fine. So both (2a) and (2b) are supposed to be acceptable:

(2) a. … dass sie in einer Kaffeepause ohne einen Lappen gespült hat.
       that she during a coffee break without a rag rinsed has
    b. … dass sie [ohne einen Lappen]_i in einer Kaffeepause t_i gespült hat.
       (same as (2a), with scrambling of the PP)

But a derivation in which the PPs are first reversed and the partial VP is then topicalized would be impossible, because the topicalized phrase contains a trace (or, in more contemporary parlance, a copy) that cannot be linked to its (scrambled) antecedent. So only (2a) could serve as the input to (1a), while (2b) would be the input to (1b), but then the trace is lost … [It is not relevant for present purposes that there might be alternative analyses that do not make use of scrambling/traces/copies.]

This is all classical generative linguistics, but here comes my question. We can translate all this into two hypotheses: H0, stating that there is no difference between structures of type (1) and structures of type (2) in terms of acceptability judgments, and HA, stating that there is.
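In a logistic GLMM, H0 amounts to a coefficient of 0 for the structure factor on the log-odds scale. The Python sketch below is not the R/simr analysis reported in this post. It only illustrates the arithmetic linking a log-odds estimate (such as the -0.33 and -0.65 discussed below) to an odds ratio and to a 2AFC choice probability, plus a deliberately simplified Monte Carlo power estimate in the spirit of simr::powerSim, which simulates new responses from the fitted model, refits it and counts how often the effect comes out significant. Assumptions, all mine: a baseline choice probability of 0.5 (no order preference) for type (2), two groups of independent trials compared with a Wald z-test, and no by-subject or by-item random effects. `simulate_power` and its parameters are hypothetical names.

```python
import math
import random


def inverted_odds_ratio(beta: float) -> float:
    """Odds ratio implied by a log-odds coefficient, expressed as >= 1."""
    return math.exp(abs(beta))


def choice_probability(oratio: float, baseline: float = 0.5) -> float:
    """Probability of a pick after multiplying the baseline odds by oratio.

    With baseline 0.5 (odds 1:1), an odds ratio of 2 gives odds of 2:1,
    i.e. the item is chosen in 2 out of 3 trials, twice as often as its rival.
    """
    odds = baseline / (1 - baseline) * oratio
    return odds / (1 + odds)


def simulate_power(beta: float, n_per_group: int, n_sims: int = 1000,
                   baseline: float = 0.5, z_crit: float = 1.96,
                   seed: int = 1) -> float:
    """Share of simulated experiments in which H0 (beta = 0) is rejected.

    ASSUMPTION: fixed effects only. Group 0 (type (2)) picks the order in
    (a) with probability `baseline`; group 1 (type (1)) has its log-odds
    shifted by `beta`. No random effects, so this is typically optimistic
    compared with a GLMM-based power analysis.
    """
    rng = random.Random(seed)
    p0 = baseline
    p1 = choice_probability(math.exp(beta), baseline)
    rejections = 0
    for _ in range(n_sims):
        k0 = sum(rng.random() < p0 for _ in range(n_per_group))
        k1 = sum(rng.random() < p1 for _ in range(n_per_group))
        # Add 0.5 to every cell so that an empty cell never yields log(0).
        a, b = k1 + 0.5, n_per_group - k1 + 0.5
        c, d = k0 + 0.5, n_per_group - k0 + 0.5
        log_or = math.log(a / b) - math.log(c / d)  # estimate of beta
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        if abs(log_or / se) > z_crit:  # two-sided Wald test at alpha = .05
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    for beta in (-0.33, -0.65):
        print(beta, inverted_odds_ratio(beta), choice_probability(math.exp(abs(beta))))
    # 200 trials per structure type is a hypothetical size, not the study's.
    print(simulate_power(-0.65, n_per_group=200, n_sims=500))
```

Because the sketch ignores subject and item variability, its power figure should not be compared directly with the simr result; it only shows the mechanics of "fix an effect size, simulate, test, count rejections".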
The question is: what should be the minimal effect size that we accept in this case? I dare say that a conservative guess (the least we should get) would be an odds ratio of about 2. In a two-alternative forced-choice study (Sprouse et al. 2013), where subjects have to pick whichever of two presented sentences they consider more natural, we would thus expect the order in (1a)/(2a) to be twice as likely to be picked as the reversed order.

It turns out that the difference between (1) and (2) is not significant (H0 cannot be rejected) according to a GLMM (estimate: -0.33, p > 0.1). The question remains whether the experiment had enough power to detect a sufficiently large effect. Following the logic sketched above, and using simr::powerSim and simr::powerCurve, I changed the estimate from -0.33 to -0.65, which amounts to an (inverted) odds ratio of 1.91, i.e. even below the threshold of 2 proposed above. powerCurve shows a power of 85 % (CI: 79.28–89.65) for the pertinent factor. Given that this is based on a very conservative effect size (< 2), and that power only grows with effect size, the experiment should have at least this much power to detect larger effects.

My (rather general) question now is: is the logic of proposing a rather low effect size sound? Are there generally accepted assumptions about the effect sizes to expect in judgment studies? The interesting thing here is (or so it seems to me) that the linguistic argumentation leads one to hope for a non-rejection of H0, so non-significance is itself the result, and it needs corroboration.

With kind regards
Tibor

-------------------
Prof. Dr. Tibor Kiss
Sprachwissenschaftliches Institut
Ruhr-Universität Bochum