Hi everybody,

Here is another attempt at the question of effect sizes, this time from the
perspective of power calculations.

I hope that the following characterization is sufficient: In Germanic
languages, part of the VP can be topicalized, leaving other parts behind, as
illustrated in (1):

(1)     a.      Ohne einen Lappen gespült hat sie in einer Kaffeepause.
                without a rag rinsed has she during a coffee break
                'She rinsed without a rag during a coffee break.'
        b.      In einer Kaffeepause gespült hat sie ohne einen Lappen.
                (same as (1a), only with the order of the temporal and
                instrumental PPs reversed)

There are some claims in the literature (e.g. Frey and Pittner 1998, though
not concerning the specific examples here) that speakers should judge (1a)
significantly better than (1b). There is also a more general, almost tacit
claim that examples like (1a) are derived from verb-final structures that are
structurally identical, and moreover that reversing the PPs in the verb-final
clause is fine. So both (2a) and (2b) are supposed to be acceptable.

(2)     a.      … dass sie in einer Kaffeepause ohne einen Lappen gespült hat.
                  that she during a coffee break without a rag rinsed has
        b.      … dass sie [ohne einen Lappen]_i in einer Kaffeepause t_i gespült hat.
                (same as (2a) with scrambling of the PP)

But it would be impossible to have a derivation where you first have a reversal 
of the PPs and then a partial VP topicalization, because the topicalized phrase 
contains a trace (or, in more contemporary parlance: a copy), which cannot be 
linked to the (scrambled) antecedent. So, only (2a) could be an input to yield 
(1a), while (2b) would be an input to yield (1b), but then the trace is lost … 
[It is not relevant for the present purposes that there might be alternative 
analyses which do not make use of scrambling/traces/copies.]

This is all classical generative linguistics, but here comes my question. We
can translate all this into two hypotheses, H0 stating that there is no
difference between structures of type (1) and structures of type (2) in terms
of acceptability judgments, and HA stating that there is. The question is: what
is the minimal effect size that we should accept in this case? I dare say that
a conservative guess (the least we should get) would be an odds ratio of about
2. In terms of a two-alternative forced-choice study (Sprouse et al. 2013),
where subjects have to pick whichever of two presented sentences they consider
more natural, we would thus expect the order in (1a)/(2a) to make it twice as
likely that the example is picked.
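
To make the threshold concrete, here is a minimal sketch of what a value of 2
would mean on the scales a logistic GLMM works with. It rests on an assumption
of mine: that the 2 is read as an odds ratio against an even (1:1) baseline, as
the later comparison with the inverted odds ratio suggests. The function and
variable names are purely illustrative.

```python
import math


def odds_to_probability(odds: float) -> float:
    """Convert odds (p / (1 - p)) into the probability p."""
    return odds / (1.0 + odds)


# Assumption: the threshold of 2 is an odds ratio relative to a 1:1 baseline,
# i.e. without the effect both sentences would be picked equally often.
threshold_odds = 2.0

# Probability of picking the favoured order under that assumption.
pick_probability = odds_to_probability(threshold_odds)

# The same threshold expressed as a coefficient on the log-odds scale.
threshold_log_odds = math.log(threshold_odds)

print(f"choice probability:   {pick_probability:.3f}")
print(f"log-odds coefficient: {threshold_log_odds:.3f}")
```

Under this reading, the favoured order would be chosen in about two out of
three trials, and the corresponding fixed-effect estimate would have an
absolute value of about 0.69.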

It turns out that the difference between (1) and (2) is not significant (H0
cannot be rejected), according to a GLMM (estimate: -0.33, p > 0.1). The
question remains whether the experiment had enough power to detect a
sufficiently large effect. Following the logic sketched above, and using
simr::powerSim and simr::powerCurve, I have changed the estimate from -0.33 to
-0.65, which amounts to an (inverted) odds ratio of roughly 1.9, so even below
the threshold of 2 proposed above. powerCurve shows a power of 85 % (CI:
79.28-89.65) for the pertinent factor. Given that this is based on a very
conservative effect size (< 2), the experiment should also have enough power
to detect larger effect sizes.
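
For readers who want to check the conversion between the GLMM estimates and
odds ratios, here is a minimal sketch. It assumes the estimates are on the
log-odds scale, as in a logistic GLMM, and that "inverted" means taking the
reciprocal of exp(estimate). The helper name is hypothetical and not part of
simr.

```python
import math


def inverted_odds_ratio(estimate: float) -> float:
    """Return 1 / exp(estimate), i.e. the odds ratio with the direction flipped.

    Assumption: `estimate` is a fixed-effect coefficient on the log-odds scale.
    """
    return math.exp(-estimate)


observed_estimate = -0.33  # estimate reported by the fitted GLMM
assumed_estimate = -0.65   # estimate substituted for the power simulation

# Coefficient that would correspond exactly to an inverted odds ratio of 2.
threshold_estimate = -math.log(2.0)

observed_or = inverted_odds_ratio(observed_estimate)
assumed_or = inverted_odds_ratio(assumed_estimate)

print(f"observed estimate  -> inverted odds ratio {observed_or:.2f}")
print(f"assumed estimate   -> inverted odds ratio {assumed_or:.2f}")
print(f"threshold estimate -> {threshold_estimate:.3f}")
```

The substituted estimate is smaller in absolute value than the coefficient
that corresponds to an odds ratio of exactly 2, which is the sense in which
the simulated effect size is conservative.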

My (rather general) question now is: is the logic of proposing a rather low
effect size sound? Are there generally accepted assumptions about the effect
sizes to expect in judgment studies? The interesting thing here is (or so it
seems to me) that the linguistic argumentation leads one to hope for a
non-rejection of H0, so non-significance is itself a result and needs
corroboration.

With kind regards


Tibor


———————————————————
Prof. Dr. Tibor Kiss
Sprachwissenschaftliches Institut
Ruhr-Universität Bochum