With regard to interpretation: This is an example of "Simpson's paradox"
(https://en.wikipedia.org/wiki/Simpson%27s_paradox).

While there appears to be a strong correlation between x and y, the data suggest
instead that an underlying dichotomous grouping factor better explains the
pattern. If you control for this grouping factor, for example in a linear
regression model, the apparent correlation between x and y disappears (or, as in
some Simpson's paradox examples, it can even reverse).

Code to confirm this:

# example data from the original question
Xaxis <- c(1, 3, 3, 5, 6, 8, 85, 87, 90, 92, 97, 98)
Yaxis <- c(2, 10, 8, 4, 12, 2, 85, 80, 94, 82, 80, 87)
Data <- data.frame(Xaxis, Yaxis)

# create grouping factor (first six vs. last six observations)
Data$group <- factor(rep(c("A", "B"), each = 6))

# simple regression
mod1 <- lm(Yaxis ~ Xaxis, data = Data)
summary(mod1)

# multiple regression controlling for group: Xaxis is now clearly non-significant
mod2 <- lm(Yaxis ~ Xaxis + group, data = Data)
summary(mod2)
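
For a quick check on the same data, you can also compare the overall correlation
with the correlations computed separately within each group; the within-group
correlations are close to zero:

# overall correlation (driven almost entirely by the separation between the two clusters)
cor(Data$Xaxis, Data$Yaxis)

# correlation within each group: both are close to zero
by(Data, Data$group, function(d) cor(d$Xaxis, d$Yaxis))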



On 21 Mar 2016, at 18:19, Nathaniel Smith <n...@pobox.com> wrote:


I'm not aware of any particular problems with computing a correlation 
coefficient for that kind of data, but it does seem a little bit odd because 
your two axes are effectively categorical, and for human consumption it might 
be more informative to frame the results that way instead of (or in addition 
to) using linear correlation.
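
For example, with the data from your message, one way to frame it categorically
might be to dichotomise both variables and cross-tabulate them (the cutoff of 50
is just an arbitrary choice for these particular values):

# split each variable into "low" vs. "high" at an arbitrary cutoff of 50
Data$Xcat <- cut(Data$Xaxis, breaks = c(-Inf, 50, Inf), labels = c("low", "high"))
Data$Ycat <- cut(Data$Yaxis, breaks = c(-Inf, 50, Inf), labels = c("low", "high"))

# 2 x 2 table: every "low" x is paired with a "low" y, and every "high" x with a "high" y
table(Data$Xcat, Data$Ycat)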

On Mar 21, 2016 7:15 AM, "Ambridge, Ben" <ben.ambri...@liverpool.ac.uk> wrote:
Hi

Is anyone able to advise on the following (probably very naive) question? Is it
problematic to run/interpret a correlation that is driven by extreme values like
this?

Xaxis <- c(1, 3, 3, 5, 6, 8, 85, 87, 90, 92, 97, 98)
Yaxis <- c(2, 10, 8, 4, 12, 2, 85, 80, 94, 82, 80, 87)
Data <- data.frame(Xaxis, Yaxis)
plot(Data$Xaxis, Data$Yaxis)

And if it is problematic, would lme models (where these data points represent,
for example, the by-item means) also inherit the same problems?

Thanks
Ben

<Rplot.jpeg>
