With regard to interpretation: This is an example of "Simpson's paradox" (https://en.wikipedia.org/wiki/Simpson%27s_paradox).
While there appears to be a strong correlation between x and y, the data rather suggest that there is an underlying dichotomous grouping factor that better explains the pattern. If you control for this grouping factor, for example in a linear regression model, the apparent correlation between x and y disappears (or, as in some of the Simpson's paradox examples, it can even reverse).

Code to confirm this:

Xaxis = c(1, 3, 3, 5, 6, 8, 85, 87, 90, 92, 97, 98)
Yaxis = c(2, 10, 8, 4, 12, 2, 85, 80, 94, 82, 80, 87)
Data = data.frame(Xaxis, Yaxis)

# create grouping factor
Data$group <- factor(rep(c("A", "B"), each = 6))

# simple regression
mod1 <- lm(Yaxis ~ Xaxis, data = Data)
summary(mod1)

# multiple regression controlling for group
mod2 <- lm(Yaxis ~ Xaxis + group, data = Data)
# Xaxis is now clearly n.s.
summary(mod2)

On 21.03.2016 at 18:19, Nathaniel Smith <n...@pobox.com> wrote:

I'm not aware of any particular problems with computing a correlation coefficient for that kind of data, but it does seem a little odd because your two axes are effectively categorical, and for human consumption it might be more informative to frame the results that way instead of (or in addition to) using linear correlation.

On Mar 21, 2016 7:15 AM, "Ambridge, Ben" <ben.ambri...@liverpool.ac.uk> wrote:

Hi

Is anyone able to advise on the following - probably very naive - question? Is it problematic to run/interpret a correlation that is driven by extreme values like this?

Xaxis = c(1, 3, 3, 5, 6, 8, 85, 87, 90, 92, 97, 98)
Yaxis = c(2, 10, 8, 4, 12, 2, 85, 80, 94, 82, 80, 87)
Data = data.frame(Xaxis, Yaxis)
plot(Data$Xaxis, Data$Yaxis)

And if it is problematic, would lme models (where these datapoints represent - for example - the by-item means) also inherit the same problems?

Thanks
Ben

<Rplot.jpeg>
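A further way to see the grouping-factor explanation at work (a sketch, not part of the original exchange; the A/B split is the same hypothetical grouping used in the regression code above, i.e. first six points vs. last six) is to compare the pooled correlation with the correlation computed within each group:

```r
# Sketch: pooled vs. within-group correlation for the example data.
# The A/B grouping is a hypothetical split: first 6 vs. last 6 points.
Xaxis <- c(1, 3, 3, 5, 6, 8, 85, 87, 90, 92, 97, 98)
Yaxis <- c(2, 10, 8, 4, 12, 2, 85, 80, 94, 82, 80, 87)
group <- rep(c("A", "B"), each = 6)

# Pooled correlation: very high, driven entirely by the two clusters
r_pooled <- cor(Xaxis, Yaxis)

# Within-group correlations: close to zero
r_A <- cor(Xaxis[group == "A"], Yaxis[group == "A"])
r_B <- cor(Xaxis[group == "B"], Yaxis[group == "B"])

round(c(pooled = r_pooled, within_A = r_A, within_B = r_B), 2)
```

The pooled coefficient is near 1 while both within-group coefficients are near 0, which is exactly the Simpson's-paradox pattern the regression comparison above demonstrates.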