Re: [R] Regression line limited by the range of values

2006-05-24 Thread Andreas Svensson
Thankyou very much Marc for that nifty little script.

When I use it on my real dataset though, the lines are fat in the middle 
and thinner towards the ends. I guess it's because lines draw one 
fitted line for each x, and if you have hundreds of x, this turns into a 
line that is thicker that it should be (due to rounding errors perhaps).

I got the tip to use segments, and draw one line from min(x),min(y) to 
max(x),max(y) but with real data with a bunch of na.rm and 
na.actions this becomes very long and bulky.

For a regression with two data swarms and two trend lines it becomes 
this long mess:

plot(TCgonad[Period==1], 
ABelly[Period==1],xlim=c(0,20),ylim=c(130,160), col=”blue”)
points(TCgonad[Period==2], ABelly[Period==2],col=”red”)

segments(
min(TCgonad[Period==1],na.rm=T), 
min(fitted(lm(ABelly[Period==1]~TCgonad[Period==1],
na.action=na.exclude)),na.rm=T),
max(TCgonad[Period==1],na.rm=T), 
max(fitted(lm(ABelly[Period==1]~TCgonad[Period==1],
na.action=na.exclude)),na.rm=T),col=”blue”)

segments(
min(TCgonad[Period==2],na.rm=T), 
min(fitted(lm(ABelly[Period==2]~TCgonad[Period==2],
na.action=na.exclude)),na.rm=T),
max(TCgonad[Period==2],na.rm=T), 
max(fitted(lm(ABelly[Period==2]~TCgonad[Period==2],
na.action=na.exclude)),na.rm=T),col=”red”)


I just think it's strange that abline has as a nonadjustable default to 
extrapolate the line to outside the data - a mortal sin in my field.

Cheers
Andreas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Regression line limited by the range of values

2006-05-24 Thread Marc Schwartz (via MN)
On Wed, 2006-05-24 at 21:53 +0200, Andreas Svensson wrote:
 Thankyou very much Marc for that nifty little script.
 
 When I use it on my real dataset though, the lines are fat in the middle 
 and thinner towards the ends. I guess it's because lines draw one 
 fitted line for each x, and if you have hundreds of x, this turns into a 
 line that is thicker that it should be (due to rounding errors perhaps).
 
 I got the tip to use segments, and draw one line from min(x),min(y) to 
 max(x),max(y) but with real data with a bunch of na.rm and 
 na.actions this becomes very long and bulky.
 
 For a regression with two data swarms and two trend lines it becomes 
 this long mess:
 
 plot(TCgonad[Period==1], 
 ABelly[Period==1],xlim=c(0,20),ylim=c(130,160), col=”blue”)
 points(TCgonad[Period==2], ABelly[Period==2],col=”red”)
 
 segments(
 min(TCgonad[Period==1],na.rm=T), 
 min(fitted(lm(ABelly[Period==1]~TCgonad[Period==1],
 na.action=na.exclude)),na.rm=T),
 max(TCgonad[Period==1],na.rm=T), 
 max(fitted(lm(ABelly[Period==1]~TCgonad[Period==1],
 na.action=na.exclude)),na.rm=T),col=”blue”)
 
 segments(
 min(TCgonad[Period==2],na.rm=T), 
 min(fitted(lm(ABelly[Period==2]~TCgonad[Period==2],
 na.action=na.exclude)),na.rm=T),
 max(TCgonad[Period==2],na.rm=T), 
 max(fitted(lm(ABelly[Period==2]~TCgonad[Period==2],
 na.action=na.exclude)),na.rm=T),col=”red”)
 
 
 I just think it's strange that abline has as a nonadjustable default to 
 extrapolate the line to outside the data - a mortal sin in my field.
 
 Cheers
 Andreas


Andreas,

What is likely happening is that your x values are not sorted in
increasing order as they were in the dummy data set, thus there is
probably some movement of the lines back and forth from one x value to
the next in the order that they appear in the data set.

Here is a more generic approach that should work. It is based upon the
examples in ?predict.lm. We create the two models and then use range(x?)
for the new min,max x values to be used for the prediction of the fitted
y values.

Thus:

# Create model1 using random data
set.seed(1)
x1 - rnorm(15)
y1 - x1 + rnorm(15)
model1 - lm(y1 ~ x1)

# Create model2 using random data
set.seed(2)
x2 - rnorm(15)
y2 - x2 + rnorm(15)
model2 - lm(y2 ~ x2)


# Plot the first set of points
# set the axis ranges based upon the common ranges of the 
# two sets of data
plot(x1, y1, col = red, xlim = range(x1, x2),
 ylim = range(y1, y2))

# Add the second set of points
points(x2, y2, col = blue)

# Create the first fitted line
# Create two x values for the prediction based upon the 
# range of x1
lines(range(x1), predict(model1, data.frame(x1 = range(x1))), 
  col = red)

# Create the second fitted line
# Same here for the x2 values
lines(range(x2), predict(model2, data.frame(x2 = range(x2))), 
  col = blue)


Note that in both cases for the fitted lines, the lines will be
constrained by the range of the actual x? data values.

This should be easier to apply in general.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html