[R] Generic functions with models? 'predict', 'residuals', 'fitted', ...?

2009-02-03 Thread John McHenry
Hi WizaRds!

I don't know if my understanding of models is screwed up or
if there's a fundamental limitation to the way they are 
used in R.

Here's what I'd like to do: create a model and then test data
against that model using generics like 'predict' and 'residuals'.
The problem is that generics don't work for new data sets, except
of course 'predict': 

set.seed(1)
df- data.frame(matrix(rnorm(10*4), nc=4))
cn-  c(y, x1, x2, x3)
colnames(df)- cn
model- lm(y ~ x1 + x2 + x3 + 1, data=df)
u- residuals(model, data=df)

test- data.frame(matrix(rnorm(10*4), nc=4))
colnames(test)- cn
y.hat- predict(model, newdata=test)
u.new- residuals(model, newdata=test)

# I would expect to get the residuals to the 'test' 
# data fit to 'model' with 'residuals(model, newdata=test)',
# alas no:
cbind(u, u.new)
  uu.new
1   0.032996295  0.032996295
2   0.575261650  0.575261650
3  -0.702501425 -0.702501425
4   0.482075538  0.482075538
5   0.300600839  0.300600839
6  -0.980832577 -0.980832577
7   0.265239680  0.265239680
8  -0.341625071 -0.341625071
9   0.363224527  0.363224527
10  0.005560545  0.005560545

# I would expect to get the 'test' fit to 'model'
# with 'fitted(model, newdata=test)' but again no:
cbind(fitted(model), fitted(model, newdata=test))
  [,1][,2]
1  -0.65945011 -0.65945011
2  -0.39161833 -0.39161833
3  -0.13312719 -0.13312719
4   1.11320526  1.11320526
5   0.02890693  0.02890693
6   0.16036419  0.16036419
7   0.22218937  0.22218937
8   1.07994978  1.07994978
9   0.21255683  0.21255683
10 -0.31094893 -0.31094893

I have to do:

y.hat- predict(model, newdata=test)
# get test residuals:
u.new- test$y - y.hat
u.new
 1  2  3  4  5  6  7 
 1.3661751 -0.4077999  1.1740108  0.4492772 -1.5865168 -0.7633108 -0.8800892 
 8  9 10 
 1.7474658 -0.1016104  2.1081361 
 

to get the residuals. Maybe this is what one is supposed to do?

To be fair, the help pages for e.g. 'residuals.lm'
don't say that you can do things like 'newdata=test' in:

'residuals(model, newdata=test)' 

or 'data=test' in:

'residuals(model, data=test)' 

but that is exactly what I would like to do!

Am I missing something here, or is this just the way it is?

Thanks guys!

Jack.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plotting NAs

2008-03-08 Thread John McHenry
Hi WizaRds,

(I've cross-posted to r-sig-finance because a lot of people there use 'zoo' 
objects).

I'm trying to plot 2 time series lined up with par(mfrow=c(2,1)), so that the 
time index is the same for the two series. The data have monthly frequency. 

An example will illustrate:

# data for these dates (monthly frequency) do not have NAs:
require(zoo)
monthly.dates- as.Date(c(
1991-01-31,
1991-02-28,
1991-03-31,
1991-04-30,
1991-05-30
))
monthly.only.data- 1:length(monthly.dates)
# data for these extra dates (daily frequency) do have NAs interspersed with
# non-NA data having monthly frequency:
daily.dates- seq(as.Date(1991-06-01), as.Date(1991-07-31), by=1)
tmp- zoo(NA, order.by=c(monthly.dates, daily.dates))
z- cbind(tmp, tmp)
colnames(z)- NULL
monthly.in.daily.data- (1:2)/10
z[index(z) %in% monthly.dates, 1]- monthly.only.data
z[index(z) %in% as.Date(c(1991-06-30,1991-07-31)), 2]- 
monthly.in.daily.data

# this creates a block of data like that listed below.

Here's my problem: I want to plot lines through all non-NA data.
This is straight forward with column 1, but not with column 2.
Why? Because the monthly-frequency data in column 2 have NAs interspersed.

plot(z, type=l) 

does not plot a line through the points in column 2.
(It's not a problem with 'plot.zoo'; same thing happens if you 
use 'plot' on regular data like:
plot(c(1, NA, 2, 3, NA, 4), type=l)

The following works in so far as it will plot the correct 
picture, 

tmp- zoo(NA, seq(min(index(z)), max(index(z)), by=1))
z- cbind(tmp, z); z- z[,-1]
plot(na.omit(z), type=l)
plot(z[!apply(is.na(z), 1, sum)==2,], type=l)

but for various reasons too long to go into here
I want to know how to plot lines through the non-NA data in
objects like:
c(1, NA, 2, 3, NA, 4) 
without having to strip out the NAs with the !apply(is.na(z), 1, sum)==2
hack.

What do I do???

Thanks guys!

Jack.

# p.s. here's that 'zoo' data:
 z
1991-01-31  1  NA
1991-02-28  2  NA
1991-03-31  3  NA
1991-04-30  4  NA
1991-05-30  5  NA
1991-06-01 NA  NA
1991-06-02 NA  NA
1991-06-03 NA  NA
1991-06-04 NA  NA
1991-06-05 NA  NA
1991-06-06 NA  NA
1991-06-07 NA  NA
1991-06-08 NA  NA
1991-06-09 NA  NA
1991-06-10 NA  NA
1991-06-11 NA  NA
1991-06-12 NA  NA
1991-06-13 NA  NA
1991-06-14 NA  NA
1991-06-15 NA  NA
1991-06-16 NA  NA
1991-06-17 NA  NA
1991-06-18 NA  NA
1991-06-19 NA  NA
1991-06-20 NA  NA
1991-06-21 NA  NA
1991-06-22 NA  NA
1991-06-23 NA  NA
1991-06-24 NA  NA
1991-06-25 NA  NA
1991-06-26 NA  NA
1991-06-27 NA  NA
1991-06-28 NA  NA
1991-06-29 NA  NA
1991-06-30 NA 0.1
1991-07-01 NA  NA
1991-07-02 NA  NA
1991-07-03 NA  NA
1991-07-04 NA  NA
1991-07-05 NA  NA
1991-07-06 NA  NA
1991-07-07 NA  NA
1991-07-08 NA  NA
1991-07-09 NA  NA
1991-07-10 NA  NA
1991-07-11 NA  NA
1991-07-12 NA  NA
1991-07-13 NA  NA
1991-07-14 NA  NA
1991-07-15 NA  NA
1991-07-16 NA  NA
1991-07-17 NA  NA
1991-07-18 NA  NA
1991-07-19 NA  NA
1991-07-20 NA  NA
1991-07-21 NA  NA
1991-07-22 NA  NA
1991-07-23 NA  NA
1991-07-24 NA  NA
1991-07-25 NA  NA
1991-07-26 NA  NA
1991-07-27 NA  NA
1991-07-28 NA  NA
1991-07-29 NA  NA
1991-07-30 NA  NA
1991-07-31 NA 0.2




   
-

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RODBC and Oracle

2007-09-28 Thread John McHenry
Hi WizaRds,

I'm experiencing a problem connecting to an Oracle 10g database via RODBC
(I'm getting this on Microsoft XP).

The same SQL queries via PL/SQL Developer work just fine, but when I pump the 
query through sqlQuery in RODBC then I get a data frame back with 0 rows. 
I cut the query down alternating between PL/SQL and RODBC until I figured 
that it's some kind of row limit or buffer limit thing going on.

I searched the archives and Brian Ripley suggested playing with believeNRows
parameter (setting it to FALSE) as Oracle is not well behaved. I've tried 
various 
combinations of this and delving into the lower level functions like odbcQuery
and sqlGetResults but I'm still getting the same thing: 0 rows back unless 
I cut the query down to a more return a more reasonable range.

Obviously I can't reproduce the problem here because of the database, but does
anyone have any suggestions/tips/pointers to get me going toward a solution?
Maybe it's a settings problem in the Microsoft ODBC thing under 
Control Panel-Administrative Tools-Data Sources
Any suggestions on this? 

All help gladly received.

Thanks!

Jack.

platform   i386-pc-mingw32 
arch   i386
os mingw32 
system i386, mingw32   
status 
major  2   
minor  5.1 
year   2007
month  06  
day27  
svn rev42083   
language   R   
version.string R version 2.5.1 (2007-06-27)


   
-

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multivariate, multilevel regression?

2007-09-12 Thread John McHenry
Dear WizaRds,

This is mostly a statistics question, but I'm figuring that R is the right 
solution (even before I start!)

I have some bio data of heart rate versus time (rats taken from resting to 
maximal heart rate). I want to regress heart rate on time. The data have been 
normalized such that resting heart rate is zero at time=0, so that all curves 
intersect at the origin (and at the origin only). The regression function is 
monotonically increasing, but little is known about it.

The data are also strictly ordinal: I have a factor such that lower-order data 
*must* be on a curve that is offset beneath a higher-order curve. Don't ask 
where the factors come from...but *given* these factors the assumption, or 
rather the *constraint*, is that lower orders are better (lower-order rats are 
fitter rats with better cardiovascular response). Most important is that these 
curves do not intersect because of these factors (a fitter rat can not have a 
worse heart-rate 
response than a less fit rat!).

Here's a schematic to show you what I mean. It's very rough, but it gets the 
point across:

seq(0, 1, len=100)
f- 1/seq(.3,1,len=6)
windows(); 
plot(t, sqrt(3*t), type='n', xlab='time', ylab='heart rate'); 
grid()
for (i in seq(along=f)) 
lines(t, sqrt(f[i]*t), col=ceiling(2*i))


Now I'm wondering where I should start and I'm think that this is really not 
much different from having a y_i ~ x_i | factor_i ... the i^th response curve 
just like a dummy variable male/female linear regression. But in some way the 
factors are related (there's a dose behind it, if ya see what I mean), so they 
are not independent...they're all part of a system (some studies have more 
juice overall, so the whole system is varying from one group of rats to the 
next).
This means that in some systems the curves will be closer or some curves 
within the system will be closer together.

Here is my question for you guys: what is the best way to model this kind of 
problem, especially given that each i^th curve could have as few as 3 data 
points? If I can only vaguely assume the type of the regression function 
(monotonic, rapidly increasing from the origin kind of like sqrt(t), as above)
should it be a parametric or nonparametric regression? What about those errors? 
Gee! It's hard to assume anything (there're so few of 'em and they're probably 
heteroskedastic!

Where do I begin? I'd gladly accept all references, pointers, and such like.

Best, and thanks for R and R-Help, all you Core guys!!!

Jack.

   
-

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.