Re: [Rd] Wish R Core had a standard format (or generic function) for newdata objects

2011-04-28 Thread Greg Snow
Another way to see your plots is the TkPredict function in the TeachingDemos 
package.  It will default the variables to their medians for numeric predictors 
and baseline level for factors, but then you can set all of those to something 
more meaningful one time using the controls, then cycle through the predictors 
for the plots.  It can also give you a command line version of the commands 
that you could then run, or loop through to get your plots.
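
As a rough illustration of the defaults described above (medians for numeric
predictors, baseline level for factors), here is a minimal base-R sketch. It
is not the TkPredict implementation; the helper name reference_row and the
use of mtcars are assumptions made only for this example.

```r
# Hypothetical helper (not part of TeachingDemos): build a one-row data frame
# holding the median of each numeric column and the first (baseline) level
# of each factor column.
reference_row <- function(data) {
  vals <- lapply(data, function(col) {
    if (is.factor(col)) {
      factor(levels(col)[1], levels = levels(col))
    } else {
      median(col, na.rm = TRUE)
    }
  })
  as.data.frame(vals)
}

dat <- transform(mtcars[, c("mpg", "wt", "hp", "cyl")], cyl = factor(cyl))
fit <- lm(mpg ~ wt + hp + cyl, data = dat)

ref <- reference_row(dat[, c("wt", "hp", "cyl")])

# Vary wt over its observed range while hp and cyl stay at the reference values
wt_seq <- seq(min(dat$wt), max(dat$wt), length.out = 25)
nd <- ref[rep(1, length(wt_seq)), ]
nd$wt <- wt_seq
pred <- predict(fit, newdata = nd)

# plot(wt_seq, pred, type = "l")   # uncomment to draw the profile
```

Changing the entries of ref once and then repeating the last four lines for
each predictor gives the "set once, cycle through the predictors" workflow.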

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -----Original Message-----
 From: r-devel-boun...@r-project.org [mailto:r-devel-bounces@r-
 project.org] On Behalf Of Paul Johnson
 Sent: Wednesday, April 27, 2011 10:20 AM
 To: Duncan Murdoch
 Cc: R Devel List
 Subject: Re: [Rd] Wish R Core had a standard format (or generic
 function) for newdata objects
 
 On Tue, Apr 26, 2011 at 7:39 PM, Duncan Murdoch
 murdoch.dun...@gmail.com wrote:
 
  If you don't like the way this was done in my three lines above, or
 by Frank
  Harrell, or the Zelig group, or John Fox, why don't you do it
 yourself, and
  get it right this time?  It's pretty rude to complain about things
 that
  others have given you for free, and demand they do it better.
 
  Duncan Murdoch
 
 
 I offer a sincere apology for sounding that way.  I'm not attacking
 anybody.  I'm just talking, asking whether you agree it would help if
 this were standardized.  And you disagree, and I respect that since you
 are actually doing the work.
 
 From a lowly user's point of view, I wish you experts out there
 would show us one way to do this, so we could follow your example.
 
 When there's a regression model fitted with 20 variables in it, and
 half of them are numeric, 4 are unordered factors, 3 are ordinal
 factors, and what not, then this is a hard problem for many of us
 ordinary users.  Or it is tedious.  They want to keep everything fixed,
 except one variable that takes on different specified values.  And
 they want to do that for every variable, one at a time.
 
 Stata has made this easy for many models, R could as well, if we
 coalesced on a more-or-less standard way to create newdata objects for
 predict.
 
 But, in the end, I agree with your sentiment.  I just have to do this
 myself and show you it is handy.  I think Zelig's setx has it about
 right, so I'll pursue that strategy.
 
 pj
 --
 Paul E. Johnson
 Professor, Political Science
 1541 Lilac Lane, Room 504
 University of Kansas
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] Wish R Core had a standard format (or generic function) for newdata objects

2011-04-27 Thread peter dalgaard

On Apr 27, 2011, at 02:39 , Duncan Murdoch wrote:

 On 26/04/2011 11:13 AM, Paul Johnson wrote:
 Is anybody working on a way to standardize the creation of newdata
 objects for predict methods?

[snip]

 I think it is time the R Core Team would look at this and tell us what
 is the right way to do this. I think the interface to setx in Zelig is
 pretty easy to understand, at least for numeric variables.
 
 If you don't like the way this was done in my three lines above, or by Frank 
 Harrell, or the Zelig group, or John Fox, why don't you do it yourself, and 
 get it right this time?  It's pretty rude to complain about things that 
 others have given you for free, and demand they do it better.

Er... No, I don't think Paul is being particularly rude here (and he has been 
doing us some substantial favors in the past, notably his useful Rtips page). I 
know the kind of functionality he is looking for; e.g., SAS JMP has some rather 
nice interactive displays of regression effects for which you'll need to fill 
in something for the other variables. 

However, that being said, I agree with Duncan that we probably do not want to 
canonicalize any particular method of filling in average values for data 
frame variables. Whatever you do will be statistically dubious (in particular, 
using the mode of a factor variable gives me the creeps: Do a subgroup analysis 
and your average person switches from male to female?), so I think it is one 
of those cases where it is best to provide mechanism, not policy.
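
The concern about the mode of a factor can be made concrete with a small
sketch. The data below are made up purely for illustration, and factor_mode
is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: most frequent level of a factor
# (ties are resolved in favour of the earlier level)
factor_mode <- function(f) names(which.max(table(f)))

# Made-up data: 60 women and 40 men overall, but most smokers are men
people <- data.frame(
  sex    = factor(rep(c("female", "male"), times = c(60, 40))),
  smoker = c(rep(c(TRUE, FALSE), times = c(10, 50)),   # women
             rep(c(TRUE, FALSE), times = c(25, 15)))   # men
)

factor_mode(people$sex)                   # "female" in the full sample
factor_mode(people$sex[people$smoker])    # "male" among smokers only
```

The "typical" value used to fill in sex depends entirely on which subset is
analysed, which is the instability described above.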

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] Wish R Core had a standard format (or generic function) for newdata objects

2011-04-27 Thread Gabor Grothendieck
On Wed, Apr 27, 2011 at 3:55 AM, peter dalgaard pda...@gmail.com wrote:

 On Apr 27, 2011, at 02:39 , Duncan Murdoch wrote:

 On 26/04/2011 11:13 AM, Paul Johnson wrote:
 Is anybody working on a way to standardize the creation of newdata
 objects for predict methods?

 [snip]

 I think it is time the R Core Team would look at this and tell us what
 is the right way to do this. I think the interface to setx in Zelig is
 pretty easy to understand, at least for numeric variables.

 If you don't like the way this was done in my three lines above, or by Frank 
 Harrell, or the Zelig group, or John Fox, why don't you do it yourself, and 
 get it right this time?  It's pretty rude to complain about things that 
 others have given you for free, and demand they do it better.

 Er... No, I don't think Paul is being particularly rude here (and he has been 
 doing us some substantial favors in the past, notably his useful Rtips page). 
 I know the kind of functionality he is looking for; e.g., SAS JMP has some 
 rather nice interactive displays of regression effects for which you'll need 
 to fill in something for the other variables.

 However, that being said, I agree with Duncan that we probably do not want to 
 canonicalize any particular method of filling in average values for data 
 frame variables. Whatever you do will be statistically dubious (in 
 particular, using the mode of a factor variable gives me the creeps: Do a 
 subgroup analysis and your average person switches from male to female?), 
 so I think it is one of those cases where it is best to provide mechanism, 
 not policy.


That could be satisfied by defining a generic in the core of R without
any methods.  Then individual packages or analyses could provide those
in the way they see fit.  As long as the packages or analyses are
working with objects of different classes they would not conflict.
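
A minimal sketch of what this could look like with S3 dispatch follows. The
generic name newdata, its arguments, and the policy inside the lm method
(means and baseline levels) are all assumptions for illustration; nothing
like this exists in base R.

```r
# Hypothetical generic: it only dispatches and defines no policy of its own
newdata <- function(object, ...) UseMethod("newdata")

# A package (or a single analysis) could then supply its own method.
# This one uses means for numeric predictors and the first level for
# factors; it assumes the formula has no transformed terms, so the
# model-frame columns carry the original variable names.
newdata.lm <- function(object, ...) {
  mf <- model.frame(object)
  predictors <- mf[, -1, drop = FALSE]   # drop the response column
  as.data.frame(lapply(predictors, function(col) {
    if (is.factor(col)) factor(levels(col)[1], levels = levels(col))
    else mean(col)
  }))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
nd  <- newdata(fit)
predict(fit, newdata = nd)
```

Calling newdata() on an object whose class has no method fails with the
usual "no applicable method" error, so the core generic provides mechanism
without committing to any policy.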


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [Rd] Wish R Core had a standard format (or generic function) for newdata objects

2011-04-27 Thread Christophe Dutang
Among many solutions, I generally use the following code, which avoids
constructing an idealized average individual by instead taking the mean of
the predicted values over the observed data:

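averagingpredict <- function(model, varname, varseq, type, subset=NULL)
{
    # model$data holds the fitting data for glm objects (lm fits have no $data)
    if(is.null(subset))
        mydata <- model$data
    else
        mydata <- model$data[subset, ]

    # set varname to x for every row, then average the predictions
    f <- function(x)
    {
        mydata[, varname] <- x
        mean(predict(model, newdata=mydata, type=type), na.rm=TRUE)
    }

    sapply(varseq, f)
}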

It is time-consuming, but it handles non-numeric variables.
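
A self-contained sketch of the same averaging idea, applied to a factor
predictor, might look like the following. The function name
average_prediction, its explicit data argument, and the choice of the
warpbreaks data are assumptions for this example, not part of the code above.

```r
# Hypothetical restatement of the averaging approach with the data passed
# explicitly, so it does not rely on the model object storing its data.
average_prediction <- function(model, data, varname, varseq, type = "response") {
  sapply(varseq, function(x) {
    # keep factor predictors as factors with their original levels
    if (is.factor(data[[varname]])) {
      x <- factor(x, levels = levels(data[[varname]]))
    }
    data[[varname]] <- rep(x, nrow(data))   # everyone gets the same value
    mean(predict(model, newdata = data, type = type), na.rm = TRUE)
  })
}

fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Average predicted number of breaks at each tension level,
# with wool left at its observed values
avg <- average_prediction(fit, warpbreaks, "tension", c("L", "M", "H"))
avg
```

Each prediction is averaged over the observed rows, so no "typical" value of
wool ever has to be chosen; the cost is one full predict() call per value in
varseq.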


Christophe


2011/4/26 Paul Johnson pauljoh...@gmail.com

 Is anybody working on a way to standardize the creation of newdata
 objects for predict methods?

 When using predict, I find it difficult/tedious to create newdata data
 frames when there are many variables. It is necessary to set all
 variables at the mean/mode/median, and then for some variables of
 interest, one has to insert values for which predictions are desired.
 I was at a presentation by Scott Long last week and he was discussing
 the increasing emphasis in Stata on calculations of marginal
 predictions and SPost and several other packages, and,
 coincidentally, I had a student visit who is learning to use R MASS's
 polr (W. Venables and B. Ripley) and we wrestled for quite a while to
 try to make the same calculations that Stata makes automatically.  It
 spits out predicted probabilities for each independent variable, keeping
 other variables at a reference level.

 I've found R packages that aim to do essentially the same thing.

 In Frank Harrell's Design/rms framework, he uses a datadist
 function that generates an object that the user has to put into the R
 options.  I think many users trip over the use of options there.  If
 I don't use that for a month or two, I completely forget the fine
 points and have to fight with it.  But it does work to give plots
 and predict functions the information they require.

 In Zelig (by Kosuke Imai, Gary King, and Olivia Lau), a function
 setx does the work of creating newdata objects. That appears to be
 about right as a candidate for a generic newdata function. Perhaps
 it could directly generalize to all R regression functions, but right
 now it is tailored to the models in Zelig. It has separate methods for
 the different types of models, and that is a bit confusing to me, since
 the newdata in one model should be the same as the newdata in
 another, I'm guessing. But his code is all there, I'll keep looking.

 In Effects (by John Fox), there are internal functions to create
 newdata and plot the marginal effects. If you load effects and run,
 for example, effects:::effect.lm you see Prof Fox has his own way of
 grabbing information from model columns and calculating predictions.

 I think it is time the R Core Team would look at this and tell us what
 is the right way to do this. I think the interface to setx in Zelig is
 pretty easy to understand, at least for numeric variables.

 In R's termplot function, such a thing could be put to use.  As far as
 I can tell now, termplot is doing most of the work of creating a
 newdata object, but not exactly.

 It seems like it would be a shame to proliferate more functions that
 do the same function, when it is such a common thing.

 --
 Paul E. Johnson
 Professor, Political Science
 1541 Lilac Lane, Room 504
 University of Kansas





-- 
Christophe DUTANG
Ph. D. student at ISFA, Lyon, France




[Rd] Wish R Core had a standard format (or generic function) for newdata objects

2011-04-26 Thread Paul Johnson
Is anybody working on a way to standardize the creation of newdata
objects for predict methods?

When using predict, I find it difficult/tedious to create newdata data
frames when there are many variables. It is necessary to set all
variables at the mean/mode/median, and then for some variables of
interest, one has to insert values for which predictions are desired.
I was at a presentation by Scott Long last week and he was discussing
the increasing emphasis in Stata on calculations of marginal
predictions and SPost and several other packages, and,
coincidentally, I had a student visit who is learning to use R MASS's
polr (W. Venables and B. Ripley) and we wrestled for quite a while to
try to make the same calculations that Stata makes automatically.  It
spits out predicted probabilities for each independent variable, keeping
other variables at a reference level.
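
For a polr fit, the calculation described here (predicted probabilities for
each value of one predictor, with the others held at a reference level) can
be sketched as below. It assumes the MASS package is installed and uses its
housing data; taking the first level of each factor as the reference is a
choice made for this example, not a recommendation.

```r
library(MASS)   # provides polr() and the housing data

fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# Vary Infl over all of its levels; hold Type and Cont at their first level
ref <- data.frame(
  Infl = factor(levels(housing$Infl), levels = levels(housing$Infl)),
  Type = factor(levels(housing$Type)[1], levels = levels(housing$Type)),
  Cont = factor(levels(housing$Cont)[1], levels = levels(housing$Cont))
)

# One row per Infl level, one column per Sat category
probs <- predict(fit, newdata = ref, type = "probs")
cbind(ref["Infl"], round(probs, 3))
```

Repeating the same steps with Type or Cont as the varying factor gives the
remaining tables.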

I've found R packages that aim to do essentially the same thing.

In Frank Harrell's Design/rms framework, he uses a datadist
function that generates an object that the user has to put into the R
options.  I think many users trip over the use of options there.  If
I don't use that for a month or two, I completely forget the fine
points and have to fight with it.  But it does work to give plots
and predict functions the information they require.

In Zelig (by Kosuke Imai, Gary King, and Olivia Lau), a function
setx does the work of creating newdata objects. That appears to be
about right as a candidate for a generic newdata function. Perhaps
it could directly generalize to all R regression functions, but right
now it is tailored to the models in Zelig. It has separate methods for
the different types of models, and that is a bit confusing to me, since
the newdata in one model should be the same as the newdata in
another, I'm guessing. But his code is all there, I'll keep looking.

In Effects (by John Fox), there are internal functions to create
newdata and plot the marginal effects. If you load effects and run,
for example, effects:::effect.lm you see Prof Fox has his own way of
grabbing information from model columns and calculating predictions.

I think it is time the R Core Team would look at this and tell us what
is the right way to do this. I think the interface to setx in Zelig is
pretty easy to understand, at least for numeric variables.

In R's termplot function, such a thing could be put to use.  As far as
I can tell now, termplot is doing most of the work of creating a
newdata object, but not exactly.

It seems like it would be a shame to proliferate more functions that
do the same function, when it is such a common thing.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [Rd] Wish R Core had a standard format (or generic function) for newdata objects

2011-04-26 Thread Duncan Murdoch

On 26/04/2011 11:13 AM, Paul Johnson wrote:

Is anybody working on a way to standardize the creation of newdata
objects for predict methods?


They're generally just dataframes.  Use the data.frame() function.


When using predict, I find it difficult/tedious to create newdata data
frames when there are many variables. It is necessary to set all
variables at the mean/mode/median, and then for some variables of
interest, one has to insert values for which predictions are desired.


In most models, all variables are necessary in order to produce 
predictions.  If you want to do predictions for one variable, holding 
the others at particular fixed values, just create a dataframe.


For example, suppose the original data is

X <- data.frame(a=rnorm(100), b=rnorm(100), c=rnorm(100))
y <- with(X, a + 2*b + 3*c + rnorm(100))

# You use lm() to get a fit:

fit <- lm(y ~ ., data=X)

# Compute the means of all the covariates:

means <- lapply(X, mean)

# Replace a by a range of values from -1 to 1:

means$a <- seq(-1, 1, length.out=11)

# Convert to a data.frame:

newdata <- as.data.frame(means)

# Do the predictions:

predict(fit, newdata=newdata)


That was three lines of code to produce the newdata dataframe.  It's not 
that hard.  I would think it's easier to write those lines than to 
specify how to do this in general.
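
The same three lines can be wrapped in a small function and applied to each
covariate in turn. This is only a sketch under the same assumptions as the
example above (all covariates numeric, no transformed terms); the helper name
vary_one and the use of each variable's observed range are assumptions.

```r
set.seed(1)
X <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
y <- with(X, a + 2*b + 3*c + rnorm(100))
fit <- lm(y ~ ., data = X)

# Hypothetical helper: hold every covariate at its mean except `varname`,
# which runs over n equally spaced values across its observed range.
vary_one <- function(fit, X, varname, n = 11) {
  vals <- lapply(X, mean)
  vals[[varname]] <- seq(min(X[[varname]]), max(X[[varname]]), length.out = n)
  newdata <- as.data.frame(vals)
  cbind(newdata, fit = predict(fit, newdata = newdata))
}

# One data frame of predictions per covariate
vars <- setNames(names(X), names(X))
profiles <- lapply(vars, function(v) vary_one(fit, X, v))
head(profiles$a, 3)
```

With factors among the covariates, mean() would have to be replaced by some
other rule for those columns, which is exactly the part that is hard to
standardize.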



I was at a presentation by Scott Long last week and he was discussing
the increasing emphasis in Stata on calculations of marginal
predictions and SPost and several other packages, and,
coincidentally, I had a student visit who is learning to use R MASS's
polr (W. Venables and B. Ripley) and we wrestled for quite a while to
try to make the same calculations that Stata makes automatically.  It
spits out predicted probabilities for each independent variable, keeping
other variables at a reference level.

I've found R packages that aim to do essentially the same thing.

In Frank Harrell's Design/rms framework, he uses a datadist
function that generates an object that the user has to put into the R
options.  I think many users trip over the use of options there.  If
I don't use that for a month or two, I completely forget the fine
points and have to fight with it.  But it does work to give plots
and predict functions the information they require.

In Zelig (by Kosuke Imai, Gary King, and Olivia Lau), a function
setx does the work of creating newdata objects. That appears to be
about right as a candidate for a generic newdata function. Perhaps
it could directly generalize to all R regression functions, but right
now it is tailored to the models in Zelig. It has separate methods for
the different types of models, and that is a bit confusing to me, since
the newdata in one model should be the same as the newdata in
another, I'm guessing. But his code is all there, I'll keep looking.

In Effects (by John Fox), there are internal functions to create
newdata and plot the marginal effects. If you load effects and run,
for example, effects:::effect.lm you see Prof Fox has his own way of
grabbing information from model columns and calculating predictions.

I think it is time the R Core Team would look at this and tell us what
is the right way to do this. I think the interface to setx in Zelig is
pretty easy to understand, at least for numeric variables.


If you don't like the way this was done in my three lines above, or by 
Frank Harrell, or the Zelig group, or John Fox, why don't you do it 
yourself, and get it right this time?  It's pretty rude to complain 
about things that others have given you for free, and demand they do it 
better.


Duncan Murdoch




In R's termplot function, such a thing could be put to use.  As far as
I can tell now, termplot is doing most of the work of creating a
newdata object, but not exactly.

It seems like it would be a shame to proliferate more functions that
do the same function, when it is such a common thing.


