Re: [R] How many samples ACTUALLY used in regression?

Prof Brian Ripley Mon, 18 Mar 2013 08:09:26 -0700

On 18/03/2013 14:51, Cade, Brian wrote:

Perhaps a crude but reliable way is to check the number of residuals, e.g.,
length(my.model$resid).

Not very reliable (what about zero weights, for example?), and the component is usually 'residuals'.
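As a rough illustration of the zero-weights objection, here is a minimal sketch on simulated data. The data frame d and the weight vector wt are made up for this example and do not come from the thread. It assumes the usual lm() behaviour of keeping residuals for zero-weight rows.

```r
# Simulated data: 20 rows, no missing values (hypothetical example data)
set.seed(1)
d <- data.frame(x = rnorm(20))
d$y <- 1 + 2 * d$x + rnorm(20)

# Give the first three observations zero weight
wt <- rep(1, 20)
wt[1:3] <- 0

fit <- lm(y ~ x, data = d, weights = wt)

n_resid  <- length(residuals(fit))    # still counts the zero-weight rows
n_weight <- sum(weights(fit) != 0)    # rows that actually contribute to the fit

n_resid
n_weight
```

Here the residual count stays at 20 even though only 17 rows carry any weight, so counting residuals overstates the effective sample size.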


No one has so far mentioned nobs(), which seems to me to be the closest.
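A minimal sketch of nobs() on the kind of data described in the original question, assuming R >= 2.13.0 (where nobs() was introduced) and the default na.action of na.omit. The simulated my.data and the weight column wt are assumptions for illustration only.

```r
# Simulated stand-in for the poster's data: 100 rows, 3 of them incomplete
set.seed(42)
my.data <- data.frame(x = rnorm(100), w = rnorm(100), z = rnorm(100))
my.data$y <- rbinom(100, 1, 0.5)
my.data$x[c(5, 17)] <- NA
my.data$z[60] <- NA

fit.lm  <- lm(y ~ x + w + z, data = my.data)
fit.glm <- glm(y ~ x + w + z, data = my.data, family = binomial)

nobs(fit.lm)    # 97
nobs(fit.glm)   # 97

# Zero weights on two complete rows are excluded by nobs() as well
my.data$wt <- rep(1, 100)
my.data$wt[1:2] <- 0
fit.wt <- lm(y ~ x + w + z, data = my.data, weights = wt)

nobs(fit.wt)                # 95
length(residuals(fit.wt))   # 97: residuals still include zero-weight rows
```

Whether nobs() works for a given model class depends on the package providing a method for it, so it is worth checking with methods(nobs) for models outside base R.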

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov <brian_c...@usgs.gov>
tel:  970 226-9326



On Mon, Mar 18, 2013 at 8:39 AM, Marc Schwartz <marc_schwa...@me.com> wrote:


On Mar 18, 2013, at 7:36 AM, Federico Calboli <f.calb...@imperial.ac.uk>
wrote:

Dear All,

is there a simple way that covers all regression models to extract the
number of samples from a data frame/matrix actually used in a regression
model?


For instance, I might have a data set of 100 rows and 4 columns (1 response +
3 explanatory variables).  If 3 samples have one or more NAs in the
explanatory variable columns, these samples will be dropped in any model:


my.model = lm(y ~ x + w + z, my.data)
my.model = glm(y ~ x + w + z, my.data, family = binomial)
my.model = polr(y ~ x + w + z, my.data)
…

I don't seem to be able to find one single method that works in the
exact same way -- irrespective of the model type -- to interrogate my.model
to see how many samples of my.data were actually used.  Is there such a
function or do I need to hack something together?


Best wishes

Federico



I don't know that this would be universal to all possible R model
implementations, but it should work for those that at least abide by "certain
standards"[1] relative to the internal use of ?model.frame.

In the case where model functions use 'model = TRUE' as the default in
their call (e.g., lm(), glm() and MASS::polr()), the returned model object
will have a component called 'model', such that:

   nrow(my.model$model)

returns the number of rows in the internally created data frame.
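A small sketch of this, using the built-in airquality data as a stand-in for my.data (an assumption for illustration; it has missing values in Ozone and Solar.R). It also shows what happens when the model frame is not retained.

```r
# airquality ships with R and contains NAs in Ozone and Solar.R
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
n_complete <- sum(complete.cases(airquality[, vars]))

# model = TRUE is the default for lm(), so the model frame is kept
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
n_used <- nrow(fit$model)

nrow(airquality)   # rows supplied
n_used             # rows retained in the model frame
n_complete         # complete cases, for comparison

# With model = FALSE there is no 'model' component to count
fit_nomodel <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality,
                  model = FALSE)
is.null(fit_nomodel$model)
```

With the default na.action, the retained model frame should have as many rows as there are complete cases on the variables in the formula.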

Note that 'model = TRUE' is not the default for many functions, for
example Terry's coxph() in survival or Frank's lrm() in rms.

Note also that the value of 'na.action' in the modeling function call may
affect whether the number of rows in the retained 'model' data frame is
really the correct value.

You can also use model.frame(), independently matching arguments passed to
the model function, to replicate what takes place internally in many
modeling functions. The result of model.frame() will be a data frame,
again, subject to similar limitations as above.
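For example, a minimal sketch of calling model.frame() directly, again using the built-in airquality data as a stand-in (an assumption for illustration). The na.action argument is passed explicitly so the effect mentioned above is visible.

```r
form <- Ozone ~ Solar.R + Wind + Temp

# Drop incomplete rows, as lm() and glm() do under the default na.action
mf_omit <- model.frame(form, data = airquality, na.action = na.omit)
nrow(mf_omit)

# The dropped row indices are recorded as an attribute of the model frame
n_dropped <- length(attr(mf_omit, "na.action"))
n_dropped

# With na.pass nothing is dropped, so the row count no longer reflects
# the number of usable observations
mf_pass <- model.frame(form, data = airquality, na.action = na.pass)
nrow(mf_pass)
```

The arguments (formula, data, subset, weights, na.action) need to match those given to the fitting function, otherwise the row count may not correspond to what the model actually used.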

Regards,

Marc Schwartz

[1]: http://developer.r-project.org/model-fitting-functions.txt

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
