Re: [R] SpatStat Kest - Error Message help

2005-09-04 Thread Adrian Baddeley
On Thu, 1 Sep 2005, DrakeGis wrote:

> Hi, I'm working with the function Kest in the package SpatStat (under LINUX
> with R 2.1.0). In order to evaluate the statistical significance of my
> point pattern I'm doing 999 Monte Carlo replications. The script that uses
> the Kest function runs OK for most of the different point patterns that I
> have, but for a particular point pattern, which has only 17 points, it
> runs until the 34th iteration and then I receive this message:
>
> Error in "[<-"(`*tmp*`, index, value = NULL) :
>   incompatible types (1000) in subassignment type fix
> Execution halted
>
> Do you have any idea about what could be the cause of this ? Thanks in
> advance.

  This is not an error message from 'spatstat' itself.

  The message has been generated by the function "[<-" 
  which is called when you assign values to a subset of a dataset 
  (in a command like x[z] <- v). The message appears to say that the
  replacement value v is not of the same type as the original vector x. 
  
  You say that you are running a "script that uses the Kest function".
  The error is probably inside that script. If you send the script to us
  we can probably spot the problem for you.

  As Rolf mentioned in his email, spatstat provides a command "envelope"
  to compute simulation envelopes. This might be sufficient for your needs.
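For reference, a minimal sketch of the envelope() approach (argument values are illustrative; see ?envelope in spatstat — `cells` is one of the example patterns shipped with the package and may need `data(cells)` in older versions):

```r
library(spatstat)
X <- cells                           # a point pattern of class "ppp"
# pointwise simulation envelopes of K(r) under CSR, from 999 simulations
env <- envelope(X, Kest, nsim = 999)
plot(env)
```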

regards
Adrian Baddeley

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] how to fit the partial association model with R?

2005-09-04 Thread David Duffy
On Fri, 2 Sep 2005 [EMAIL PROTECTED] wrote:

>
> If I am not mistaken, the partial association model is an
> extension of the log-linear model. I read a paper which gives an
> example of it (Sloane and Morgan, 1996, An Introduction to Categorical
> Data Analysis, Annual Review of Sociology, 22:351-375). Can R fit such
> a partial association model?
>
> PS: Another somewhat off-topic question. What is the motivation for
> using a log-linear model, i.e. why use one? I have studied the
> log-linear model but I still do not grasp its advantages. Thank you!

You might like to read the manual at http://math.cl.uh.edu/thompsonla/
under "An S Manual to Accompany Agresti's Categorical Data Analysis, 2nd ed.
(2002)", along with Agresti's book itself.

| David Duffy (MBBS PhD) ,-_|\
| email: [EMAIL PROTECTED]  ph: INT+61+7+3362-0217 fax: -0101  / *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v



Re: [R] Question

2005-09-04 Thread Spencer Graves
	  1.  I could find no references to "var.ran" via 
RSiteSearch("var.ran"), nor when requesting help on "var.ran" in S-Plus 6.2.

  2.  Have you considered "simulate.lme"?

  3.  Are you familiar with Pinheiro and Bates (2000) Mixed-Effects 
Models in S and S-Plus (Springer)?  I highly recommend this book.
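A sketch of extracting random-effect variances with VarCorr(), the rough R analogue of S-PLUS's var.ran (the Orthodont data ship with nlme; the model here is illustrative, not the poster's):

```r
library(nlme)
fm <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
vc <- VarCorr(fm)            # character matrix with Variance/StdDev columns
as.numeric(vc["(Intercept)", "Variance"])  # variance of the random intercept
as.numeric(vc["Residual", "Variance"])     # residual variance
```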

  spencer graves

Mahmoud Torabi wrote:

> Dear Sir/Madam
> 
> I would be pleased if anybody could help me. I'm using the linear mixed
> model (lme) function. I'm doing some simulation in my research and need
> to extract the variance component values during my program run.
> Specifically, when I use the lme function, I can get some information
> via summary(), and I can extract some values, such as the variance of
> the fixed parameters and the variance of the random error term, using
> for example varFix and sigma. But I don't know how to extract the
> variance of the random effects.
> I know in S-PLUS we have the command var.ran; how about R?
> 
> Thanks a lot.
> M.Torabi
> 

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

[EMAIL PROTECTED]
www.pdf.com 
Tel:  408-938-4420
Fax: 408-280-7915



Re: [R] R package for ICA

2005-09-04 Thread Spencer Graves
	  I just got 48 hits from RSiteSearch("independent component 
analysis"), the first of which was 
"http://finzi.psych.upenn.edu/R/library/mlica/html/mlica.html".

  hope this helps.
  spencer graves
p.s.  I believe people who follow the posting guide typically get more 
useful information more quickly.  "http://www.R-project.org/posting-guide.html"
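As an aside (not from the original reply), fastICA is another package commonly used for this; a minimal sketch on artificial mixed signals:

```r
library(fastICA)
# two independent sources, linearly mixed
S <- cbind(sin((1:1000) / 20),
           rep(c(-1, 1), each = 10, length.out = 1000))
A <- matrix(c(0.3, 0.7, 0.6, 0.4), 2, 2)
X <- S %*% A
ica <- fastICA(X, n.comp = 2)   # estimate the unmixing
str(ica$S)                      # estimated source signals
```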

Li, Ran wrote:

> Hi,
> 
>  
> 
> Which R packages are good to be used for independent component analysis?  
> 
>  
> 
> Thanks for any suggestions.
> 
>  
> 
> Ran 
> 
> 
>   [[alternative HTML version deleted]]
> 




Re: [R] Distributional characteristics

2005-09-04 Thread Spencer Graves
	  I haven't seen a reply to your post, and I would like to help you. 
Unfortunately, I don't see a question in your email.  Please tell us why 
"changing the variable (x-1, ln(x)) didn't get satisfying results", 
preferably using a toy example that someone else can copy from your 
email into R and use to test a couple of ideas in a matter of seconds.  I 
believe doing that will increase the chances that you will receive a 
quick and useful reply.  (And PLEASE do read the posting guide! 
"http://www.R-project.org/posting-guide.html".  Many people have found 
it quite helpful.)
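For what it's worth, a QQ-plot against a gamma distribution truncated below at 1 can be built via the inverse CDF; the shape and rate values below are placeholders to be replaced by fitted values, and the data are simulated for illustration:

```r
shape <- 2; rate <- 1                 # placeholder parameters
x  <- 1 + rgamma(500, shape, rate)    # fake data with minimum 1
p  <- ppoints(length(x))
F1 <- pgamma(1, shape, rate)
# quantiles of X | X > 1 for a Gamma(shape, rate) variable
q  <- qgamma(F1 + p * (1 - F1), shape, rate)
qqplot(q, sort(x),
       xlab = "truncated-gamma quantiles", ylab = "ordered data")
```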

  spencer graves

Naji wrote:

> Hi all
> 
> I have a continuous variable and I want to test (graphically, plotting
> the observed and theoretical distributions, qqplot) whether it follows some
> formal distribution (something close to Ricci's document: Fitting
> Distributions with R, Feb 2005).
>  
> The distribution I want to fit is a Gamma truncated at 1 (the minimal value
> is 1), P(x) = Pgamma(x, rate) / (1 - Pgamma(1, rate))
> 
> NB: changing the variable (x-1, ln(x)) didn't give satisfying results
> 
> Best
> Naji
> 




Re: [R] Doubt about nested aov output

2005-09-04 Thread Spencer Graves
  Others may know the answer to your question, but I don't.  However, 
since I have not seen a reply, I will offer a few comments:

  1.  What version of R are you using?  I just tried superficially 
similar things with the examples in ?aov in R 2.1.1 patched and 
consistently got F and p values.

  2.  My preference for this kind of thing is to use lme in 
library(nlme) or lmer in library(lme4).  Also, I highly recommend 
Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer).

	  3.  If you still want to use aov and are getting this problem in R 2.1.1, 
could you please provide this list with a small, self-contained example 
that displays the symptoms that concern you?  And PLEASE do read the 
posting guide! "http://www.R-project.org/posting-guide.html".  It might 
increase the speed and utility of replies.

  spencer graves

Ronaldo Reis-Jr. wrote:

> Hi,
> 
> I have two doubts about the nested aov output.
> 
> 1) I have this:
> 
>>anova.ratos <- aov(Glicogenio~Tratamento+Error(Tratamento/Rato/Figado))
>>summary(anova.ratos)
> 
> 
> Error: Tratamento
>Df  Sum Sq Mean Sq
> Tratamento  2 1557.56  778.78
> 
> Error: Tratamento:Rato
>   Df Sum Sq Mean Sq F value Pr(>F)
> Residuals  3 797.67  265.89   
> 
> Error: Tratamento:Rato:Figado
>   Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 12  594.0    49.5
> 
> Error: Within
>   Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 18 381.00   21.17   
> 
> R doesn't compute the F and p values automatically; is it possible to get them?
> 
> I Like an output like this:
> 
> Error: Tratamento
>Df  Sum Sq Mean Sq F value Pr(>F)
> Tratamento  2 1557.56  778.78   2.929 0.197
> 
> Error: Tratamento:Rato
>   Df Sum Sq Mean Sq F value Pr(>F)
> Residuals  3 797.67  265.89   5.372 0.0141  
> 
> Error: Tratamento:Rato:Figado
>   Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 12  594.0    49.5   2.339 0.0503
> 
> Error: Within
>   Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 18 381.00   21.17 
> 
> Why doesn't it do these calculations automatically? Is it possible?
> 
> 
> 2) Can I say that Error: Tratamento:Rato means an interaction between 
> Tratamento and Rato? Normally the : represents an interaction, but in this 
> output I think it does not mean an interaction. 
> 
> Any explanations are welcome.
> 
> Thanks
> Ronaldo
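As a sanity check (not from the original exchange), the F and p values in the desired output can be reproduced by hand from the printed mean squares, testing each stratum against the one below it; the degrees of freedom come from the tables above:

```r
F_trat <- 778.78 / 265.89              # MS(Tratamento) / MS(Rato stratum)
pf(F_trat, 2, 3, lower.tail = FALSE)   # ~0.197
F_rato <- 265.89 / 49.5                # MS(Rato) / MS(Figado stratum)
pf(F_rato, 3, 12, lower.tail = FALSE)  # ~0.0141
F_fig  <- 49.5 / 21.17                 # MS(Figado) / MS(Within)
pf(F_fig, 12, 18, lower.tail = FALSE)  # ~0.0503
```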




Re: [R] R-square n p-value

2005-09-04 Thread Justin Rhodes
Dear Peter,

This is exactly what I needed.  The "input" is 
coming from the Mouse Phenome Project database 
(http://aretha.jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home), 
which only gives Pearson's correlation r and 
n.  Thank you very much.  I have used this R-help 
resource twice now recently and it is incredibly 
helpful and fast.  Thanks again.  It is much appreciated.

cc Dirk

Best,

Justin




At 03:43 AM 9/4/2005, Peter Dalgaard wrote:
>Dirk Eddelbuettel <[EMAIL PROTECTED]> writes:
>
> > On 3 September 2005 at 17:59, Justin Rhodes wrote:
> > | Dear R-help,
> > |
> > | Can someone please help me discover what function or code will give
> > | me a p-value from the input: 1) R-square statistic from a simple
> > | linear regression, and 2) sample size, n
> > |
> > | This would be greatly appreciated.  I need this because I am using a
> > | database that gives me R-square and sample size for multiple
> > | comparisons, and I wish to determine the false discovery rate using
> > | q-value.  Thanks again,
> >
> > Do
> >   > example(lm)   # just to get an lm object
> >   > str(summary(lm.D9))   # to examine summary of an object
> >
> > and you'll see that the object returned from summary has the two common R^2
> > measures, as well as things like residuals from which you can compute n quite
> > easily -- which you could obviously also compute from your regressors
> > and regressand.
> >
> >   > length(summary(lm.D9)$residuals)
>
>I think the problem was somewhat different: The *input* is coming from
>some sort of (closed-source or otherwise impenetrable) database which
>only gives out n and R^2, right?
>
>Now R^2 = SSDmodel/(SSDmodel+SSDres) and F =
>DFres/DFmodel*SSDmodel/SSDres, i.e.
>
>   1/R^2 = 1 + 1/F*DFmodel/DFres
>
>or
>
>   F = 1/(1/R^2 - 1)*DFres/DFmodel = R^2/(1-R^2)*DFres/DFmodel
>
>which can be looked up "in the F-table" using
>
>   pf(F, 1, N-2, lower.tail=FALSE)
>
>(provided we have a 1 DF model)
>
>Actually, R^2 itself has a beta distribution and you could use pbeta
>directly, but then you'd need to figure out (or recall) what the
>relation between the DF and the shape parameters of the beta
>distribution are. By my reckoning, this should do it:
>
>   pbeta(Rsq, 1/2, (N-2)/2, lower.tail=FALSE)
>
>"Proof":
>
>
>Residual standard error: 1.143 on 8 degrees of freedom
>Multiple R-Squared: 0.0004207,  Adjusted R-squared: -0.1245
>F-statistic: 0.003367 on 1 and 8 DF,  p-value: 0.9552
>
> > pbeta(0.0004207, 1/2, 8/2, lower=F)
>[1] 0.9551511
>
>
>--
>O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
>~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907
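Wrapping both of Peter's routes in a small helper (the function name is mine, not from the thread) reproduces the proof values above:

```r
p_from_r2 <- function(Rsq, N) {
  # 1-df model: DFmodel = 1, DFres = N - 2
  Fstat <- Rsq / (1 - Rsq) * (N - 2)
  c(via_F    = pf(Fstat, 1, N - 2, lower.tail = FALSE),
    via_beta = pbeta(Rsq, 1/2, (N - 2)/2, lower.tail = FALSE))
}
p_from_r2(0.0004207, 10)   # both approximately 0.955
```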

Justin S. Rhodes
Assistant Professor of Psychology
Affiliate, Institute for Genomic Biology and Neuroscience Program
University of Illinois at Urbana-Champaign
405 N Mathews Ave, Urbana, Il, 61801
Tel. 217-265-0021 Fax 217-244-5876
Website: http://s.psych.uiuc.edu/people/showprofile.php?id=545



[R] Help: PLSR

2005-09-04 Thread Shengzhe Wu
Hello,

I have a data set with 15 variables (first one is the response) and
1200 observations. Now I use pls package to do the plsr as below.

trainSet = as.data.frame(scale(trainSet, center = TRUE, scale = TRUE))
trainSet.plsr = mvr(formula, ncomp = 14, data = trainSet, method = "kernelpls",
model = TRUE, x = TRUE, y = TRUE)

from the model, I wish to know the values of Xvar (the amount of
X-variance explained by each number of components) and Xtotvar (total
variance in X).

Because the trainSet has been scaled before training, I think Xtotvar
should be equal to 14, but unexpectedly Xtotvar = 16562, and the
values of Xvar are also very big, with sum of Xvar = 16562. Why does
this type of result occur? Is it because of the kernel algorithm?
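A likely explanation (an assumption, not verified against the pls sources): Xtotvar is reported as a total sum of squares, not a variance, so for autoscaled data it should be (n - 1) * p rather than p. A quick check:

```r
set.seed(1)
# autoscaled 1200 x 14 matrix: each column has sum of squares n - 1
X <- scale(matrix(rnorm(1200 * 14), ncol = 14))
sum(X^2)   # (1200 - 1) * 14 = 16786, not 14
```

The poster's 16562 equals (n - 1) * 14 with n = 1184, which would fit if some rows were dropped before fitting.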

Thank you,
Shengzhe



Re: [R] survey weights

2005-09-04 Thread A Das
That worked. Many thanks, Thomas. 
  -Bobby

--- Thomas Lumley <[EMAIL PROTECTED]> wrote:

> On Sun, 4 Sep 2005, A Das wrote:
> 
> > Just: "missing values in object". That would imply the
> > object was created. But then I write "dchina", and it
> > says "object dchina not found".
> 
> No, it would not imply the object was created.  If
> it was an error message 
> (rather than a warning) the object would not have
> been created.
> 
> I presume the full message was
>   Error in na.fail.default(object) : missing values
> in object
> 
> If so, it sounds as though you have missing values
> in the id, weights, or 
> strata variable.
> summary(China[, c("psu", "strata", "weight0x")])
> will verify this.
> 
> Stata will just have dropped these observations (use
> -svydes- to verify 
> this).  If you want to drop the observations in R
> you need to do this 
> explicitly. Having missing data may be unavoidable,
> but if you have 
> observations in a sample it seems that you should
> know how they were 
> sampled.
> To drop these observations you could use
> 
> obsChina <- subset(China, !is.na(psu) &
> !is.na(strata) & !is.na(weight0x))
> 
> and then use obsChina rather than China in the
> svydesign() function.
> 
>   -thomas
> 
> 
> 
> >  -Bobby
> >
> > --- Thomas Lumley <[EMAIL PROTECTED]>
> wrote:
> >
> >> On Sun, 4 Sep 2005, A Das wrote:
> >>
> >>> Thanks, Thomas.
> >>>    Yes, that's exactly what happened: the warnings
> >>> came first after "data(China)", and then after
> >>> "dchina<-svydesign..." So the design object isn't
> >>> being produced? The dataset is very large, and the
> >>> weights were already set in Stata before importing.
> >>> Would either of those cause problems?
> >>
> >> Probably not.  What was the error message from
> >> svydesign()?  That is what
> >> will say what went wrong.
> >>
> >>-thomas
> >>
> >
> >
> >
> >
> >
> 
> >
> >
> 
> Thomas Lumley Assoc. Professor, Biostatistics
> [EMAIL PROTECTED] University of Washington,
> Seattle
>



Re: [R] Question regarding lmer with binary response

2005-09-04 Thread Douglas Bates
On 9/4/05, Bernd Weiss <[EMAIL PROTECTED]> wrote:
> Dear all, dear Prof. Bates,
> 
> my dependent variable (school absenteeism, truancy[1]) is a binary
> response for which I am trying to compute an unconditional mixed
> effects model. I've got observations (monday, wednesday and friday)
> nested in individuals (ID2), which were nested in classes (KID2) and
> schools (SID), i.e. a 4-level mixed effects model.
> 
> In short, I was trying without success. I got no sensible results
> using lmer as well as using glmmPQL. I played around with the control
> parameters and the methods (PQl, Laplace) in lmer without any effect.
> 
> I would really appreciate if someone could have a look into my data
> and tell me what's going wrong here.
> 
> My R script and data can be found at:
> 
> http://www.metaanalyse.de/tmp/rhelp.R
> http://www.metaanalyse.de/tmp/rhelp.txt
> 
> TIA,
> 
> Bernd

Thanks for making the data and your script available.  That helps a
lot when investigating cases like these.

As you say, you have 3 binary responses per student and that is just
not enough information to fit a model like a generalized linear mixed
model.  Most of the students had 3 positive responses and 0 negative. 
In fact, out of the 6708 students, only 444 missed any days at all. 
Only 186 out of the 302 classes had any missing data.  It is just not
possible to fit a four level mixed effects model to such sparse data.

Consider only the pattern within students.  I did some very messy
manipulations to look at the unique patterns of absent:present
observations with the results shown below.  (Challenge to the reader:
Can you come up with a relatively clean method of calculating the
number of students with each of the patterns of absent:present shown
below?)

  A:P  Freq  Pct
  0:1   413    0
  0:2   161    0
  0:3  5690    0
  1:2   258   33
  1:1    10   50
  2:1    65   67
  1:0    19  100
  2:0    10  100
  3:0    82  100
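One reasonably clean answer to the challenge, assuming a long-format data frame with a student id (ID2) and a 0/1 absence indicator — the object and column names here are guesses at the posted data, not taken from it:

```r
# "absent:present" pattern per student
pat  <- with(truancy, tapply(absent, ID2,
             function(x) paste(sum(x == 1), sum(x == 0), sep = ":")))
freq <- table(pat)
A <- as.numeric(sub(":.*", "", names(freq)))   # absences in each pattern
P <- as.numeric(sub(".*:", "", names(freq)))   # presences in each pattern
cbind(Freq = as.vector(freq), Pct = round(100 * A / (A + P)))
```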

The important point to understand is that students who are present at
all observations or who are absent at all observations contribute very
little information to such a model.  The model fitting ends up giving
them a very large positive or negative random effect and they
contribute no other information.  The most information comes from the
students who are present some of the time and absent some of the time
and those are 333 students out of 6708.
 
> 



Re: [R] survey weights

2005-09-04 Thread Thomas Lumley
On Sun, 4 Sep 2005, A Das wrote:

> Just: "missing values in object". That would imply the
> object was created. But then I write "dchina", and it
> says "object dchina not found".

No, it would not imply the object was created.  If it was an error message 
(rather than a warning) the object would not have been created.

I presume the full message was
  Error in na.fail.default(object) : missing values in object

If so, it sounds as though you have missing values in the id, weights, or 
strata variable.
summary(China[, c("psu", "strata", "weight0x")])
will verify this.

Stata will just have dropped these observations (use -svydes- to verify 
this).  If you want to drop the observations in R you need to do this 
explicitly. Having missing data may be unavoidable, but if you have 
observations in a sample it seems that you should know how they were 
sampled.
To drop these observations you could use

obsChina <- subset(China, !is.na(psu) & !is.na(strata) & !is.na(weight0x))

and then use obsChina rather than China in the svydesign() function.

-thomas



>  -Bobby
>
> --- Thomas Lumley <[EMAIL PROTECTED]> wrote:
>
>> On Sun, 4 Sep 2005, A Das wrote:
>>
>>> Thanks, Thomas.
>>>    Yes, that's exactly what happened: the warnings
>>> came first after "data(China)", and then after
>>> "dchina<-svydesign..." So the design object isn't
>>> being produced? The dataset is very large, and the
>>> weights were already set in Stata before importing.
>>> Would either of those cause problems?
>>
>> Probably not.  What was the error message from
>> svydesign()?  That is what
>> will say what went wrong.
>>
>>  -thomas
>>
>
>
>
>
> 
>
>

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] survey weights

2005-09-04 Thread A Das
Just: "missing values in object". That would imply the
object was created. But then I write "dchina", and it
says "object dchina not found". 
  -Bobby

--- Thomas Lumley <[EMAIL PROTECTED]> wrote:

> On Sun, 4 Sep 2005, A Das wrote:
> 
> > Thanks, Thomas.
> >    Yes, that's exactly what happened: the warnings
> > came first after "data(China)", and then after
> > "dchina<-svydesign..." So the design object isn't
> > being produced? The dataset is very large, and the
> > weights were already set in Stata before importing.
> > Would either of those cause problems?
> 
> Probably not.  What was the error message from
> svydesign()?  That is what 
> will say what went wrong.
> 
>   -thomas
>



Re: [R] survey weights

2005-09-04 Thread Thomas Lumley
On Sun, 4 Sep 2005, A Das wrote:

> Thanks, Thomas.
>Yes, that's exactly what happened: the warnings
> came first after "data(China)", and then after
> "dchina<-svydesign..." So the design object isn't
> being produced? The dataset is very large, and the
> weights were already set in Stata before importing.
> Would either of those cause problems?

Probably not.  What was the error message from svydesign()?  That is what 
will say what went wrong.

-thomas



Re: [R] survey weights

2005-09-04 Thread A Das
Thanks, Thomas.
Yes, that's exactly what happened: the warnings
came first after "data(China)", and then after
"dchina<-svydesign..." So the design object isn't
being produced? The dataset is very large, and the
weights were already set in Stata before importing.
Would either of those cause problems?
   -Bobby
  
 

--- Thomas Lumley <[EMAIL PROTECTED]> wrote:

> On Sat, 3 Sep 2005, A Das wrote:
> 
> > Hi all, I've been trying to get a large (12mb) Stata
> > survey database into R. I managed that, but when I
> > attach survey weights, something goes wrong. The error
> > message is: object dchina not found. Here's the
> > script:
> 
> If that is the *first* message then something extremely strange is
> happening.
> 
> > library(car)
> > library(foreign)
> > library(survey)
> >
> > China <- read.dta("C:/final07c2.dta")
> > attach(China)
> 
> This attach() isn't necessary or helpful
> 
> > data(China)
> You should get a warning here
> 
> Warning message:
> data set 'China' not found in: data(China)
> 
> since China isn't one of the built-in data sets. If
> you don't get this 
> message it suggests that you do have a built-in
> dataset called China, 
> which will have overwritten your file.
> 
> >
> > dchina<-svydesign(id=~psu,strata=~strata,weights=~weight0x,
> >   data=China,nest=TRUE)
> 
> If this line doesn't produce an error message then a
> variable called 
> "dchina" must have been produced, in which case you
> shouldn't get an error 
> message saying it wasn't found in the next line.
> 
> > summary(dchina)
> >
> 
> 
> Are you sure there wasn't an earlier error message
> from the call to 
> svydesign()?
> 
>   -thomas
>



Re: [R] specification for glmmPQL

2005-09-04 Thread Douglas Bates
On 9/4/05, Andrew R. Criswell <[EMAIL PROTECTED]> wrote:
> Hello Dr. Bates and group,
> 
> I understand, the attached data file did not accompany my original
> message. I have listed below the code used to create that file.
> 
> data.1 <- data.frame(subject  = factor(rep(c("one", "two", "three", "four",
>  "five", "six", "seven",
> "eight"),
>each = 4),
>levels = c("one", "two", "three",
>   "four", "five", "six",
>   "seven", "eight")),
>  day  = factor(rep(c("one", "two", "three", "four"),
>times = 8),
>levels = c("one", "two", "three",
>   "four")),
>  expt = rep(c("control", "treatment"), each = 16),
>  response = c(58, 63, 57, 54, 63, 59, 61, 53, 52, 62,
>   46, 55, 59, 63, 58, 59, 62, 59, 64, 53,
>   63, 75, 62, 64, 53, 58, 62, 53, 64, 72,
>   65, 74))
> 
> mtrx.1 <- matrix(apply(data.1[, -4], 2, function(x)
>  rep(x, 100 - data.1$response)), ncol = 3, byrow = F)
> mtrx.2 <- matrix(apply(data.1[, -4], 2, function(x)
>  rep(x, data.1$response)), ncol = 3, byrow = F)
> 
> data.2 <- data.frame(subject  = factor(c(mtrx.1[,1], mtrx.2[,1]),
>levels = c("one", "two", "three",
>   "four", "five", "six",
>   "seven", "eight")),
>  day  = factor(c(mtrx.1[,2], mtrx.2[,2]),
>levels = c("one", "two", "three",
>   "four")),
>  expt = factor(c(mtrx.1[,3], mtrx.2[,3]),
>levels = c("control", "treatment")),
>  response = factor(c(rep("yes", nrow(mtrx.1)),
>  rep("no", nrow(mtrx.2))),
>levels = c("yes", "no")))
> 
> #---#

Thanks for sending the data.

In your first message you said that you got completely different
results from glmmPQL when fitting the two models.  When I fit these
models with glmmPQL I got quite similar parameter estimates.  The
reported log-likelihood or AIC or BIC values are quite different but
these values apply to a different model (the list weighted linear
mixed model used in the PQL algorithm) and should not be used for a
glmm model in any case.

The fm4 results from lmer in the lme4 package (actually lmer is now in
the Matrix package but that is only temporary) confirm those from
glmmPQL.  The model fm3 when fit by lmer provides different standard
errors but that is because the weights are not being appropriately
adjusted in lmer.  We will fix that.

In general I think it is safest to use the long form of the data as in
your data.2.

Here are the results from lmer applied to the long form.  The results
from the Adaptive Gauss-Hermite Quadrature (AGQ) method are preferred
to those from the PQL method because AGQ is a more accurate
approximation to the log-likelihood of the GLMM model.  In this case
the differences are minor.

The log-likelihood reported here is an approximation to the
log-likelihood of the GLMM model.

> (fm.4 <- lmer(response ~ expt + (1|subject), data.2, binomial))
Generalized linear mixed model fit using PQL 
Formula: response ~ expt + (1 | subject) 
   Data: data.2 
 Family: binomial(logit link)
      AIC      BIC    logLik deviance
 4298.026  4322.31 -2145.013 4290.026
Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 0.015835 0.12584
# of obs: 3200, groups: subject, 8

Estimated scale (compare to 1)  0.9990621 

Fixed effects:
              Estimate Std. Error z value  Pr(>|z|)
(Intercept)    0.30764    0.08075  3.8098 0.0001391
expttreatment  0.21319    0.11473  1.8582 0.0631454
> (fm.4a <- lmer(response ~ expt + (1|subject), data.2, binomial, method = "AGQ"))
Generalized linear mixed model fit using AGQ 
Formula: response ~ expt + (1 | subject) 
   Data: data.2 
 Family: binomial(logit link)
      AIC      BIC    logLik deviance
 4298.023 4322.306 -2145.011 4290.023
Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 0.015855 0.12592
# of obs: 3200, groups: subject, 8

Estimated scale (compare to 1)  1.007675 

Fixed effects:
              Estimate Std. Error z value  Pr(>|z|)
(Intercept)    0.30811    0.08075  3.8156 0.0001358
expttreatment  0.21352    0.11

[R] Displaying RProf output

2005-09-04 Thread hadley wickham
Hi,

I've been experimenting with a new way of displaying the output from
RProf, to make it easier to optimise your functions.  I've included an
example below.  I'd love to get your feedback on how easy you think
this graphic is to read, and on ways that it could be improved.

install.packages("butler")
library(butler)

# profile the glm example
profile_glm <- stopwatch(function() example(glm))

# Plot the profile
# y-axis gives percent of time spent in that function
# x-axis gives position in call stack
plot(profile_glm)

# The first few levels aren't of interest, so we can skip them:
# (see ?plot.call.tree for all options)
plot(profile_glm, startlevel=4)

# We might also want to see what's going further down
plot(profile_glm, startlevel=4, depth=10)

# By default only functions that take longer than 2% of the 
# total time are shown, setting mintime changes that
plot(profile_glm, startlevel=4, depth=10, mintime=1)

One interesting thing you can see from this example is that almost 30%
of the total time is just spent printing the output - why is it so
slow?  Well, it looks like print.summary.glm calculates a lot of the
summary statistics.

Thanks,

Hadley



[R] Question regarding lmer with binary response

2005-09-04 Thread Bernd Weiss
Dear all, dear Prof. Bates,

my dependent variable (school absenteeism, truancy[1]) is a binary 
response for which I am trying to compute an unconditional mixed 
effects model. I've got observations (monday, wednesday and friday) 
nested in individuals (ID2), which were nested in classes (KID2) and 
schools (SID), i.e. a 4-level mixed effects model. 

In short, I was trying without success. I got no sensible results 
using lmer as well as using glmmPQL. I played around with the control 
parameters and the methods (PQl, Laplace) in lmer without any effect. 
 
I would really appreciate if someone could have a look into my data 
and tell me what's going wrong here. 

My R script and data can be found at: 

http://www.metaanalyse.de/tmp/rhelp.R
http://www.metaanalyse.de/tmp/rhelp.txt

TIA,

Bernd



Re: [R] survey weights

2005-09-04 Thread Thomas Lumley
On Sat, 3 Sep 2005, A Das wrote:

> Hi all, I've been trying to get a large (12mb) Stata
> survey database into R. I managed that, but when I
> attach survey weights, something goes wrong. The error
> message is: object dchina not found. Here's the
> script:

If that is the *first* message then something extremely strange is 
happening.

> library(car)
> library(foreign)
> library(survey)
>
> China <- read.dta("C:/final07c2.dta")
> attach(China)

This attach() isn't necessary or helpful

> data(China)
You should get a warning here

Warning message:
data set 'China' not found in: data(China)

since China isn't one of the built-in data sets. If you don't get this 
message it suggests that you do have a built-in dataset called China, 
which will have overwritten your file.

> dchina<-svydesign(id=~psu,strata=~strata,weights=~weight0x,
>   data=China,nest=TRUE)

If this line doesn't produce an error message then a variable called 
"dchina" must have been produced, in which case you shouldn't get an error 
message saying it wasn't found in the next line.

> summary(dchina)
>


Are you sure there wasn't an earlier error message from the call to 
svydesign()?

-thomas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Inconsistence in specifying action for missing data

2005-09-04 Thread Thomas Lumley
On Sat, 3 Sep 2005, John Sorkin wrote:

> A question for R (and perhaps S and SPlus) historians.
>
> Does anyone know the reason for the inconsistency in the way that the
> action that should be taken when data are missing is specified? There
> are several variants, na.action, na.omit, "T", TRUE,  etc. I know that a
> foolish consistency is the hobgoblin of a small mind, but consistency
> can make things easier.
>

There's actually a little more consistency than first appears.  There are 
two common ways to refer to missingness, na.rm and na.action.  Usually 
na.rm has default FALSE (and writing T instead of TRUE is a bug, since T 
can be reassigned) and removes NAs from one vector at a time.

na.action usually has default na.omit and works on whole data frames, e.g. 
na.omit and na.exclude do casewise deletion if any variable is NA.

These aren't completely uniform, and that is simply historical. I think 
there was once an attempt to make na.fail() the default na.action, but 
there was too much resistance to change.
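
The distinction shows up directly at the prompt (plain base R, nothing
package-specific):

```r
# na.rm works on a single vector, one function call at a time
x <- c(1, 2, NA, 4)
mean(x)                # NA: na.rm defaults to FALSE
mean(x, na.rm = TRUE)  # 2.333...: the NA is dropped before averaging

# na.omit works casewise on a whole data frame
d <- data.frame(a = c(1, 2, NA), b = c(NA, 5, 6))
na.omit(d)             # only row 2 survives: it is the only complete case
```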

-thomas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] specification for glmmPQL

2005-09-04 Thread Andrew R. Criswell
Hello Dr. Bates and group,

I understand the attached data file did not accompany my original 
message. I have listed below the code used to create that file.

data.1 <- data.frame(subject  = factor(rep(c("one", "two", "three", "four",
                                             "five", "six", "seven", "eight"),
                                           each = 4),
                                       levels = c("one", "two", "three",
                                                  "four", "five", "six",
                                                  "seven", "eight")),
                     day      = factor(rep(c("one", "two", "three", "four"),
                                           times = 8),
                                       levels = c("one", "two", "three",
                                                  "four")),
                     expt     = rep(c("control", "treatment"), each = 16),
                     response = c(58, 63, 57, 54, 63, 59, 61, 53, 52, 62,
                                  46, 55, 59, 63, 58, 59, 62, 59, 64, 53,
                                  63, 75, 62, 64, 53, 58, 62, 53, 64, 72,
                                  65, 74))

mtrx.1 <- matrix(apply(data.1[, -4], 2, function(x)
                   rep(x, 100 - data.1$response)), ncol = 3, byrow = FALSE)
mtrx.2 <- matrix(apply(data.1[, -4], 2, function(x)
                   rep(x, data.1$response)), ncol = 3, byrow = FALSE)

data.2 <- data.frame(subject  = factor(c(mtrx.1[, 1], mtrx.2[, 1]),
                                       levels = c("one", "two", "three",
                                                  "four", "five", "six",
                                                  "seven", "eight")),
                     day      = factor(c(mtrx.1[, 2], mtrx.2[, 2]),
                                       levels = c("one", "two", "three",
                                                  "four")),
                     expt     = factor(c(mtrx.1[, 3], mtrx.2[, 3]),
                                       levels = c("control", "treatment")),
                     response = factor(c(rep("yes", nrow(mtrx.1)),
                                         rep("no", nrow(mtrx.2))),
                                       levels = c("yes", "no")))

#---#


Douglas Bates wrote:

>On 9/4/05, Andrew R. Criswell <[EMAIL PROTECTED]> wrote:
>
>>Hello All,
>>
>>I have a question regarding how glmmPQL should be specified. Which of
>>these two is correct?
>>
>>summary(fm.3 <- glmmPQL(cbind(response, 100 - response) ~ expt,
>>data = data.1, random = ~ 1 | subject,
>>family = binomial))
>>
>>summary(fm.4 <- glmmPQL(response ~ expt, data = data.2,
>>random = ~ 1 | subject, family = binomial))
>>
>>One might think it makes no difference, but it does.
>>
>>I have an experiment in which 8 individuals were subjected to two types
>>of treatment, 100 times per day for 4 consecutive days. The response
>>given is binary--yes or no--for each treatment.
>>
>>I constructed two types of data sets. On Rfile-01.Rdata (attached here)
>>are data frames, data.1 and data.2. The information is identical but the
>>data are arranged differently between these two data frames. Data frame,
>>data.1, groups frequencies by subject, day and treatment. Data frame,
>>data.2, is ungrouped.
>>
>
>I don't think your attached .Rdata file made it through the various
>filters on the lists or on my receiving email.  Could you send me a
>copy in a separate email message?
>
>
>>Consistency of these data frames is substantiated by computing these
>>tables:
>>
>>ftable(xtabs(response ~ expt + subject + day,
>> data = data.1))
>>ftable(xtabs(as.numeric(response) - 1 ~ expt + subject + day,
>> data = data.2))
>>
>>If I ignore the repeated measurement aspect of the data, I get, using
>>glm, identical results (but for deviance and df).
>>
>>summary(fm.1 <- glm(cbind(response, 100 - response) ~ expt,
>>data = data.1, family = binomial))
>>
>>summary(fm.2 <- glm(response ~ expt, data = data.2,
>>family = binomial))
>>
>>However, if I estimate these two equations as a mixed model using
>>glmmPQL, I get completely different results between the two
>>specifications, fm.3 and fm.4. Which one is right? The example which
>>accompanies help(glmmPQL) suggests fm.4.
>>
>>summary(fm.3 <- glmmPQL(cbind(response, 100 - response) ~ expt,
>>data = data.1, random = ~ 1 | subject,
>>family = binomial))
>>
>>summary(fm.4 <- glmmPQL(response ~ expt, data = data.2,
>>random = ~ 1 | subject, family = binomial))
>>
>>Thank you,
>>Andrew
>>
>>
>>
>>
>>__
>>R-help@stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! http:/

Re: [R] time series graphs

2005-09-04 Thread Gabor Grothendieck
On 9/4/05, Anette Nørgaard <[EMAIL PROTECTED]> wrote:
> About time series graphs, I need help to move on:
> 
> A time series of data directly from a data logger comes in the dat
> format created below:
> 
> year<-c(rep(2005,10))
> doy<-c(rep(173,5),rep(174,5))
> time<-c(15,30,45,100,115,15,30,45,100,115)
> dat1 <- c(0.022128, 0.035036, 0.051632, 0.071916, 0.081136, 0.07837,
>           0.083902, 0.126314, 0.080214, 0.117094)
> dat2 <- c(0.533074667, 0.887982667, 1.284938, 1.845450333, 2.145839333,
>           2.145126667, 2.392422, 3.60253, 2.330776333, 3.5277)
> dat<-cbind(year,doy,time,dat1,dat2)
> dat
> 
> time 15 corresponding to 00:15 (past 2400)
> 
> I'd like to make a graph illustrating all observation from
> the two time series dat1 and dat2
> in one graph with different possibilities of labeling the x-axis.
> 
> 1st wish:
> with the date and time drawn on the x-axis every hour (meaning every
> fourth observation)
> e.g. June 24 01:00, Jun 24 02:00 etc.
> 
> 2nd example:
> with the date label once per 24 hour period
> e.g. June 24
> 
> I've tried many things and have become very confused on the "ts"
> possibilities. Everything I have is
> timeseries, so it is very important to get a grip on this.
> 
> Please give me a hint.
> 

If you are going to be doing a lot of time series
manipulations that have dates and times then read
R News 4/1 Help Desk for an article about dates/times
and read the zoo vignette for information on zoo.

library(zoo)
vignette("zoo")


You don't seem to have enough room to label the hours
but you could label the days and just put ticks for
the hours.

Create a zoo series using chron dates and times and just
plot it.  That may be sufficient.

library(chron)
library(zoo)

# convert date/times to chron; the times are in HMM form
# (15 = 00:15, 100 = 01:00), so split hours and minutes
tt <- chron(paste(1, 1, dat[,1], sep = "/"),
            (dat[,3] %/% 100)/24 + (dat[,3] %% 100)/(24*60)) + dat[,2] - 1

# create a zoo series using your data and the chron dates/times
z <- zoo(dat[,4:5],tt)

plot(z, plot.type = "single")

# Alternative.
# If you want more control, plot it without the X axis
# and use axis twice to draw the X axis yourself.  Use the
# tcl= argument to vary the tick sizes.

plot(z, plot.type = "single", xaxt = "n")

# X axis labelling days
dd <- seq(min(floor(tt)), max(floor(tt)))
axis(1, dd, format(as.Date(dd), "%b %d"), tcl = -0.6)

# X axis with just ticks for hours
hh <- seq(min(tt), max(tt), by = 1/24)
axis(1, hh, FALSE, tcl = -0.4)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] unexpected error message with variog.mc.env() - geoR

2005-09-04 Thread Patrick Giraudoux
Dear R-listers,

I have got an error from variog.mc.env() in package geoR whose origin I 
cannot sort out.  The original data file can be sent to anyone interested.

bin0<-variog(don1bgeo,estimator.type="modulus",  direction=0)
bin90<-variog(don1bgeo,estimator.type="modulus",  direction=pi/2)
env.variog0<-variog.mc.env(don1bgeo,obj.variog=bin0)
env.variog90<-variog.mc.env(don1bgeo,obj.variog=bin90)

Everything goes smoothly with bin90, but using bin0 gives this error
after the permutations:

 > Error in variog.mc.env(don1bgeo, obj.variog = bin0) :
 (subscript) logical subscript too long

Any idea about what happens?

Regards,

Patrick

-- 

Department of Environmental Biology
EA3184 usc INRA
University of Franche-Comte
25030 Besancon Cedex
(France)

tel. +33 381 665 745
fax +33 381 665 797
http://lbe.univ-fcomte.fr

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] time series graphs

2005-09-04 Thread Anette Nørgaard
About time series graphs, I need help to move on:
 
A time series of data directly from a data logger comes in the dat
format created below:
 
year<-c(rep(2005,10))
doy<-c(rep(173,5),rep(174,5))
time<-c(15,30,45,100,115,15,30,45,100,115)
dat1 <- c(0.022128, 0.035036, 0.051632, 0.071916, 0.081136, 0.07837,
          0.083902, 0.126314, 0.080214, 0.117094)
dat2 <- c(0.533074667, 0.887982667, 1.284938, 1.845450333, 2.145839333,
          2.145126667, 2.392422, 3.60253, 2.330776333, 3.5277)
dat<-cbind(year,doy,time,dat1,dat2)
dat
 
time 15 corresponding to 00:15 (past 2400)
 
I'd like to make a graph illustrating all observation from
the two time series dat1 and dat2
in one graph with different possibilities of labeling the x-axis.
 
1st wish:
with the date and time drawn on the x-axis every hour (meaning every
fourth observation)
e.g. June 24 01:00, Jun 24 02:00 etc.
 
2nd example:
with the date label once per 24 hour period
e.g. June 24
 
I've tried many things and have become very confused on the "ts"
possibilities. Everything I have is
timeseries, so it is very important to get a grip on this.
 
Please give me a hint.
 
Time series beginner - Anette
 

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] specification for glmmPQL

2005-09-04 Thread Andrew R. Criswell

Hello All,

I have a question regarding how glmmPQL should be specified. Which of 
these two is correct?


summary(fm.3 <- glmmPQL(cbind(response, 100 - response) ~ expt,
   data = data.1, random = ~ 1 | subject,
   family = binomial))

summary(fm.4 <- glmmPQL(response ~ expt, data = data.2,
   random = ~ 1 | subject, family = binomial))

One might think it makes no difference, but it does.

I have an experiment in which 8 individuals were subjected to two types 
of treatment, 100 times per day for 4 consecutive days. The response 
given is binary--yes or no--for each treatment.


I constructed two types of data sets. On Rfile-01.Rdata (attached here) 
are data frames, data.1 and data.2. The information is identical but the 
data are arranged differently between these two data frames. Data frame, 
data.1, groups frequencies by subject, day and treatment. Data frame, 
data.2, is ungrouped.


Consistency of these data frames is substantiated by computing these 
tables:


ftable(xtabs(response ~ expt + subject + day,
data = data.1))
ftable(xtabs(as.numeric(response) - 1 ~ expt + subject + day,
data = data.2))

If I ignore the repeated measurement aspect of the data, I get, using 
glm, identical results (but for deviance and df).


summary(fm.1 <- glm(cbind(response, 100 - response) ~ expt,
   data = data.1, family = binomial))

summary(fm.2 <- glm(response ~ expt, data = data.2,
   family = binomial))
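
That grouped/ungrouped equivalence for plain glm can be checked on
simulated data (the variable names below are illustrative stand-ins, not
from the attached file):

```r
set.seed(1)
# 800 Bernoulli trials, 400 per treatment arm
expt <- rep(c("control", "treatment"), each = 400)
y <- rbinom(800, 1, ifelse(expt == "control", 0.55, 0.65))

# ungrouped fit: one Bernoulli row per trial
fit.u <- glm(y ~ expt, family = binomial)

# grouped fit: successes and failures counted per cell
agg <- aggregate(y, list(expt = expt), sum)  # success counts in column "x"
fit.g <- glm(cbind(x, 400 - x) ~ expt, data = agg, family = binomial)

all.equal(coef(fit.u), coef(fit.g))  # the two fits give identical estimates
```

Only the deviance and degrees of freedom differ between the two forms;
the coefficient estimates and standard errors coincide, which is exactly
the behaviour described above for fm.1 versus fm.2.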

However, if I estimate these two equations as a mixed model using 
glmmPQL, I get completely different results between the two 
specifications, fm.3 and fm.4. Which one is right? The example which 
accompanies help(glmmPQL) suggests fm.4.


summary(fm.3 <- glmmPQL(cbind(response, 100 - response) ~ expt,
   data = data.1, random = ~ 1 | subject,
   family = binomial))

summary(fm.4 <- glmmPQL(response ~ expt, data = data.2,
   random = ~ 1 | subject, family = binomial))

Thank you,
Andrew


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] R-square n p-value

2005-09-04 Thread Peter Dalgaard
Dirk Eddelbuettel <[EMAIL PROTECTED]> writes:

> On 3 September 2005 at 17:59, Justin Rhodes wrote:
> | Dear R-help,
> | 
> | Can someone please help me discover what function or code will give 
> | me a p-value from the input: 1) R-square statistic from a simple 
> | linear regression, and 2) sample size, n
> | 
> | This would be greatly appreciated.  I need this because I am using a 
> | database that gives me R-square and sample size for multiple 
> | comparisons, and I wish to determine the false discovery rate using 
> | q-value.  Thanks again,
> 
> Do
>   > example(lm)   # just to get an lm object
>   > str(summary(lm.D9))   # to examine summary of an object
> 
> and you'll see that the object returned from summary has the two common R^2
> measures, as well as things like residuals from which you can compute n quite
> easily -- which you could obviously also do from your regressors and regressand.
> 
>   > length(summary(lm.D9)$residuals)

I think the problem was somewhat different: The *input* is coming from
some sort of (closed-source or otherwise impenetrable) database which
only gives out n and R^2, right?

Now R^2 = SSDmodel/(SSDmodel+SSDres) and F =
DFres/DFmodel*SSDmodel/SSDres, i.e. 

  1/R^2 = 1 + 1/F*DFmodel/DFres

or 

  F = 1/(1/R^2 - 1)*DFres/DFmodel = R^2/(1-R^2)*DFres/DFmodel

which can be looked up "in the F-table" using 

  pf(F, 1, N-2, lower.tail=FALSE)
 
(provided we have a 1 DF model)

Actually, R^2 itself has a beta distribution and you could use pbeta
directly, but then you'd need to figure out (or recall) what the
relation between the DF and the shape parameters of the beta
distribution are. By my reckoning, this should do it:

  pbeta(Rsq, 1/2, (N-2)/2, lower.tail=FALSE) 

"Proof":


Residual standard error: 1.143 on 8 degrees of freedom
Multiple R-Squared: 0.0004207,  Adjusted R-squared: -0.1245
F-statistic: 0.003367 on 1 and 8 DF,  p-value: 0.9552

> pbeta(0.0004207, 1/2, 8/2, lower=F)
[1] 0.9551511
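
Wrapped up as a function (p_from_rsq is just an illustrative name), the
F-table lookup and the beta-distribution shortcut above agree:

```r
# p-value from R^2 and n for a simple (1-df) linear regression
p_from_rsq <- function(Rsq, n) {
  F <- Rsq / (1 - Rsq) * (n - 2)   # DFres = n - 2, DFmodel = 1
  pf(F, 1, n - 2, lower.tail = FALSE)
}

p_from_rsq(0.0004207, 10)                              # 0.9551511, as in the "proof"
pbeta(0.0004207, 1/2, (10 - 2)/2, lower.tail = FALSE)  # same value via the beta form
```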


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Inconsistence in specifying action for missing data

2005-09-04 Thread Peter Dalgaard
Martin Maechler <[EMAIL PROTECTED]> writes:

> > "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]>
> > on Sat, 03 Sep 2005 11:40:18 -0400 writes:
> 
> Duncan> John Sorkin wrote:
> >> A question for R (and perhaps S and SPlus) historians.
> >> 
> >> Does anyone know the reason for the inconsistency in the
> >> way that the action that should be taken when data are
> >> missing is specified? There are several variants,
> >> na.action, na.omit, "T", TRUE, etc. I know that a foolish
> >> consistency is the hobgoblin of a small mind, but
> >> consistency can make things easier.
> >> 
> >> My question is not meant as a complaint. I very much
> >> admire the R development team. I simply am curious.
> 
> Duncan> R and S have been developed by lots of people, over
> Duncan> a long time.  I think that's it.
> 
> yes, but there's a bit more to it.
> 
> First, the question was "wrong" (don't you just hate such an answer?):
> A more interesting  question would have asked why there was 
>   'na.rm = {TRUE, FALSE}' 
> on one hand and
>   'na.action =  {na.omit, na.replace, .}'
> on the other hand,
> since only these two appear as function *arguments* 
> {at least in `decent' S and R functions}.

So cor() is "indecent" (with its use= argument)? ;-)


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html