[R] Running a likelihood ratio test for a logit model
Hi all -- I have to calculate a likelihood ratio test for a logit model. I found logLik, but I need to calculate the log likelihood for the model without any predictors. How can I specify this in glm? If the full model is glm(y ~ x1), is the one without predictors (y ~ 0)? Or (y ~ 1)? Is there a more direct way of getting this? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Running a likelihood ratio test for a logit model
On 5/18/06, Prof Brian Ripley [EMAIL PROTECTED] wrote: On Thu, 18 May 2006, Dimitris Rizopoulos wrote: the print method for glm object already shows you the null and residual deviance with the associated df; look also at the Value section of ?glm. Note though that deviances are not (log) likelihood ratios, but differences in deviances are twice log LRT (pace a previous answer). Ah, that's what I wanted. I was confused because the book I'm working out of (Agresti's _An Introduction to Categorical Data Analysis_) never refers to the components of the log LRT as deviances. anova() is a good way to get suitable tests out of model fits: in this case anova(glm(y ~ x1, family=binomial)) shows you the appropriate 2 log LRT I've been using that for comparing models, but got a little confused because it didn't spit out an associated p-value. I tried summary(anova) but that's just ridiculous. Does aov compute basically the same test, but using the F distribution rather than the chisq? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Adding error bars to a trellis barchart display
Hi all -- I'm using trellis to generate bar charts, but there's no built-in function to generate error bars or confidence intervals, as far as I can tell. I assumed I could just write my own panel function to add them, so I searched the archive, and found a posting from the author of the package stating ... placing multiple bars side by side needs specialized calculations, which are done within panel.barchart. To add bars to these, you will need to reproduce those calculations. Just so I'm clear on this -- there's no capacity to add bars to the plot, nor to find out the coordinates of the bars in the graphs themselves. If you want them, you have to completely rewrite panel.barchart. Is this correct? Are there really so few people using error bars with bar charts? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Specifying an appropriate error term in a hierarchical regression
On 17 Apr 2006 23:55:14 +0200, Peter Dalgaard [EMAIL PROTECTED] wrote: Confusing nesting and nesting... The / operator is designed to handle cases like this a b 1 1 1 2 2 1 2 2 in which the numbering of b only makes sense within a - no connection between b=1 when a=1 and when a=2 (think group1, member1). In this case, the interaction a:b makes sense, but a main effect of b does not. This makes sense. It sounds like there's no need for the / operator, assuming the factors are numbered appropriately. Is this correct? It is not clear to me what B(A) is supposed to mean in Kirk's terminology. Sounds like it could be the total sum of squares for the b stratum, which is the effect of b ignoring a (?). SSB(A), as Kirk defines it, is the pooled simple main effects of treatment B at each level of treatment A. Or [AB] - [A], if you prefer. I suppose my primary problem is specifying a different denominator for each level of the F test in the ANOVA. I can specify an overall error term (using Error in the model specification) but for the example in the book the F-test for factor A is MSA/MSB(A), while the F-test for factor B(A) is MSB(A)/MSerror. It doesn't seem like it's possible to do this using a single aov command; I have to run it once with each appropriate error term or do the F-test by hand. That can't be right, can it? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Specifying an appropriate error term in a hierarchical regression
Hi all -- So I obtained a copy of _Statistical Models in S_, which I hoped would answer my question, but it doesn't quite. On page 151, it discusses nested terms, which is what I'm working with. So I tried using value ~ a/b which apparently should fit factor b within factor a. But the summary is the exact same summary that value ~ a + b and value ~ a * b produces, which doesn't seem right. And the F value reported for A is wrong; it should compare the mean square of A with the mean square of B(A), not the residual mean square as it does. What am I doing wrong? On 4/11/06, Chris Bergstresser [EMAIL PROTECTED] wrote: Hi all -- So I'm working through my statistics homework again, and trying to reproduce the examples in the book (Kirk's _Experimental Design_, third edition) in R. This is a completely randomized hierarchical design (CRH-28(A)). The B factor is completely nested within the A factor. Pages 480-482, for those playing along at home. I can use: summary(aov(value ~ a + Error(b), data = ex)); to get the correct F value for the main effect of A. I can use summary(aov(value ~ b, data = ex)); to get the correct values for B(A) and the within cell SS. But I can't find any documentation about constructing the Error term to get this output in a single analysis (except for http://www.psych.upenn.edu/~baron/rpsych/rpsych.html, but Kirk doesn't talk about these tests in terms of Error strata, so it's a little hard to figure out the correspondence). Also, the documentation on the Error term in ?aov is rather perfunctory. There's no mention of the / operator, for example. ex = scan() 3 6 3 3 1 2 2 2 5 6 5 6 2 3 4 3 7 8 7 6 4 5 4 3 7 8 9 8 10 10 9 11 a = factor(rep(paste(a, 1:2, sep = ), each = 16)); b = factor(rep(paste(b, 1:8, sep = ), each = 4)); ex = data.frame(value = ex, a, b); summary(aov(value ~ a + Error(b), data = ex)); summary(aov(value ~ b, data = ex)); __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] extremely simple for loop question
On 4/12/06, Paul Roebuck [EMAIL PROTECTED] wrote: I want to do a for loop in which m takes on the values 1, 5, 10, 15, 20. What is the syntax for doing that? I had been doing a loop for m in 1:20, but I only want those values above. ?seq ... which doesn't handle the 1 in that sequence very elegantly. You can do this for (m in c(1, seq(5, 20, 5))) for the general case, but for the specific circumstance I'd do for (m in c(1, 5, 10, 15, 20)) -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Specifying an appropriate error term in a hierarchical regression
Hi all -- So I'm working through my statistics homework again, and trying to reproduce the examples in the book (Kirk's _Experimental Design_, third edition) in R. This is a completely randomized hierarchical design (CRH-28(A)). The B factor is completely nested within the A factor. Pages 480-482, for those playing along at home. I can use: summary(aov(value ~ a + Error(b), data = ex)); to get the correct F value for the main effect of A. I can use summary(aov(value ~ b, data = ex)); to get the correct values for B(A) and the within cell SS. But I can't find any documentation about constructing the Error term to get this output in a single analysis (except for http://www.psych.upenn.edu/~baron/rpsych/rpsych.html, but Kirk doesn't talk about these tests in terms of Error strata, so it's a little hard to figure out the correspondence). Also, the documentation on the Error term in ?aov is rather perfunctory. There's no mention of the / operator, for example. ex = scan() 3 6 3 3 1 2 2 2 5 6 5 6 2 3 4 3 7 8 7 6 4 5 4 3 7 8 9 8 10 10 9 11 a = factor(rep(paste(a, 1:2, sep = ), each = 16)); b = factor(rep(paste(b, 1:8, sep = ), each = 4)); ex = data.frame(value = ex, a, b); summary(aov(value ~ a + Error(b), data = ex)); summary(aov(value ~ b, data = ex)); __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Grid graphics issues
Hi all -- So I'm trying to use lattice graphics, but I want to use a sunflower plot, which doesn't seem to be part of lattice. No problem, I put together the following code, which mostly works -- *except* for the first graph it generates. If it opens the graphic device, then it draws the xygrid, clears the device, then draws the sunflowerplot. All subsequent output operations work fine, as does rerunning the code with the graphic device open. What's going on? How can I fix this? I tried looking through the family of dev.* to no success. I can use trellis.device to open the device, but that causes the same problem. -- Chris library(grid); library(lattice); library(gridBase); trellis.par.set(theme = col.whitebg()); outer.plot.limits = c(0.75, 5.25); inner.plot.limits = c(0.92, 5.08); sunpanel - function(x, y, subscripts, ...) { pushViewport(viewport(x = 0.5, y = 0.5, just = center)); par(plt = gridPLT(), new = TRUE); sunflowerplot(x, y, axes = FALSE, xlab = , ylab = , xlim = inner.plot.limits, ylim = inner.plot.limits); popViewport(1); } for (i in 1:10) { x = round(runif(100, 1, 5)); y = round(runif(100, 1, 5)); print(xyplot(y ~ x, xlim = outer.plot.limits, ylim = outer.plot.limits, main = i, aspect = iso, between = list(x = 1, y = 0), panel = sunpanel)); } __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Random effects ANOVA?
On 3/30/06, Prof Brian Ripley [EMAIL PROTECTED] wrote: I think you want print or summary rather than anova. anova() is not very useful for aov() models even without error strata. That's sort of better. summary(aov(time ~ drink + Error(video), data = df)); gives me: Error: video Df Sum Sq Mean Sq F value Pr(F) Residuals 2160 80 Error: Within Df Sum Sq Mean Sq F valuePr(F) drink 1 240.000 240.000 44.211 1.313e-08 *** Residuals 56 304.000 5.429 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ...what I'm really looking for is something akin to the output from SAS, which is: Source DF Anova SSMean Square F ValuePr F DRINK 1 240.000240.000 12.000.0742 VIDEO 2 160.000 80.000 4.000.2000 I didn't follow how the videos were chosen. Random effects apply when the 'treatments' were chosen from a large population (which might apply if each subject watched (on separate occasions) three videos chosen randomly from a larger pool), and if the interest is in the variability of the response over videos in the pool. If subjects were observed more than once then I suspect you most likely want a random effect for subjects. This problem comes directly from the final for my Experimental Stats class, which is why it feels a little odd. The videos were randomly selected from a library. Subjects watched one of the three videos, drank one of the two drinks, and completed the tasks. There were no repeated measures, so we can't block on subjects. The hypothesis test, according to SAS, treats DRINK*VIDEO as an error term. Setting aside whether this is the right analysis, how can I replicate this analysis in R? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Random effects ANOVA?
On 30 Mar 2006 22:41:51 +0200, Peter Dalgaard [EMAIL PROTECTED] wrote: Not quite sure whether time ~ drink + video + Error(video:drink) works. It might, although it is a bit unnatural to have a random interaction between to systematic effects. This exactly reproduces the given SAS output. Whether it's actually the right model to use, given the problem, is a different question. And luckily not one I have to answer. -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Random effects ANOVA?
Hi all -- So I have a very simple dataset, which consists of 60 subjects, who watched one of three videos, drank one of two drinks, and completed a task. The response variable is the time to complete the task. The ANOVA command is simple enough: anova(aov(time ~ drink * video, data = df)); However, the videos were randomly selected; I need to use the random effects model for them. So I tried anova(aov(time ~ drink + Error(video), data = df)); This gives me a no applicable method for 'anova' error. The command aov works, but doesn't give me anything I can interpret effectively. Is there a simpler command I should be using? Am I doing something wrong? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] RGui crashes on wle call
Prof Brian Ripley wrote: On Mon, 13 Jun 2005, Chris Bergstresser wrote: I'm seeing the following commands reliably produce a crash in RGui, version 2.0.1, for both my home and office machine: rm(list = ls(all = TRUE)); load(dataset.R); library(wle); data.wle = wle.lm(abortion ~ year * lib.con + age + gender + + urbanism + census + income + church.att + children + educ + + religion.imp, data = data.set); 1) Re-do the tests in the current version of R, preferably a beta of 2.1.1. Yeah -- I upgraded to R 2.1.0, and it still reliably crashes. 2) Read the rw-FAQ, do the debugging reported there (with Dr MinGW or gdb) and find where it is crashing. (This is very likely to be in wle.) If it is in wle, send a report to the maintainer. If it is in R, send a report to R-bugs. I'm a little loath to download and install a debugger, as I've never done it before. I don't even know what to look for if I were to install it. In either case, supply enough data to reproduce the problem. I can easily provide the datafile which seems to be causing it. It's only 200k, so if anyone is interested in pursuing the matter I'd be happy to send it to them. This is on Windows XP, btw. -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] RGui crashes on wle call
Hi all -- I'm seeing the following commands reliably produce a crash in RGui, version 2.0.1, for both my home and office machine: rm(list = ls(all = TRUE)); load(dataset.R); library(wle); data.wle = wle.lm(abortion ~ year * lib.con + age + gender + + urbanism + census + income + church.att + children + educ + + religion.imp, data = data.set); dataset.R is moderately sized (about 200k compressed), and the command works just fine with lm rather than wle.lm. Since I'm not sure where the bug lies -- in wle, in R, or in RGui -- I'm not sure where I should report this, or if it's already been reported. What should I do? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Factor Analysis functions...
Hi all -- I'm running a Factor Analysis on my dataset, and I've located the factanal() and princomp() methods. I don't want to do a PCA, so it looks like I should use factanal(), but factanal() requires specifying the number of factors you expect from the analysis. Are there any packages out there explicitly for Exploratory Factor Analysis that do not require specifying the number of expected factors? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Clipboard size?
Hi all -- I have a matrix of doubles (roughly 30x80) which I'd like to copy to the clipboard. However, as the following shows: dm = matrix(runif(30 * 80), nrow = 80) write.table(dm, clipboard, sep = \t) Warning message: clipboard buffer is full and output lost Is there any way to increase the buffer? Obviously, other programs don't have the same limitations (i.e., I can copy the same volume of data from Excel or my text editor and paste it into R without a problem) -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Ack! Odd correlation matrix problem
Spencer Graves wrote: Does the following answer the question: cor(B, use=complete.obs) ** snip ** cor(B, use=pairwise.complete.obs) Yep. That's exactly the issue. I had thought the reference to casewise deletion in the help for complete.obs was referring solely to the two variables involved, not the entire dataset. The documentation might be a little clearer on this point for those just starting out in statistics, although I suppose it's only an issue if you're working with correlation matrices, which might imply you're really *not* just starting out in statistics, and should know better. -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Ack! Odd correlation matrix problem
Spencer Graves wrote: Does the following answer the question: cor(B, use=complete.obs) ** snip ** cor(B, use=pairwise.complete.obs) Yep. That's exactly the issue. I had thought the reference to casewise deletion in the help for complete.obs was referring solely to the two variables involved, not the entire dataset. The documentation might be a little clearer on this point for those just starting out in statistics, although I suppose it's only an issue if you're working with correlation matrices, which might imply you're really *not* just starting out in statistics, and should know better. -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Ack! Odd correlation matrix problem
Hi all -- Obviously, I'm missing something frightfully basic with the following. What's likely going wrong? cr = cor(cluster.data, use = complete.obs); cr[tax, spend] [1] -0.6138096 cor(cluster.data[[tax]], cluster.data[[spend]], + use = complete.obs) [1] -0.4925006 df = data.frame(tax = cluster.data$tax, + spend = cluster.data$spend); cr = cor(df, use = complete.obs); cr[tax, spend] [1] -0.4925006 cor(df[[tax]], df[[spend]], use = complete.obs) [1] -0.4925006 -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Normalization and missing values
Hi all -- I've got a large dataset which consists of a bunch of different scales, and I'm preparing to perform a cluster analysis. I need to normalize the data so I can calculate the difference matrix. First, I didn't see a function in R which does normalization -- did I miss it? What's the best way to do it? Second, what's the best way to deal with missing values? Obviously, I could just set them to 0 (the mean of the normalized scales), but I'm not sure that's the best way. -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Normalization and missing values
I'd just like to thank everyone who wrote in in response to my questions -- it's been greatly helpful, and appreciated. Jonathan Baron wrote: On 04/13/05 11:36, Chris Bergstresser wrote: First, I didn't see a function in R which does normalization -- did I miss it? What's the best way to do it? Look at scale(). Might be what you mean. Yeah; I should have remembered that. I did search the help files for normalization and normalize but that isn't in the help files. Somewhat oddly, I think, since it's exactly what scale is doing. But, in general, the right way to deal with missing data depends on the assumptions you make. As a novice, I found the following article to be helpful: Schafer, J. L., Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177. This article is great; thanks for providing it. The authors recommend either using ML Estimation or Multiple Imputation to fill in the missing data. They don't talk much about which is better for certain situations, however. I don't think my data are particularly sensitive to the method I use -- I've got about 1,100 cases, with 85 variables, and there are only about 1,000 missing values overall, spread pretty evenly across the data file. Are there any recommendations for specific packages? transcan() and aregImpute() look promising; based on the documentation (and what I can understand from it) I'm assuming they both provide Multiple Imputation? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Getting the row/column of matrix for some values?
Hi all -- Quick (I hope) question: I've got a correlation matrix. Is there a quick way to find all the row/column names for those correlations higher than some value, like 0.4? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Errors reading data file?
Hi all -- I tried loading a data file with the following command: data = read.table(filename.txt, header = TRUE, sep = ,) This appeared to work fine, except it silently skipped 400 records (out of 1200). It turns out, some of the text fields included quotes, and I needed to use 'quote = '. Why wasn't there an error message? Is there some way to enable one? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] How to convert a factor to a numeric?
Hi all -- I've got two columns, both of which correspond to three factor levels (e.g., column1 is a, b, or c; column2 is x, y, or z). I'd like to generate a third column, consisting on whether the two factors are correctly aligned for a given case (in this example, a corresponds to x, b to y, and c to z). For example: a x TRUE a y FALSE b y TRUE c z TRUE b x FALSE Several questions: The easiest way seemed to me to be comparing the numeric values across columns, but the encodings are (a=1, b=2, c=3) and (x=1, y=3, z=2). Is there a way to change the underlying value representing each factor, so I could just run an equality on them? Is there a simple way to check for correspondence without recoding the factors? In the help for factor(), it says In particular, 'as.numeric' applied to a factor is meaningless, and may happen by implicit coercion. To revert a factor 'f' to its original numeric values, 'as.numeric(levels(f))[f]' is recommended and slightly more efficient than 'as.numeric(as.character(f))'. However, I get the following results. What's going on? f = gl(3, 1, 6, labels=c(a, b, c)) f [1] a b c a b c Levels: a b c as.numeric(levels(f))[f] [1] NA NA NA NA NA NA Warning message: NAs introduced by coercion as.numeric(as.character(f)) [1] NA NA NA NA NA NA Warning message: NAs introduced by coercion as.numeric(f) [1] 1 2 3 1 2 3 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Linear Trend Analysis?
Hi all -- I'm trying to use R for my Analysis of Categorical Data class, but I can't figure out how to do a weighted linear trend analysis. I have a table of categorical data, to which I've assigned weights to the rows and columns. I need to calculate r and M^2, which is apparently done in SAS using PROC FREQ and in STATA using correlate var1 var2 fw=count. What's the command for R? -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Linear Trend Analysis?
-Original Message- From: Tim F Liao [mailto:[EMAIL PROTECTED] In a course I recently taught, I used the following code to generate individual data from grouped data, which would give the same results as using fweight=count in Stata. Ind.Data-data.frame(cbind(rep(Y,freq),rep(X1,freq),rep(X2,freq))) where Y is the dependent variables and X1 and X2 are two explanatory variables and freq is your count variable. That had occurred to me, but some of the frequencies are around 17,000, so it seemed there had to be a more elegant way. -- Chris __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html