[R] random effects in logistic regression (lmer)-- identification question
Hello R users! I've been experimenting with lmer to estimate a mixed model with a dichotomous dependent variable. The goal is to fit a hierarchical model in which we compare the effect of individual- and city-level variables. I've run up against a conceptual problem that I expect one of you can clear up for me.

The question is about random effects in the context of a model fit with a binomial family and logit link. Unlike an ordinary linear regression, there is no way to estimate an individual-level random error in the linear predictor

    z_i = a + b*x_i + e_i

because the variance of e_i is unidentified. The variance of the logistic distribution with scale s is pi^2 * s^2 / 3, and we assume s=1, so the standard deviation is fixed at pi/sqrt(3), about 1.81 (a bit bigger than 1, if you are comparing against the standard Normal). The logistic fitting process sets the variance of the error, and the parameters a and b are rescaled accordingly. As a result, there is an implicit individual-level random effect within a logistic model. There is a good explanation of this issue in Tony Lancaster's textbook An Introduction to Modern Bayesian Econometrics.

So we usually end up thinking about a linear predictor in a logistic regression like so:

    z_i = a + b*x_i

Random effects can be estimated for groups or clusters of observations. If j is a grouping variable, then we estimate

    z_i = a + b*x_i + u_j

The variance component here is, as far as I understand, measured on the same scale as the logistic distribution's standard deviation.

Currently, I'm working on a project in which there are observations collected in many cities, represented by a variable PLACE. We are comparing the effect of several variables on a response for each of the values of a RACE variable. RACE is dichotomized into White and Nonwhite by the people who collect the data. For Nonwhites only, we can estimate the effect of individual-level predictors (x) on the outcome (y).
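The scale-fixing point above can be checked by simulation. A minimal base-R sketch (my own illustration, with made-up values, not from the post): generate data from the latent-variable model with a logistic error whose scale s is 2 rather than 1, and glm() recovers a/s and b/s rather than a and b.

```r
## Sketch with illustrative values (assumption: a=1, b=2, s=2).
## Latent model: zstar = a + b*x + e, e ~ logistic with scale s.
## A logit fit identifies only a/s and b/s, because the error
## scale is fixed by convention, not estimated.
set.seed(42)
n <- 20000
a <- 1; b <- 2; s <- 2
x <- rnorm(n)
zstar <- a + b * x + rlogis(n, scale = s)
y <- as.numeric(zstar > 0)
fit <- glm(y ~ x, family = binomial)
round(coef(fit), 2)   # near c(a, b)/s = c(0.5, 1.0), not c(1, 2)
```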
fm1 <- lmer(y ~ x + (1 | PLACE), data = dat, family = binomial,
            subset = Race %in% c("Nonwhite"))

The random effect in this model indicates the variance in the response that is attributable to a Nonwhite respondent's location.

Random effects:
 Groups  Name         Variance  Std.Dev.
 PLACE   (Intercept)  0.047326  0.21754

Suppose I estimate models for the 2 races in a combined model like this:

fm1 <- lmer(y ~ -1 + Race/(x) + (-1 + Race | PLACE), data = dat, family = binomial)

This gives fixed effects estimates that are pretty easy for nonstatisticians to understand. One can look and see the effect of a variable x on people of different races. But the random effect is a bit hard to understand. Since the Race variable is dichotomous, my aim was to see if the variance of the random effect is different for the 2 racial categories. Here are the estimates:

Random effects:
 Groups  Name              Variance   Std.Dev.  Corr
 PLACE   RACE_ALLWhite     0.0095429  0.097688
         RACE_ALLNonwhite  0.1286597  0.358692  1.000

I can't quite grasp what the correlation means. I BELIEVE the variance values indicate that the experiences of whites are homogeneous across cities, because the variance is negligible for them, while the experiences of Nonwhites are much more city-dependent. What does it mean when the correlation is 1.0? The correlation takes on that value when there are no city-level variables in this model, so I GUESS that it means that all city-level variation is attributed to the random effect. What do you think?

If I put in some city-level predictors, then the estimates of the variance components change--they essentially collapse to their minimum values--and the correlation is no longer 1.0.

Random effects:
 Groups  Name          Variance  Std.Dev.    Corr
 PLACE   RACEWhite     5e-10     2.2361e-05
         RACENonwhite  5e-10     2.2361e-05  0.001

This indicates that, after adding in the city-level variables, the unaccounted-for city-level variation is very small. Correct?
Now, back to the idea that there is always an implicit individual-level random effect in a logistic regression. Is it meaningful to ask whether that individual-level random effect is different for people of different races? If so, how can I estimate that? If e_i is the implicit random error, can I ask for another random effect for Nonwhites only, say u_iN, in a model like so:

    z_i = a + b*x_i + e_i + u_iN

Suppose the unique respondent number is ID and we create a new variable

    NonwhiteID = 0  for Whites
               = ID for Nonwhites

Here's my idea about how to check whether the individual-level variance component for Nonwhites is different from the baseline of Whites, by fitting this:

fm1 <- lmer(y ~ -1 + Race/(x) + (-1 + Race | PLACE) + (1 | NonwhiteID),
            data = dat, family = binomial)

Here, again, I leave out the city-level variables.

Random effects:
 Groups      Name          Variance    Std.Dev.    Corr
 NonwhiteID  (Intercept)   5e-10       2.2361e-05
 PLACE       RACEWhite     1.0575e-02  1.0283e-01
             RACENonwhite  1.2880e-01
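A base-R sketch of the NonwhiteID construction just described (the data frame, IDs, and column values here are hypothetical stand-ins for the real dat):

```r
## Hypothetical data frame standing in for 'dat' in the post.
dat <- data.frame(ID = 1:6,
                  RACE = c("White", "Nonwhite", "White",
                           "Nonwhite", "Nonwhite", "White"))
## Whites all share level 0, so they contribute no individual-level
## effect; each Nonwhite is a singleton level with its own effect.
dat$NonwhiteID <- ifelse(dat$RACE == "Nonwhite", dat$ID, 0)
dat$NonwhiteID
```

Note that with one row per Nonwhite, this variance component is only weakly identified, which may be one reason it collapses to the boundary in the fit shown above.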
[R] Can I access the filename of the active editor window?
When I run a script from an open editor window (using Ctrl-A, Ctrl-R), I would like the filename of the script to be automatically written into the program output, to keep up with frequent version changes. Is there a way to access the filename (+ path) of the open script (the active one, if there is more than one editor window open)? I'm using R 2.4.1 on Windows XP.

Paul

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
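Not an answer for the Ctrl-A/Ctrl-R case specifically, but a workaround sketch for the question above: if the script is run with source() instead, it can discover its own path. This relies on the undocumented local variable ofile inside base source(), so treat it as a fragile hack rather than a supported API.

```r
## Sketch: walk the call stack looking for source()'s local 'ofile'.
## NOTE: 'ofile' is an undocumented internal of source(); this only
## works when the script is run via source(), not via Ctrl-A/Ctrl-R.
whereami <- function() {
    for (f in rev(sys.frames())) {
        if (!is.null(f$ofile)) return(f$ofile)
    }
    NULL
}
## Inside a sourced script one could then write, e.g.:
## cat("Output produced by:", whereami(), "\n")
```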
Re: [R] Follow-up about ordinal logit with mixtures: how about 'continuation ratio' strategy?
On 5/10/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
 Paul Johnson wrote:
  This is a follow up to the message I posted 3 days ago about how to estimate mixed ordinal logit models. I hope you don't mind that I am just pasting in the code and comments from an R file for your feedback. Actual estimates are at the end of the post. . . .

 Paul,
 lrm does not give an incorrect sign on the intercepts. Just look at how it states the model in terms of Prob(Y >= j), so that its coefficients are consistent with the way people state binary models. I'm not clear on your generation of simulated data. I specify the population logit, anti-logit that, and generate binary responses with those probabilities. I don't use rlogis.

Thank you. I don't think I'm telling you anything you don't already know, but for the record, here goes. I think the difference in signs is just convention within fields. In choice models (the econometric tradition), we usually write that the response is in a higher category if

    eta + random > cutpoint

and that's how I created the data--rlogis supplies the random noise. Then

    eta - cutpoint > -random,  or  cutpoint - eta < random,

and so

    Prob(higher outcome) = Prob(random > cutpoint - eta)

In the docs on polr from MASS, VR say they have the logit equal to cutpoint - eta, so their parameterization is consistent with mine. On the other hand, your approach is to say the response is in a higher category if

    intercept + eta > random

where I think your intercept is -cutpoint. So the signs in your results are reversed:

    -cutpoint + eta > random

But this is aside from the major question I am asking. Do we think that the augmented data frame approach described in Cole, Allison, and Ananth is a good alternative to maximum likelihood estimation of ordinal logit models, whether they are interpreted as proportional odds, continuation, or stopping models?
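For the record, the parameterization point can be checked numerically. A small sketch of my own (using MASS::polr, which ships with R, and made-up cutpoints): generate data the "choice model" way and confirm that polr's zeta estimates are the cutpoints themselves, with no sign flip on the slope.

```r
library(MASS)   # for polr()

set.seed(1)
n <- 20000
x <- rnorm(n)
cuts <- c(-1, 1)                       # true cutpoints (illustrative)
zstar <- 1.5 * x + rlogis(n)           # rlogis supplies the random noise
y <- ordered(cut(zstar, c(-Inf, cuts, Inf), labels = 1:3))

fit <- polr(y ~ x)
coef(fit)    # near 1.5: logit P(Y <= k) = zeta_k - eta, so no sign flip
fit$zeta     # near -1 and 1, the cutpoints used to generate the data
```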
In the cases I've tested, the parameter estimates from the augmented data frame are consistent with polr or lrm, but the standard errors and other diagnostic information are quite different.

I do not think I can follow your suggestion to use penalties in lrm, because I have to allow for the possibility that there are random effects across clusters of observations, possibly including random slope effects, but certainly including random intercepts for 2 levels of groupings (in the HLM sense of these things). Meanwhile, I'm studying how to use optim and numeric integration to see if the results are comparable.

pj

 See if using the PO model with lrm with penalization on the factor does what you need. lrm is not set up to omit an intercept with the -1 notation. My book goes into details about the continuation ratio model.
 Frank Harrell

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
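On the optim route mentioned above, a minimal fixed-effects sketch (my own construction, with illustrative values) of direct maximum likelihood for a three-category proportional-odds logit; the mixed version would numerically integrate this same likelihood over the cluster effects.

```r
set.seed(7)
n <- 5000
x <- rnorm(n)
zstar <- 1.5 * x + rlogis(n)
y <- cut(zstar, c(-Inf, -1, 1, Inf), labels = FALSE)  # categories 1,2,3

## Negative log-likelihood: logit P(Y <= k) = zeta_k - beta*x.
## The second cutpoint is written as zeta1 + exp(par[3]) to keep
## the cutpoints ordered during optimization.
nll <- function(par, y, x) {
    beta <- par[1]
    zeta <- c(par[2], par[2] + exp(par[3]))
    cum  <- cbind(plogis(zeta[1] - beta * x),
                  plogis(zeta[2] - beta * x),
                  1)
    p <- cum - cbind(0, cum[, -3])     # per-category cell probabilities
    -sum(log(p[cbind(seq_along(y), y)]))
}
fit <- optim(c(0, -0.5, 0), nll, y = y, x = x, method = "BFGS")
fit$par[1]                                    # slope, near 1.5
c(fit$par[2], fit$par[2] + exp(fit$par[3]))   # cutpoints, near -1 and 1
```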
[R] Follow-up about ordinal logit with mixtures: how about 'continuation ratio' strategy?
This is a follow up to the message I posted 3 days ago about how to estimate mixed ordinal logit models. I hope you don't mind that I am just pasting in the code and comments from an R file for your feedback. Actual estimates are at the end of the post.

### Subject: mixed ordinal logit via augmented data setup.
### I've been interested in estimating an ordinal logit model with
### a random parameter. I asked in r-help about it. It appears to be
### a difficult problem, because even well-established commercial
### programs like SAS are prone to provide unreliable estimates.
### So far, I've found 3 avenues for research. 1) Go Bayesian and use
### MCMC to estimate the model. 2) Specify a likelihood function and
### then use R's optim function (as described in Laura A. Thompson,
### 2007, S-PLUS (and R) Manual to Accompany Agresti's Categorical
### Data Analysis (2002) 2nd edition). My guess is that either of
### those approaches would be worth the while, but I might have
### trouble persuading a target audience that they have good
### properties. 3) Adapt a continuation ratio approach.
### This latter approach was suggested by a post in r-help by Daniel
### Farewell (farewelld_at_Cardiff.ac.uk):
### http://tolstoy.newcastle.edu.au/R/help/06/08/32398.html#start
### It pointed me in the direction of continuation ratio logit models
### and one way to estimate an ordinal logit model with random
### parameters.
### Farewell's post gives working example code that shows a way to
### convert a K-category ordinal variable into K-1 dichotomous
### indicators (a continuation ratio model). Those K-1 indicators
### can be stacked into one column, and then a logistic regression
### program that is written for a two-valued output can be used.
### Farewell reasoned that one might then use a program for two-valued
### outputs including mixed effects.
### In his proposal, one would use the program lmer (package: lme4)
### (a binomial family with a logit link) to estimate parameters for
### a dichotomous logit model with random parameters.
### This is the sort of magic trick I had suspected might be possible.
### Still, it is hard to believe it would work. But in the r-help
### response to the post by Farewell, there is no general objection
### against his modeling strategy.
### I had not studied continuation ratio logit models before, so I
### looked up a few articles on estimation of ordinal models by
### re-coding the output as a sequence of binary comparisons (stop
### ratios, continuation ratios, etc.). The article that is most clear
### on how this can be done to estimate a proportional odds logistic
### model is
### Stephen R. Cole, Paul D. Allison, and Cande V. Ananth,
### Estimation of Cumulative Odds Ratios,
### Ann Epidemiol 2004;14:172–178.
### They claim that one can recode an n-chotomy into n-1 dichotomous
### indicators. Each observation in the original dataset begets n-1
### lines in the augmented version. After creating the dichotomous
### indicator, one uses an ordinary dichotomous logit model to
### estimate parameters and cutpoints for an ordinal logit
### model. Cole, et al., are very clear.
### There is an additional benefit to the augmented data approach.
### One can explicitly test the proportional odds assumption by
### checking for interactions between the included independent
### variables and the level of the dependent variable being
### considered. The Cole article shows some examples where the
### proportional odds assumption appears to be violated.
### To test it, I created the following example. This shows the
### results of maximum likelihood estimation with the programs polr
### (package: MASS) and lrm (package: Design). The estimates from
### the augmented data approach are not exactly the same as polr or
### lrm, but they are close.
### It appears to me the claims about the augmented data approach
### are mostly correct. The parameter estimates are pretty close to
### the true values, while the estimates of the ordinal cutpoints
### are a bit difficult to interpret.
### I don't know what to make of the model diagnostics for the
### augmented data model. Should I have confidence in the standard
### errors? How to interpret the degrees of freedom when 3 lines of
### data are manufactured from 1 observation? Are likelihood-ratio
### (anova) tests valid in this context? Are these estimates from
### the augmented data equivalent to maximum likelihood? What does
### it mean that the t-ratios are so different? That seems to be
### prima facie evidence that the estimates based on the augmented
### data set are not trustworthy.
### Suppose I convince myself that the estimates of the ordinal
### model are as good as maximum likelihood. Is it reasonable to
### take the next step, and follow Farewell's idea of using this
### kind of model to estimate a mixture model? There are K-1 lines
### per case in the augmented data set. Suppose the observations
### were grouped into M sets
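A base-R sketch of the expansion described above (my own illustration of the Farewell-style continuation-ratio stacking; Cole et al.'s cumulative-odds recoding differs in detail). A K-category response becomes up to K-1 stacked binary rows per case: row j exists for cases that reached level j, and records whether the case stopped there.

```r
## expand.cr: continuation-ratio expansion of an ordinal response y
## taking values 1..K.  (Illustrative helper, not from the post.)
expand.cr <- function(y, K = max(y)) {
    do.call(rbind, lapply(seq_along(y), function(i) {
        j <- seq_len(min(y[i], K - 1))           # levels case i reached
        data.frame(id = i, level = j,
                   stop = as.numeric(y[i] == j)) # 1 where it stopped
    }))
}
aug <- expand.cr(c(1, 2, 3), K = 3)
aug
## One would then fit, e.g.,
##   glm(stop ~ factor(level) + x, family = binomial, data = aug)
## or, per Farewell, lmer() with a random term added.
```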
[R] ordered logistic regression with random effects. Howto?
I'd like to estimate an ordinal logistic regression with a random effect for a grouping variable. I do not find a pre-packaged algorithm for this. I've found that the methods glmmML (package: glmmML) and lmer (package: lme4) both work fine with dichotomous dependent variables. I'd like a model similar to polr (package: MASS) or lrm (package: Design) that allows random effects.

I was thinking there might be a trick that would allow me to use a program written for a dichotomous dependent variable with a mixed effect to estimate such a model. The proportional odds logistic regression is often written as a sequence of dichotomous comparisons. But it seems to me that, if it would work, then somebody would have proposed it already. I've found some commentary about methods of fitting ordinal logistic regression with other procedures, however, and I would like to ask for your advice and experience with this. In this article,

    Ching-Fan Sheu, Fitting mixed-effects models for repeated ordinal
    outcomes with the NLMIXED procedure. Behavior Research Methods,
    Instruments, & Computers, 2002, 34(2): 151-157.

the author gives an approach that works in SAS's NLMIXED procedure. In this approach, one explicitly writes down the probability that each level will be achieved (using the linear predictor and constants for each level). I THINK I could find a way to translate this approach into a model that can be fitted with either nlme or lmer. Has someone done it? What other strategies for fitting mixed ordinal models are there in R?

Finally, a definitional question. Is a multi-category logistic regression (either ordered or not) a member of the glm family? I had thought the answer is no, mainly because glm and other R functions for glms never mention multi-category qualitative dependent variables, and also because the distribution does not seem to fall into the exponential family. However, some textbooks do include the multicategory models in the GLM treatment.

--
Paul E.
Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
[R] Segmentation fault/buffer overflow with fix() in Fedora Core 5 from Extras repository
The Fedora Extras update of R found its way onto my systems today, and I noted that fix() and edit() no longer work. There is a program crash that closes up R, but it does not leave a core file. I've tested by turning off SELinux; it had no effect. Do you see it too? What do you think? It happens on both systems I've tested. As far as I know, both of these systems are up-to-date.

I restarted with R -d gdb to try to get a backtrace, but gdb says the debugging symbols have been removed, and I don't see the debuginfo package on the Extras archive. I'm attaching the gdb info later, but I don't think it helps much without line numbers. I think my next step will be to re-build R on these systems and see if the problem disappears. Right? If it still crashes, I'll make sure I have debugging symbols and give you a full backtrace. If it does not crash, I'll let you know as well.

Here's the session that crashes:

library(car)
data(Chile)
edit(Chile)

*** buffer overflow detected ***: /usr/lib/R/bin/exec/R terminated
=== Backtrace: =========
/lib/libc.so.6(__chk_fail+0x29)[0xa8079d]
/lib/libc.so.6[0xa8195d]
/usr/lib/R/modules//R_X11.so[0x7c094a]
/usr/lib/R/modules//R_X11.so[0x7c20dd]
/usr/lib/R/modules//R_X11.so[0x7c3428]
/usr/lib/R/modules//R_X11.so(RX11_dataentry+0xa25)[0x7c4b15]
/usr/lib/R/lib/libR.so[0x2bf4c5]
/usr/lib/R/lib/libR.so[0x1dfd26]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x1b0973]
/usr/lib/R/lib/libR.so[0x1b4d28]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x1b0973]
/usr/lib/R/lib/libR.so[0x1b1887]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x1b0973]
/usr/lib/R/lib/libR.so(Rf_applyClosure+0x2a7)[0x1b2f67]
/usr/lib/R/lib/libR.so[0x1e146f]
/usr/lib/R/lib/libR.so(Rf_usemethod+0x609)[0x1e28d9]
/usr/lib/R/lib/libR.so[0x1e30ae]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x1b0973]
/usr/lib/R/lib/libR.so(Rf_applyClosure+0x2a7)[0x1b2f67]
/usr/lib/R/lib/libR.so(Rf_eval+0x2f4)[0x1b07e4]
/usr/lib/R/lib/libR.so(Rf_ReplIteration+0x311)[0x1d01b1]
/usr/lib/R/lib/libR.so[0x1d03c1]
/usr/lib/R/lib/libR.so(run_Rmainloop+0x60)[0x1d0710]
/usr/lib/R/lib/libR.so(Rf_mainloop+0x1c)[0x1d073c]
/usr/lib/R/bin/exec/R(main+0x46)[0x8048696]
/lib/libc.so.6(__libc_start_main+0xc6)[0x9c41fe]
/usr/lib/R/bin/exec/R[0x8048591]
=== Memory map: ========
[memory map omitted: library load addresses for libR.so, R_X11.so, libX11, libXt, glibc, etc., garbled in the archive]
Re: [R] Generate object names from variables
I collected some advice about this question a couple of years ago. This might help:
http://pj.freefaculty.org/R/Rtips.html#2.1
("Add variables to a data frame (or list)", and the next tip after that.)

On 7/14/06, Marc Schwartz (via MN) [EMAIL PROTECTED] wrote:
 On Fri, 2006-07-14 at 16:57 +0200, Georg Otto wrote:
  Hi,
  I want to generate object names out of variables in a sort of variable substitution. First I generate some vectors and an empty list:

  vector.a <- c("a", "b")
  vector.b <- c("c", "d")
  vector.c <- c("e", "f")
  vectors <- c("vector.a", "vector.b", "vector.c")
  vectors
  [1] "vector.a" "vector.b" "vector.c"
  vectorlist <- list()

  What I would then like to do is to generate elements of the list by using variables, somehow like this (does not work):

  for (i in vectors) {
      list$i <- i
  }

  To end up with a list like this:

  $vector.a
  [1] "a" "b"
  $vector.b
  [1] "c" "d"
  $vector.c
  [1] "e" "f"

  Any hint will be appreciated.
  Cheers,
  Georg

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
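For the archive, a base-R sketch of what (I believe) the poster was after: index the list with [[ ]] and a character name, or build it in one call with mget(). This is my own suggestion, not something quoted from the thread.

```r
vector.a <- c("a", "b")
vector.b <- c("c", "d")
vector.c <- c("e", "f")
vectors  <- c("vector.a", "vector.b", "vector.c")

## list$i treats "i" as a literal name; [[ ]] evaluates it instead,
## and get() fetches the object that the character name refers to.
vectorlist <- list()
for (i in vectors) vectorlist[[i]] <- get(i)

## The same thing in one step:
vectorlist2 <- mget(vectors)
```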
Re: [R] Can't there be a cd command?
It is a FAQ in our Linux lab. People start emacs and fire up R via ESS, and then they have no idea 'where they are'. For computer experts, it is not a problem, but for people who don't know much about computers, it is a pretty big problem. They have data in some subdirectory, but almost invariably they don't get emacs/R started from that same place.

Unfortunately, for our users, it does not help to simply re-label setwd as cd. Both commands imply a deeper understanding of the OS than they have. Also, unfortunately, these are the same people who don't understand that FAQs exist and should be consulted. These people are so new/timid that asking in r-help would be the last thing to cross their minds.

I've wondered if it would not help to have the R prompt include the directory name, as in an x terminal.

pj

On 5/10/06, Prof Brian Ripley [EMAIL PROTECTED] wrote:
 Exactly. I don't think I have ever used setwd() on Linux. Also, I have never seen this asked before for a Unix-alike, so it seems not to be a FAQ. There is a common tendency for users who run into a problem to think everyone does too, and it isn't necessarily so. Frequently asked questions do make it to the FAQs: it is a defence mechanism for the volunteers supporting R. help.search("directory") gets you there.

 --
 Brian D. Ripley, [EMAIL PROTECTED]
 Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel: +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UK, Fax: +44 1865 272595

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
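The prompt idea above can be sketched with base R's addTaskCallback(), which re-runs a function after every top-level command. A sketch of mine, not something proposed in the thread:

```r
## Show the working directory's base name in the R prompt, refreshed
## after each top-level command (so it tracks setwd() calls).
dir.prompt <- function(...) {
    options(prompt = paste0(basename(getwd()), "> "))
    TRUE            # returning TRUE keeps the callback registered
}
invisible(addTaskCallback(dir.prompt))
```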
Re: [R] Making R talk to Win/OpenBUGS in Linux (again)
that programs can use, and I've fiddled this lots of ways, but it fails, saying

Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found

I hope that by giving you this small not-yet-working example, you can spot where I'm going wrong.

## Paul Johnson 2006-04-29
library(R2WinBUGS)
# Copied from Prof Andrew Gelman's example
model.file <- system.file(package = "R2WinBUGS", "model", "schools.txt")
# file.show(model.file)
data(schools)
schools
J <- nrow(schools)
y <- schools$estimate
sigma.y <- schools$sd
data <- list("J", "y", "sigma.y")
inits <- function(){
    list(theta = rnorm(J, 0, 100), mu.theta = rnorm(1, 0, 100),
         sigma.theta = runif(1, 0, 100))
}
parameters <- c("theta", "mu.theta", "sigma.theta")
schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
    newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/winepath")

I do have the binary /usr/bin/winepath, but can't tell how to give R2WinBUGS what it wants.
Many failed efforts below:

schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
    newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/")
Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found

schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
    newWINE = TRUE, WINE = "/usr/bin/wine")
Error in if (!nchar(WINEPATH)) { : argument is of length zero

schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
    newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin")
Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found

I know there are many unknowns here, but hope somebody has at least one working example that we can build on.

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
Re: [R] using latex() in R for Unix
This thread piqued my interest in how to use Hmisc's latex() to produce the tables that I actually want, rather than the ones that come out by default. Actually, I'd be glad to use R2HTML or any other tool if I can make the output suit my taste. Here's a small working example that does not require any special libraries.

x <- rnorm(100)
y <- rnorm(100)
mylm <- lm(y ~ x)
Estimates <- cbind(coef(mylm), sqrt(diag(vcov(mylm))))
colnames(Estimates) <- c("OLS estimate", "std. error")
rownames(Estimates) <- c("Intercept", "Education")
latex(Estimates, file = "/home/pauljohn/test4.tex", digits = 4)

That creates the top part of the table like I want it, except there are no stars on the standard errors. I want to add the diagnostic information: sample size, the R-square, and root mean square error.

sumobj <- summary(mylm)
## tedious pasting alert:
sumStat <- paste("N=", sum(sumobj$df[-1]), ", R^2 =", sumobj$r.squared,
                 ", RMSE =", sumobj$sigma)
# The following does not have the desired effect, because it creates
# and appends a whole new table:
latex(sumStat, file = "/home/pauljohn/test4.tex", append = TRUE)

How can I snug sumStat up right at the end of the table?

pj

On 4/5/06, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
 Peter Dalgaard wrote:
  Brian Quinif [EMAIL PROTECTED] writes:
   Yes, Peter, I do mean the latex() function from the Hmisc package. Below is a reproducible example. Also, does anyone know how to stop R from automatically running LaTeX on the file produced by latex()?

  The last bit is easy. It's the print method that does that, so just don't print. E.g. invisible(latex(...))

 Or do
   w <- latex(..., file='...')

 Frank
 --
 Frank E Harrell Jr
 Professor and Chair
 Department of Biostatistics, School of Medicine
 Vanderbilt University

--
Paul E.
Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
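One base-R way to get the effect asked about above: since latex() has already written the file, read it back and splice the summary line in just before the \end{tabular}. This is a workaround sketch of my own; depending on the Hmisc version, latex()'s own arguments may offer a cleaner route, so it is worth checking ?latex.

```r
## Splice a note into an existing LaTeX table, just above \end{tabular}.
add.bottom.line <- function(texfile, note) {
    tex <- readLines(texfile)
    i <- max(grep("\\\\end\\{tabular\\}", tex))   # last \end{tabular}
    newline <- paste0("\\multicolumn{2}{l}{", note, "} \\\\")
    writeLines(append(tex, newline, after = i - 1), texfile)
}
## e.g. add.bottom.line("/home/pauljohn/test4.tex", sumStat)
```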
Re: [R] Making R talk to Win/OpenBUGS in Linux (again)
Dear Jun:

How about telling us which version of Linux you use and how you make linbugs run? As far as I can tell, the OpenBUGS people have intentionally removed linbugs from their version 2.2. That leaves us with the various scripts that people have posted; I've tried them, and none work for me. On Fedora Core 5, I cannot get any of the linbugs scripts that are floating around to work. I tried re-building the program thus:

# gcc -o cbugs CBugs.c -ldl

That creates an executable cbugs, but there is no joy for me.

$ ./cbugs
failed to install signal 32
failed to install signal 33
* BlackBox *
illegal memory read [ad = ]
- HostFiles.NewFileRef (pc=0D9B, fp=BFC73BE4)
- HostFiles.OpenFile (pc=0E0E, fp=BFC73BFC)
- HostFiles.Directory.Old (pc=29B2, fp=BFC73EA4)
- StdLoader.ThisObjFile (pc=031D, fp=BFC744C0)
- StdLoader.ReadHeader (pc=075F, fp=BFC746E0)
- StdLoader.LoadMod (pc=107B, fp=BFC747F8)
- StdLoader.LoadMod (pc=11E0, fp=BFC74910)
- StdLoader.LoadMod (pc=11E0, fp=BFC74A28)
- StdLoader.Hook.ThisMod (pc=1355, fp=BFC74A3C)
- Kernel.ThisMod (pc=1224, fp=BFC74B58)
- Init.Init (pc=004B, fp=BFC74B70)
- Init.$$ (pc=000A, fp=BFC74B80)

# export LD_ASSUME_KERNEL=2.4.1
# ./cbugs
./cbugs: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory

$ ldd ./cbugs
linux-gate.so.1 => (0x00a2b000)
libdl.so.2 => /lib/libdl.so.2 (0x00ba9000)
libc.so.6 => /lib/libc.so.6 (0x00a4d000)
/lib/ld-linux.so.2 (0x00a2c000)

And after doing that LD_ASSUME_KERNEL thingie, nothing in the shell works anymore...

$ which cbugs
/usr/bin/which: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory

On 5/1/06, jun yan [EMAIL PROTECTED] wrote:
 I have used linbugs with the rbugs package for a recent work. It might be worthwhile trying.
 Jun

 On 5/1/06, Uwe Ligges [EMAIL PROTECTED] wrote:
  Paul Johnson wrote:
   Thank you very much. With the insertion of the WINEPATH declaration, the following example program does run. And really fast, too!
library(R2WinBUGS)
WINEPATH <- "/usr/bin/winepath"

 Will be fixed in the package real soon now. Gregor, many thanks for the patches.
 Best, Uwe Ligges

# An example model file is given in:
model.file <- system.file(package = "R2WinBUGS", "model", "schools.txt")
# Let's take a look:
file.show(model.file)
# Some example data (see ?schools for details):
data(schools)
schools
J <- nrow(schools)
y <- schools$estimate
sigma.y <- schools$sd
data <- list("J", "y", "sigma.y")
inits <- function(){
    list(theta = rnorm(J, 0, 100), mu.theta = rnorm(1, 0, 100),
         sigma.theta = runif(1, 0, 100))
}
parameters <- c("theta", "mu.theta", "sigma.theta")
schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
    newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = WINEPATH)

On 4/30/06, Gregor Gorjanc [EMAIL PROTECTED] wrote:
 Hello Paul,
 thank you very much for this report. You caught a bug in R2WinBUGS that was introduced by me. I added support for winepath in the 1.1-1 version. Since I switch between Windows and Linux, I always set WINEPATH and then use it in bugs(). That's why I forgot to add it in further calls in bugs(). How silly :( This actually means that nobody else besides you tried to run R2WinBUGS under Linux. Too bad. Please report a summary to the BUGS list, as you have also asked there for help and I did not have time to answer you. I have been successfully running R2WinBUGS under Linux for quite some time now.

 Now you have the following options:

 A) Set WINEPATH before you call bugs(), i.e.

    WINEPATH <- "/usr/bin/winepath"
    bugs(..., WINEPATH = WINEPATH)

 This is the fastest approach for you.

 B) Apply the following diffs, then build and install R2WinBUGS by yourself. I am also sending them to the maintainer of the package, so this should be fixed soon also on CRAN. Uwe?
--- R2WinBUGSOrig/R/bugs.R      2006-04-29 21:44:59.0 +0200
+++ R2WinBUGS/R/bugs.R  2006-04-30 12:52:36.0 +0200
@@ -44,7 +44,7 @@
     } else new.model.file <- model.file
     bugs.script(parameters.to.save, n.chains, n.iter, n.burnin,
         n.thin, bugs.directory, new.model.file, debug = debug, is.inits = !is.null(inits),
-        bin = bin, DIC = DIC, useWINE = useWINE, newWINE = newWINE)
+        bin = bin, DIC = DIC, useWINE = useWINE, newWINE = newWINE, WINEPATH = WINEPATH)
     #if (useWINE && missing(bugs.directory))
     #    bugs.directory <- winedriveTr(bugs.directory)
     bugs.run(n.burnin, bugs.directory, WINE = WINE)
--- R2WinBUGSOrig/R/bugs.script.R       2006-04-29 21:44:59.0 +0200
+++ R2WinBUGS/R
Re: [R] Making R talk to Win/OpenBUGS in Linux (again)
of R2WinBUGS, and R2WinBUGS is now buildable on Linux, it looks like R2WinBUGS may be the way to go. But it fails. The error centers on the WINEPATH setting. I've learned that wine ships a program, winepath, that returns information other programs can use, and I've fiddled with this lots of ways, but it fails, saying:

Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found

I hope that by giving you this small not-yet-working example, you can spot where I'm going wrong.

## Paul Johnson 2006-04-29
library(R2WinBUGS)
# Copied from Prof Andrew Gelman's example
model.file <- system.file(package = "R2WinBUGS", "model", "schools.txt")
# file.show(model.file)
data(schools)
schools
J <- nrow(schools)
y <- schools$estimate
sigma.y <- schools$sd
data <- list("J", "y", "sigma.y")
inits <- function(){
    list(theta = rnorm(J, 0, 100), mu.theta = rnorm(1, 0, 100),
         sigma.theta = runif(1, 0, 100))
}
parameters <- c("theta", "mu.theta", "sigma.theta")
schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = T, newWINE = T,
    WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/winepath")

I do have the binary /usr/bin/winepath, but can't tell how to give R2WinBUGS what it wants.
Many failed efforts below:

schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = T, newWINE = T,
    WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/")
Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found

schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = T, newWINE = T,
    WINE = "/usr/bin/wine")
Error in if (!nchar(WINEPATH)) { : argument is of length zero

schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = T, newWINE = T,
    WINE = "/usr/bin/wine", WINEPATH = "/usr/bin")
Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found

I know there are many unknowns here, but I hope somebody has at least one working example that we can build on.

--
Lep pozdrav / With regards, Gregor Gorjanc
University of Ljubljana, PhD student
Biotechnical Faculty, Zootechnical Department
URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3, SI-1230 Domzale, Slovenia, Europe
mail: gregor.gorjanc at bfro.uni-lj.si
tel: +386 (0)1 72 17 861  fax: +386 (0)1 72 17 888
"One must learn by doing the thing; for though you think you know it, you have no certainty until you try." Sophocles ~450 B.C.

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Making R talk to Win/OpenBUGS in Linux (again)
I'm back! I've just learned that, on a fully updated Fedora Core Linux 5 system, the working solution for accessing WinBUGS under wine via the R package rbugs no longer works. Here was my last post on this topic (with the formerly working solution) from January: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/68497.html Currently, what happens is that WinBUGS starts up, but just sits there doing nothing. I do not know whether the problem is due to a change in wine or in rbugs; since both of them have been updated, it is hard to say. I'm thinking it is a wine or perhaps even a kernel-related problem. Inside WinBUGS running under wine, it does not even work to manually open the bugs.script file and then choose Model/Script. It just returns a screen full of errors saying the commands fail. I don't think rbugs gets even that far, however: the WinBUGS window does pop up, but nothing happens. rbugs has been retooled to emphasize use of linbugs, the OpenBUGS-based program for Linux, but I can't get linbugs to run at all on my system, and the linbugs program itself appears to have been removed from the OpenBUGS distribution altogether. (I'm following that up with a separate email to the OpenBUGS group.) I think JAGS is the right long-term solution, but currently I'm in the middle of a semester trying to teach Bayesian stats; the students have some familiarity with WinBUGS and I'm interested in making it work. So while I'm learning more about JAGS and the bayesmix package that supports access to it from R, I would still like a way to interact with WinBUGS through wine. If anybody has rbugs working in current Linux, please tell me how--give me a working example? In the meanwhile, I've noticed that nightly updates have successfully installed R2WinBUGS on Linux systems, and I've got a small test case that I'd like to ask about. Since rbugs is a Linux adaptation of R2WinBUGS, and R2WinBUGS is now buildable on Linux, it looks like R2WinBUGS may be the way to go. But it fails.
The error centers on the WINEPATH setting. I've learned that wine ships a program, winepath, that returns information other programs can use, and I've fiddled with this lots of ways, but it fails, saying:

Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found

I hope that by giving you this small not-yet-working example, you can spot where I'm going wrong.

## Paul Johnson 2006-04-29
library(R2WinBUGS)
# Copied from Prof Andrew Gelman's example
model.file <- system.file(package = "R2WinBUGS", "model", "schools.txt")
# file.show(model.file)
data(schools)
schools
J <- nrow(schools)
y <- schools$estimate
sigma.y <- schools$sd
data <- list("J", "y", "sigma.y")
inits <- function(){
    list(theta = rnorm(J, 0, 100), mu.theta = rnorm(1, 0, 100),
         sigma.theta = runif(1, 0, 100))
}
parameters <- c("theta", "mu.theta", "sigma.theta")
schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = T, newWINE = T,
    WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/winepath")

I do have the binary /usr/bin/winepath, but can't tell how to give R2WinBUGS what it wants.
Many failed efforts below:

schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = T, newWINE = T,
    WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/")
Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found

schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = T, newWINE = T,
    WINE = "/usr/bin/wine")
Error in if (!nchar(WINEPATH)) { : argument is of length zero

schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = T, newWINE = T,
    WINE = "/usr/bin/wine", WINEPATH = "/usr/bin")
Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found

I know there are many unknowns here, but I hope somebody has at least one working example that we can build on.

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
[R] need automake/autoconf help to build RnetCDF and ncdf packages
I imagine this "where are your header files" problem comes up in other packages, so I'm asking this as a general R question: how should configure scripts be re-written so that they look in more places? Briefly, the problem is that Fedora-Extras installs the header files in a subdirectory /usr/include/netcdf-3 rather than /usr/include:

# rpm -ql netcdf-devel
/usr/include/netcdf-3
/usr/include/netcdf-3/ncvalues.h
/usr/include/netcdf-3/netcdf.h
/usr/lib/netcdf-3/libnetcdf.a
/usr/lib/netcdf-3/libnetcdf_c++.a
/usr/lib/netcdf-3/libnetcdf_g77.a

Last week I posted to this list that I re-built the Fedora-Extras netcdf rpm so that it would have a more standard installation, and then I was able to make RNetCDF work. In the meanwhile, I posted in bugzilla.redhat.com asking if they might use the standard packaging, but their response is an adamant refusal: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=189734 When netcdf updates are issued in the Fedora-Extras network, the special hacks I put in to undo their special hacks are lost, and netcdf programs don't work anymore. The attempt to build ncdf fails inside R or on the command line, but it gives a GOOD HINT about a command-line workaround:

# R CMD INSTALL ncdf_1.5.tar.gz
[...]
checking /sw/include/netcdf.h presence... no
checking for /sw/include/netcdf.h... no
Fatal error: I cannot find the directory that holds the netcdf include file netcdf.h!
You can specify it as follows:
    ./configure --with-netcdf_incdir=directory_with_file_netcdf.h

*** Special note for R CMD INSTALL users: The syntax for specifying multiple --configure-args does not seem to be well documented in R.
If you have installed the netcdf include and library directories in some non-standard location, you can specify BOTH of these during the R CMD INSTALL process using the following syntax:

R CMD INSTALL --configure-args="--with-netcdf_incdir=/path/to/netcdf/incdir --with-netcdf_libdir=/path/to/netcdf/libdir" ncdf_1.1.tar.gz

where you should, of course, specify your own netcdf include and library directories, and the actual package name. *** I found that the following did work!

# R CMD INSTALL --configure-args="--with-netcdf_incdir=/usr/include/netcdf-3 --with-netcdf_libdir=/usr/lib/netcdf-3" ncdf_1.5.tar.gz

It is not the best solution, because special administrative effort is required, and the install.packages approach inside R won't work. With RNetCDF, however, the problem is slightly worse, and no such helpful message appears:

# R CMD INSTALL RNetCDF_1.1-3.tar.gz
* Installing *source* package 'RNetCDF' ...
checking for gcc... gcc
checking for C compiler default output... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for executable suffix...
checking for object suffix... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for main in -lnetcdf... no
configure: error: netcdf library not found
ERROR: configuration failed for package 'RNetCDF'
** Removing '/usr/lib/R/library/RNetCDF'

I have no reason to doubt that the Fedora-Extras authors are right, and that some changes in the configure scripts for these packages are required.
In RNetCDF's configure.ac file, I see the place where it specifies NETCDF_INCDIR:

if test -z "${NETCDF_PATH}"; then
    AC_CHECK_FILE(/usr/local/include/netcdf.h,
        [USR_LOCAL_NETCDF_H=TRUE], [USR_LOCAL_NETCDF_H=FALSE])
    if test "${USR_LOCAL_NETCDF_H}" = TRUE; then
        NETCDF_INCDIR=/usr/local/include
        NETCDF_LIBDIR=/usr/local/lib
        NETCDF_LIBNAME=netcdf
        HAVE_NETCDF_H=TRUE
    elif test "${HAVE_NETCDF_H}" = FALSE; then
        AC_CHECK_FILE(/usr/include/netcdf.h,
            [USR_NETCDF_H=TRUE], [USR_NETCDF_H=FALSE])
        if test "${USR_NETCDF_H}" = TRUE; then
            NETCDF_INCDIR=/usr/include
            NETCDF_LIBDIR=/usr/lib
            NETCDF_LIBNAME=netcdf
            HAVE_NETCDF_H=TRUE
        fi
    fi
else
    NETCDF_INCDIR=${NETCDF_PATH}/include
    NETCDF_LIBDIR=${NETCDF_PATH}/lib
    NETCDF_LIBNAME=netcdf
    AC_CHECK_FILE(${NETCDF_INCDIR}/netcdf.h,
        [INCDIR_NETCDF_H=TRUE], [INCDIR_NETCDF_H=FALSE])
    if test "${INCDIR_NETCDF_H}" = TRUE; then
        HAVE_NETCDF_H=TRUE
    fi
fi

I've tried fiddling around in this, and then typing

autoconf configure.ac > newconfigure
sh ./newconfigure

But it always ends the same:

checking for main in -lnetcdf... no
configure: error: netcdf library not found

So, is there somebody here who knows how these configure scripts ought to be written to accommodate this?

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
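The fix being asked for is just a longer list of candidate directories in the configure test. Autoconf itself is shell/m4, but the search-order logic is easy to see in a runnable sketch; here it is in Python for convenience (the candidate list, including the Fedora-Extras directory /usr/include/netcdf-3, is my assumption about which locations are worth probing, not something the shipped RNetCDF configure script actually does):

```python
import os
import tempfile

# Candidate include directories, in the order a configure test could try them.
# /usr/include/netcdf-3 is the Fedora-Extras location; the others are standard.
CANDIDATES = ["/usr/local/include", "/usr/include", "/usr/include/netcdf-3"]

def find_netcdf_incdir(candidates):
    """Return the first directory that contains netcdf.h, or None."""
    for d in candidates:
        if os.path.isfile(os.path.join(d, "netcdf.h")):
            return d
    return None

# Demonstrate against a throwaway tree rather than the real /usr/include.
root = tempfile.mkdtemp()
fedora_dir = os.path.join(root, "include", "netcdf-3")
os.makedirs(fedora_dir)
open(os.path.join(fedora_dir, "netcdf.h"), "w").close()

dirs = [os.path.join(root, "include"), fedora_dir]
print(find_netcdf_incdir(dirs))  # the netcdf-3 subdirectory wins, since include/ has no header
```

The configure.ac equivalent would presumably add one more AC_CHECK_FILE branch for /usr/include/netcdf-3 (setting NETCDF_LIBDIR to /usr/lib/netcdf-3) after the /usr/include test fails.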
[R] using betareg: problems with anova and predict
Dear R-helpers: We have had fun using betareg to fit models with proportions as dependent variables. However, in the analysis of these models we found some wrinkles and don't know where the best place is to start looking for a fix. The problems we see (so far) are: 1. predict ignores newdata; 2. anova does not work. Here is a small working example:

x <- c(1, 3, 1, 4, 5, 3, 5, 2, 5, 2)
y <- c(.3, .4, .4, .5, .7, .4, .3, .4, .3, .5)
x2 <- c(4, 2, 1, 5, 1, 2, 4, 2, 1, 3)
library(betareg)
mybreg <- betareg(y ~ x)
summary(mybreg)
predict(mybreg, newdata = data.frame(x = c(2, 2)))
mybreg2 <- betareg(y ~ x + x2)
summary(mybreg2)
anova(mybreg, mybreg2)

Example output:

> predict(mybreg, newdata = data.frame(x = c(2, 2)))
 [1] 0.3903155 0.4207632 0.3903155 0.4362319 0.4518258 0.4207632 0.4518258
 [8] 0.4054484 0.4518258 0.4054484
...
> anova(mybreg, mybreg2)
Error in as.double.default(lapply(objects, df.residual)) :
        (list) object cannot be coerced to 'double'

I have been digging into this a little bit and notice that betareg does not return the log-likelihood at the maximum likelihood estimates, but I am able to hack the file /usr/lib/R/library/betareg/R/betareg to save the value of the optim function and print that out, so at least I can do a likelihood-ratio test by hand in place of the anova. But I'm stumped on why predict ignores newdata. I'm using R-2.1.1 on Fedora Core Linux 5 with the betareg package from CRAN. In case you are interested in betareg, we have found this article to be extremely readable and helpful: Ferrari, S.L.P., and Cribari-Neto, F. (2004). Beta regression for modeling rates and proportions. Journal of Applied Statistics, 31(7), 799-815.

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
[R] Solution: Making RNetCDF work on Fedora Linux
Dear R users who might like to use the package RNetCDF on Fedora Linux: Fedora users (versions 4 and 5) might have noticed that the default install of the netcdf and netcdf-devel packages from the Fedora-Extras archive is inconsistent with the R package RNetCDF. The attempt to install RNetCDF fails in the configure stage because the header and library info for netcdf cannot be found. Here, http://pj.freefaculty.org/software/PolsFC5Updates/ , I have just deposited an updated set of RPMS for netcdf version 3.6.1. You need the netcdf and netcdf-devel packages in order to install the RNetCDF package inside R. You don't need debuginfo, unless you are trying to use the GNU debugger to find bugs in netcdf itself. The problem installing RNetCDF was caused by the Fedora-Extras package maintainer's use of the install directory /usr/include/netcdf-3 rather than /usr/include. I had several very productive emails with RNetCDF's packager, Pavel Michna (michna at giub.unibe.ch), and after a while I concluded that the Fedora-Extras packaging is just wrong, and there's no point in trying to hack RNetCDF to work around it. So I removed the unusual packaging. While I was in the repackaging mood, I built the RPMS for netcdf version 3.6.1, one notch newer than Fedora-Extras currently offers. If you install these instead of the netcdf that Fedora-Extras provides, the RNetCDF install will work fine. Ordinarily, you would face the trouble that the automatic updates for Fedora (via yum) might obliterate the netcdf that I give you. You could exclude netcdf in your global yum settings, but in this case you might not have to worry. The package version I give is 3.6.1, whereas Extras has only 3.6.0, and I gave the package release number 10. So until the Extras folk get up to 3.6.1 and release 10, I think your yum updates will leave this one alone. Of course, if they package up 3.6.2, which is supposed to be released soon, then you will be in trouble.
I've posted the SPEC file and SRPM file in case you want to build your own RPMS. It's fun!

-- Paul E. Johnson
email: [EMAIL PROTECTED]
Dept. of Political Science  http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504, University of Kansas
Lawrence, Kansas 66044-3177
Office: (785) 864-9086  FAX: (785) 864-5700
[R] Problems in package management after Linux system upgrade
I upgraded from Fedora Core 4 to Fedora Core 5, and I find that a lot of previously installed packages won't run because shared libraries or other system things have changed out from under the installed R libraries. I do not know for sure whether the R version now in Fedora-Extras (2.2.1) is exactly the same one I was using in FC4. I see problems in many packages. For example, Hmisc:

unable to load shared library '/usr/lib/R/library/Hmisc/libs/Hmisc.so':
  libgfortran.so.0: cannot open shared object file: No such file or directory
Error in library(Hmisc) : .First.lib failed for 'Hmisc'

If I manually do a re-install, then Hmisc and the other packages are fine. I ?THINK? that if I had been using version 2.2.0, and now I have 2.2.1, then the checkBuilt option would force a rebuild of all packages. Right? I'm pasting in below the script that I run nightly to update all packages and install any new ones from CRAN. Can anybody suggest changes that might cause a rebuild of packages that need rebuilding?

## PJ 2005-11-05
options(repos = "http://cran.cnr.berkeley.edu/")
# failPackages is the blacklist. Things get inserted for various reasons.
# Rejected because they don't build on my system as of July 2005, or are obviously not needed:
failPackages1 <- c("BRugs", "tclkt2", "cyclones", "rpvm", "ncdf", "gtkDevice",
    "gap", "gnomeGUI", "mimR", "pathmix", "rcdd", "rgdal", "rpvm", "Rmpi",
    "RQuantLib", "RMySQL", "RNetCDF", "RODBC", "ROracle", "rsprng", "RWinEdt",
    "taskPR")
# Rejected because I subjectively think we don't need them:
failPackages2 <- c("aaMI", "AlgDesign", "bim", "caMassClass", "CGIwithR",
    "CDNmoney", "clac", "clim.pact", "compositions", "cyclones", "hapassoc",
    "haplo.score", "haplo.stats", "hapsim", "httpRequest", "labdsv", "kza",
    "LMGene", "Malmig", "magic", "negenes", "oz", "papply", "spe", "wavethresh",
    "waveslim", "tdthap")
failPackages3 <- c("rcom", "Rlsf")
# put the 3 sets of rejects together
failPackages <- union(failPackages1, union(failPackages2, failPackages3))
# list of all currently installed packages
installedPackages <- rownames(installed.packages())
# do any installed packages need removal because they are on the blacklist?
needRemoval <- installedPackages %in% failPackages
# remove any blacklisted packages if they are already installed
if (sum(needRemoval) > 0) remove.packages(installedPackages[needRemoval])
# update the ones you want to keep
update.packages(ask = F, checkBuilt = T)
# get list of all new packages on CRAN
theNew <- new.packages()
# do any of the new packages belong to the blacklist?
shouldFail <- theNew %in% failPackages
# install non-blacklisted packages that are in theNew list
if (sum(!shouldFail) > 0) install.packages(theNew[!shouldFail], dependencies = T)
# VGAM is not on CRAN yet, but Zelig will use it.
if ("VGAM" %in% installedPackages) {
    update.packages(repos = "http://www.stat.auckland.ac.nz/~yee", ask = F)
} else {
    install.packages("VGAM", repos = "http://www.stat.auckland.ac.nz/~yee")
}

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
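Incidentally, the update logic of the script above is pure set arithmetic, so it can be checked in isolation. Here is the same remove/install decision sketched in Python with made-up package names (the sets are illustrative only, not the actual state of any library):

```python
# Set logic of the nightly update script, with hypothetical package sets.
installed = {"Hmisc", "lme4", "BRugs"}       # stands in for installed.packages()
blacklist = {"BRugs", "RMySQL", "rcom"}      # stands in for failPackages
new_on_cran = {"VGAM", "RMySQL", "betareg"}  # stands in for new.packages()

# installedPackages %in% failPackages -> these get removed
need_removal = installed & blacklist
# theNew[!shouldFail] -> these get installed
to_install = new_on_cran - blacklist

print(sorted(need_removal))  # ['BRugs']
print(sorted(to_install))    # ['VGAM', 'betareg']
```

The one behavior this makes obvious: a misspelled name in the blacklist simply matches nothing, so no removal or exclusion happens for it.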
[R] Want to fit random intercept in logistic regression (testing lmer and glmmML)
Greetings. Here is sample code, with some comments. It shows how I can simulate data and estimate a glm with the binomial family when there is no individual-level random error; but when I add random error into the linear predictor, I have a difficult time getting reasonable estimates of the model parameters or the variance component. There are no clusters here, just individual-level responses, so perhaps I am misunderstanding the translation from my simple mixed model to the syntax of glmmML and lmer. I get roughly the same (inaccurate) fixed-effect parameter estimates from glmmML and lmer, but the information they give on the variance components is quite different. Thanks in advance. Now I paste in the example code.

### Paul Johnson [EMAIL PROTECTED]
### 2006-03-08
N <- 1000
A <- -1
B <- 0.3
x <- 1 + 10 * rnorm(N)
eta <- A + B * x
pi <- exp(eta) / (1 + exp(eta))
myunif <- runif(N)
y <- ifelse(myunif < pi, 1, 0)
plot(x, y, main = bquote(eta[i] == .(A) + .(B) * x[i]))
text(0.5 * max(x), 0.5,
     expression(Prob(y[i] == 1) == frac(1, 1 + exp(-eta[i]))))
myglm1 <- glm(y ~ x, family = binomial(link = logit))
summary(myglm1)
## Just for fun
myglm2 <- glm(y ~ x, family = quasibinomial)
summary(myglm2)
### Mixed model: random intercept with large variance
eta <- A + B * x + 5 * rnorm(N)
pi <- exp(eta) / (1 + exp(eta))
myunif <- runif(N)
y <- ifelse(myunif < pi, 1, 0)
plot(x, y, main = bquote(eta[i] == .(A) + .(B) * x[i] + u[i]))
text(0.5 * max(x), 0.5,
     expression(Prob(y[i] == 1) == frac(1, 1 + exp(-eta[i]))))
### Parameter estimates go to hell, as expected
myglm3 <- glm(y ~ x, family = binomial(link = logit))
summary(myglm3)
### Why doesn't quasibinomial show more evidence of the random intercept?
myglm4 <- glm(y ~ x, family = quasibinomial)
summary(myglm4)
# Can I estimate with lmer?
library(lme4)
### With no repeated observations, what does lmer want?
### Clusters of size 1?
### In lme, I'd need random = ~ 1
idnumber <- 1:length(y)
mylmer1 <- lmer(y ~ x + (1 | idnumber), family = binomial, method = "Laplace")
summary(mylmer1)
### glmmML wants clusters, and I don't have any clusters with more than 1 observation
library(glmmML)
myglmm1 <- glmmML(y ~ x, family = binomial, cluster = idnumber)
summary(myglmm1)

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
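One way to see why the fixed-effect estimates come out "inaccurate" is that unmodeled individual-level noise does not merely add variance; it rescales the logit coefficients, roughly by 1/sqrt(1 + sigma^2/(pi^2/3)). The same simulation can be redone in pure Python (Python only so the sketch is self-contained; the tiny Newton fitter below is my own sketch, not any package's code):

```python
import math
import random

def sigmoid(t):
    """Numerically safe logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def fit_logit(x, y, iters=30):
    """Newton-Raphson fit of logit(p_i) = a + b*x_i; returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)          # IRLS weight
            r = yi - p                 # score contribution
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01    # 2x2 information matrix inverse
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

random.seed(1)
N, A, B = 2000, -1.0, 0.3
x = [1 + 10 * random.gauss(0, 1) for _ in range(N)]

def simulate(noise_sd):
    """Draw y from logit(p) = A + B*x + noise_sd * N(0,1)."""
    return [1 if random.random() < sigmoid(A + B * xi + noise_sd * random.gauss(0, 1)) else 0
            for xi in x]

a0, b0 = fit_logit(x, simulate(0))  # no extra noise: slope near the true 0.3
a5, b5 = fit_logit(x, simulate(5))  # sd-5 noise: slope shrinks markedly
print(round(b0, 3), round(b5, 3))
```

With a noise variance of 25, the usual approximation gives a shrinkage factor of about sqrt(1 + 25/(pi^2/3)) ≈ 2.93, so the noisy fit should land near 0.3/2.93 ≈ 0.1 rather than the true 0.3 — which is the attenuation glm shows, and the rescaling lmer/glmmML would have to undo from clusters of size 1.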
[R] how to make plotmath expression work together with paste
Recent questions about using plotmath have renewed my interest in this question: I want expressions to take values of variables from the environment. I am able to use expressions, and I am able to use paste to put text and values of variables into plots. But the two things just won't work together. Here is some example code that shows what I mean.

plot(NA, xlim = c(0, 100), ylim = c(0, 100))
# show plotmath works
text(16, 22, expression(slope == frac(partialdiff * f(hat(theta)),
                                      partialdiff * hat(theta))))
# I want to put values of variables into the middle of expressions
avar1 <- 10
# prove I am able to use paste!
text(40, 40, paste("a word", avar1, "other words"))
# But I'm completely frustrated by the problem of making paste and
# expression work together.
amath1 <- expression(slope == frac(partialdiff * f(hat(theta)),
                                   partialdiff * hat(theta)))
# still works
text(10, 60, amath1)
# paste breaks math output
text(60, 60, paste(amath1, "=", avar1))
# Can't paste expression
text(20, 30, paste("a word", avar1, expression(partialdiff)))
# paste does not work anymore -- value of avar1 not replaced
text(50, 80, expression(paste(slope == frac(partialdiff * f(hat(theta)),
                                            partialdiff * hat(theta)), avar1)))
n <- 12
# This does get the avar1 value in the string and put it in, but it
# piles the labels on top of each other. Wish they went side by side
text(12, 15, labels = c(expression(bar(x) == sum(frac(x[i], n), i == 1, n)),
                        paste("gamma =", avar1)), cex = .8)

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
[R] glm gives t test sometimes, z test others. Why?
I just ran example(glm) and happened to notice that models based on the Gamma distribution give a t test, while the Poisson models give a z test. Why? Both are b/s.e., aren't they? I can't find documentation supporting the claim that the distribution is more like t in one case than another, except in the Gaussian case (where it really is t). Aren't all of the others approximations based on the Wald idea that bhat^2 / Var(bhat) is asymptotically chi-square? And that sqrt(chi-square) is normal. While I'm asking, I wonder whether glm should report them at all. I've followed up on Prof. Ripley's advice to read the Hauck and Donner article and its successors, and I'm persuaded that we ought to just use the likelihood-ratio test to decide about individual parameters.

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
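On the arithmetic behind "sqrt(chi-square) is normal": a two-sided z test and a 1-df Wald chi-square test are literally the same test, because P(|Z| > z) = P(Z^2 > z^2). A couple of lines confirm it (in Python only because math.erfc makes the identity explicit):

```python
import math

def p_two_sided_z(z):
    # P(|Z| > |z|) for standard normal Z
    return math.erfc(abs(z) / math.sqrt(2.0))

def p_chisq_1df(x):
    # P(X > x) for chi-square with 1 df, using X = Z^2
    return math.erfc(math.sqrt(x / 2.0))

b_over_se = 1.96
print(p_two_sided_z(b_over_se))     # ~0.05
print(p_chisq_1df(b_over_se ** 2))  # identical
```

So reporting z versus a 1-df chi-square is purely presentational; as far as I know, the t-versus-z labeling in summary.glm tracks whether the dispersion parameter is estimated (Gamma, Gaussian, quasi- families) or fixed at 1 (binomial, Poisson).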
Re: [R] predict.glm - how to?
On 3/2/06, Laurits Søgaard Nielsen [EMAIL PROTECTED] wrote:
 Hi. I have a little R problem. I have created a GLM model in R and now I want to predict some values outside the values I have in the model (extrapolate).

myglm <- glm( some stuff here )
whatever <- some-new-hypothetical-data-you-create
predict(myglm, newdata = whatever, type = "response")

I have hints on this in Rtips: http://pj.freefaculty.org/R/Rtips.html#7.5

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
[R] Specification decisions in glm and lmer
I have been reviewing GLM and lmer to sharpen up some course notes and would like to ask for your advice. 1. Is there a test that can check whether a particular distribution--say Gaussian, Gamma, or inverse Gaussian--is more appropriate in a generalized linear model? A theoretical reason to choose one over the other is enough for me, but I've got some skeptical students who want a significance test. I've done some simulation tests and find that the AIC is smaller if you guess the correct distribution, but as long as you have the link and the functional form of the right-hand side correct, your parameter estimates are not horribly affected by the choice of the distribution. If your data are Gamma, for example, the estimates from a Gaussian GLM are just about as good. Do you think so? A draft of my notes is on my new web site: http://pj.freefaculty.org/ps707/GLM/GammaGLM-01.pdf If you look at sections 7 and 8 (or the graphs on p. 9 and p. 17), you might see what I mean. I still have work to do on this, and if you have pointers, please email me. 2. Suppose you fit a random-effects model with lmer and you wonder whether the fit is better (I suppose in the sense of anova) than a glm that is the same except for the lack of a random effect. Is there a test for that? I've been reading the thread in the r-help list from December 2005 in which people ask for a standard error or confidence interval for a variance component, and the answer seems to be that such a thing is not statistically meaningful. Would you consider an example using survey data about attitudes toward the police in US cities? The variable PLACE indicates which city people live in. We wonder if there should be a random intercept to represent diversity across cities. I want something that works like anova() might. What to do?
gl1 <- glm(RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI + GRPETH +
    TRBLK + BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL + RCRIM3YR + RPCTBLAK,
    family = binomial(link = logit), data = eldat)
summary(gl1)

Call:
glm(formula = RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI +
    GRPETH + TRBLK + BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL +
    RCRIM3YR + RPCTBLAK, family = binomial(link = logit), data = eldat)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5129  -0.5515  -0.4072  -0.2807   2.7975

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.549801   0.666762  -2.324 0.020106 *
AGE         -0.020215   0.005423  -3.728 0.000193 ***
POLINT      -0.148035   0.075748  -1.954 0.050666 .
HAPPY       -0.309656   0.114871  -2.696 0.007024 **
EFFCOM      -0.150475   0.088664  -1.697 0.089670 .
TRNEI        0.376409   0.091701   4.105 4.05e-05 ***
GRPETH       0.593949   0.205169   2.895 0.003792 **
TRBLK        0.575473   0.114896   5.009 5.48e-07 ***
BLKCHIEF    -0.233197   0.188521  -1.237 0.216093
RCOPDIFF     0.051088   0.150306   0.340 0.733938
RCOPRATE     0.169789   0.097540   1.741 0.081733 .
RCOPBKIL     0.147662   0.089269   1.654 0.098102 .
RCRIM3YR     0.189701   0.097590   1.944 0.051913 .
RPCTBLAK    -0.504626   0.175897  -2.869 0.004119 **
---
Signif.
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1364.4 on 1694 degrees of freedom
Residual deviance: 1197.5 on 1681 degrees of freedom
AIC: 1225.5

Number of Fisher Scoring iterations: 5

me1 <- lmer(RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI + GRPETH +
    TRBLK + BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL + RCRIM3YR + RPCTBLAK +
    (1 | PLACE), family = binomial(link = logit), data = eldat, method = "Laplace")
Loading required package: lattice
summary(me1)

Generalized linear mixed model fit using Laplace
Formula: RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI + GRPETH + TRBLK +
    BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL + RCRIM3YR + RPCTBLAK + (1 | PLACE)
Data: eldat
Family: binomial(logit link)
     AIC      BIC    logLik deviance
1227.446 1308.977 -598.7229 1197.446

Random effects:
 Groups Name        Variance  Std.Dev.
 PLACE  (Intercept) 0.0056343 0.075062
# of obs: 1695, groups: PLACE, 18

Estimated scale (compare to 1) 1.014432

Fixed effects:
              Estimate Std. Error z value  Pr(>|z|)
(Intercept) -1.5478617  0.6708198 -2.3074 0.0210315 *
AGE         -0.0202473  0.0054271 -3.7308 0.0001909 ***
POLINT      -0.1481644  0.0757952 -1.9548 0.0506069 .
HAPPY       -0.3102757  0.1149718 -2.6987 0.0069609 **
EFFCOM      -0.1501713  0.0887251 -1.6925 0.0905419 .
TRNEI        0.3757690  0.0918107  4.0929 4.261e-05 ***
GRPETH       0.5903173  0.2053184  2.8751 0.0040386 **
TRBLK        0.5754894  0.1149954  5.0045 5.602e-07 ***
BLKCHIEF    -0.2275911  0.1935195 -1.1761 0.2395698
RCOPDIFF     0.0483413  0.1541093  0.3137 0.7537626
RCOPRATE     0.1681531  0.1003416  1.6758 0.0937760 .
RCOPBKIL     0.1437347  0.0924614  1.5545 0.1200562
RCRIM3YR
[R] Sweave: trouble controlling Design package startup output
Hello, everybody: I'm experimenting more with Sweave, R, and LyX. There's now an entry in the LyX wiki on using R, so anybody can do it! http://wiki.lyx.org/LyX/LyxWithRThroughSweave Now I notice this frustrating thing. I think I've done everything possible to make the Design library start up without any fanfare:

<<echo=F, quiet=T, print=F>>=
library(Design, verbose = F)
@

But in the output there is still one page of banner information, which starts like this:

Loading required package: Hmisc
Hmisc library by Frank E Harrell Jr
Type library(help='Hmisc'), ?Overview, or ?Hmisc.Overview')
to see overall documentation.
NOTE: Hmisc no longer redefines [.factor to drop unused levels when subsetting.
To get the old behavior of Hmisc type dropUnusedLevels().
Attaching package: 'Hmisc'
The following object(s) are masked from package:stats :
[and so forth]

Can you tell me how to make it stop? I've seen this kind of output with a couple of other packages over the past 2 months, but I can't find the files that produced them, so I can't cite the examples. But I don't think it is confined to Design.

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
[R] cron job install/update problems: tcltk can't find display (installing e.g., pbatR)
On Fedora Core Linux 4, I have a cron job that causes R to update all packages and install new ones. Lately, I notice in the log that some packages fail to install. These are ones that assume X is running. For example, the pbatR install requires tcltk to be loaded, and then the install fails because in a cron job, there is no DISPLAY environment. I suppose the same happens if you try to install R packages in the console, without X running? Error output is pasted here. I wonder if you can advise me whether this is the kind of thing that can be fixed in the cron job or not. I've verified that pbatR does install interactively (because tcltk does start). If you think this is a pbatR-specific problem, i will contact the author directly. When I have the repos option set, the interactive install does not cause any tcltk widgets to pop up, so I wonder if it is really necessary. * Installing *source* package 'pbatR' ... ** libs g++ -I/usr/lib/R/include -I/usr/local/include -fPIC -O2 -g -pipe -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -m32 -march=i386 -mtune=pentium4 -fasynchronous-unwind-tables -c wait.cpp -o wait.o wait.cpp: In function 'int runCommands()': wait.cpp:77: warning: ignoring return value of 'int system(const char*)', declared with attribute warn_unused_result g++ -shared -L/usr/local/lib -o pbatR.so wait.o -L/usr/lib/R/lib -lR ** R ** save image Loading required package: survival Loading required package: splines Loading required package: tcltk Loading Tcl/Tk interface ... Error in fun(...) 
: no display name and no $DISPLAY environment variable Error: .onLoad failed in 'loadNamespace' for 'tcltk' Error: package 'tcltk' could not be loaded Execution halted ERROR: execution of package source for 'pbatR' failed ** Removing '/usr/lib/R/library/pbatR' ** Restoring previous '/usr/lib/R/library/pbatR' -- Here's the R code that runs from the cron job:

# Paul Johnson [EMAIL PROTECTED] 2005-08-31
# This should update and then install all packages, except for
# ones I exclude because they don't work or we don't want them.
# options(repos = "http://lib.stat.cmu.edu/R/CRAN/")
options(repos = "http://cran.cnr.berkeley.edu/")
# failPackages is the black list.
# Things that don't build cleanly with FC4, but I
# hope it will build someday soon.
failPackages <- c("deldir", "frailtypack", "fSeries", "fCalendar",
  "fExtremes", "fPortfolio", "hmm.discnp", "knncat", "labdsv", "survrec",
  "SciViews", "RGrace", "uroot", "fMultivar", "fOptions", "gcmrec",
  "rcom", "Rlsf")
# list of all currently installed packages
installedPackages <- rownames(installed.packages())
# do any installed packages need removal because they are on the blacklist?
needRemoval <- installedPackages %in% failPackages
# remove any blacklisted packages if they are already installed.
if (sum(needRemoval) > 0) remove.packages(installedPackages[needRemoval])
# update the ones you want to keep
update.packages(ask=F, checkBuilt=T)
# get list of all new packages on CRAN
theNew <- new.packages()
# do any of the new packages belong to the black list?
shouldFail <- theNew %in% failPackages
# install non-blacklisted packages that are in theNew list
if (sum(!shouldFail) > 0) install.packages(theNew[!shouldFail], dependencies=T)
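P.S. One defensive change worth considering for the cron script (a sketch under my own naming — `safe_apply` is not a real function anywhere): loop over the packages one at a time inside tryCatch(), so a single package that insists on tcltk/X11 does not abort the whole batch, and the log gets a list of the failures.

```r
# Sketch: run fun() once per item, trapping errors, and return the
# names of the items that failed instead of stopping at the first one.
safe_apply <- function(items, fun) {
  failed <- character(0)
  for (p in items) {
    ok <- tryCatch({ fun(p); TRUE }, error = function(e) FALSE)
    if (!ok) failed <- c(failed, p)
  }
  failed
}
# In the cron script one might then write, e.g.:
# bad <- safe_apply(theNew[!shouldFail],
#                   function(p) install.packages(p, dependencies = TRUE))
# cat("failed installs:", bad, "\n")
```

This does not make pbatR install without a DISPLAY, but it keeps one bad package from blocking the rest of the nightly update.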
Re: [R] Current state of support for BUGS access for Linux users?
Thanks, Uwe, that clears up why I can't make R2WinBUGS work with OpenBUGS and WinBUGS 1.5 :) Both work pretty well with Wine in a GUI. I noticed that when I tried rbugs, it does succeed in starting the WinBUGS GUI, but then nothing happens. I'll get WinBUGS 1.4 and see what happens. In the meanwhile, I'm going to try to see what BRugs is good for. In Linux, when I try to install BRugs, the install fails with an error saying that, at the current time, BRugs works only in Windows. * Installing *source* package 'BRugs' ... Package 'BRugs' currently only works under Windows.\nIt is supposed to work under Linux in future releases. I'd like to stop that check and see what happens! The way I read the source code from OpenBUGS and BRugs, I need to replace the Windows dll install and instead put in an .so file (as in OpenBUGS). If anybody has done this, please let me know of your experience. On 1/17/06, Uwe Ligges [EMAIL PROTECTED] wrote: Paul Johnson wrote: Greetings: I'm going to encourage some students to try Bayesian ideas for hierarchical models. I want to run the WinBUGS and R examples in Tony Lancaster's An Introduction to Modern Bayesian Econometrics. That features MS Windows and bugs from R2WinBUGS. Today, I want to ask how people are doing this in Linux? I have found a plethora of possibilities, some of which are not quite ready, some of which work only under MS Windows. Right now I just want to know what actually works. Here's where I stand now in Fedora Core 4 Linux. 1. OpenBUGS-2.1.1 runs in Linux. I can run linbugs (the console version similar to the old BUGS) and also I can run--under wine--the newest version of winbugs.exe that is circulated with OpenBUGS. As far as I can tell, the graphical interface in wine/winbugs works in almost all elements. A few things seem not quite right in the GUI (can't initialize more than one chain, difficult to specify variables for monitoring), but it does work. 
It is easier to install and work with OpenBUGS's version of winbugs.exe than with Winbugs-1.4 because the Open version does not have that annoying license registration and winbugs.exe is not wrapped inside an installation script. I'm a little confused about WinBUGS versions because the BRugs documents http://www.biostat.umn.edu/~brad/software/BRugs/BRugs_install.html refer to WinBUGS-1.5, which refers to http://www.biostat.umn.edu/~brad/software/BRugs/WinBUGS15.zip, which can be downloaded without any of the registration steps, but WinBUGS15 is not mentioned in the WinBUGS site (where 1.4.1 appears to be the newest). Supposing I get the winbugs.exe question settled: 2. How to most dependably send jobs from R to linbugs or winbugs.exe? The BRugs package is preferred? For a long time, R2WinBUGS was Windows-only, but toward the end of last fall I noticed that R2WinBUGS now does compile and install under R in Linux. however, its help still says: SystemRequirements: WinBUGS 1.4 on Windows I'd appreciate any advice. [resend to less recipients in order to save Martin's spare time to approve message; CCing Andrew Thomas, Bob O'Hara and Sibylle Sturtz separately] Re BUGS: WinBUGS-1.5 never got really released, AFAIK - Andrew or Bob might want to correct me. It has been renamed to OpenBUGS. The current version is the GPL'ed OpenBUGS 2.1.1 available from http://mathstat.helsinki.fi/openbugs/. Re R packages: - R2WinBUGS is compatible with WinBUGS-1.4.x only, its newest version can speak with WinBUGS under wine thanks to user contributions. But it still depends on WinBUGS-1.4.x, hence Windows only (considering wine as Windows). - BRugs contains the BRugs interface, R functions and the whole OpenBUGS installation. Unfortunately, due to serious compiler problems, we were not able to get a Linux version running using the interface. Hence it was not possible to release any non-Windows version up to now. 
I haven't tested BRugs under wine yet (in which case R has to run under wine as well, of course) ... and I do not know if there are any serious performance penalties. Note that even in the long term, OpenBUGS will only run on x86 based platforms. Due to the much more flexible interface, I prefer BRugs. BTW: Real programmers won't consider R2WinBUGS to be an interface at all - it might be useful, though. ;-) Uwe Ligges
Re: [R] Current state of support for BUGS access for Linux users?
[034DH] .h Services.StdHook[0101E380H] HostWindows.Idle [4A86H] .focus BOOLEAN FALSE .tick Controllers.TickMsg Fields .w HostWindows.Window NIL HostMenus.TimerTick [3422H] .lParam INTEGER 0 .opsControllers.PollOpsMsg Fields .wParam INTEGER 1 .wndINTEGER 65574 Kernel.Try [3A61H] .a INTEGER 65574 .b INTEGER 1 .c INTEGER 0 .h PROCEDURE HostMenus.TimerTick HostMenus.ApplWinHandler [3841H] .Proc PROCEDURE NIL .hitBOOLEAN Undefined70 .lParam INTEGER 0 .messageINTEGER 275 .resINTEGER 1180843108 .s ARRAY 256 OF SHORTCHAR 2X ... .w INTEGER 538416909 .wParam INTEGER 1 .wndINTEGER 65574 system (pc=465EDB29H, fp=7FC8FB00H) system (pc=465EE418H, fp=7FC8FB3CH) system (pc=465F1DF0H, fp=7FC8FB80H) system (pc=465BD031H, fp=7FC8FBB0H) HostMenus.Loop [3BDEH] .done BOOLEAN FALSE .f SET {0..5} .n INTEGER 0 .resINTEGER 0 .w HostWindows.Window NIL Kernel.Start [2B8CH] .code PROCEDURE HostMenus.Loop 3. Accessing OpenBUGS with wine from rbugs gets as far as opening the OpenBUGS GUI, but nothing ever appears in the log file and there are no computations going on (according to system monitors, anyway), but everything in OpenBUGS just seems stuck. pj On 1/17/06, Uwe Ligges [EMAIL PROTECTED] wrote: Paul Johnson wrote: Thanks, Uwe that clears up why I can't make R2WinBUGs work with OpenBUGS and WinBUGS1.5 :) Both work pretty good with Wine in a GUI. I noticed that when I tried rbugs, it does succeed in starting WinBUGS GUI, but then nothing happens. I'll get WinBUGS1.4 and see what happens. In the meanwhile, I'm going to t ry to see what BRugs is good for. In Linux, when I try to install BRugs, the install fails with an error saying that, at the current time, BRugs works only in Windows. Yes, I have added that particular line in order not to confuse users, and I thought I told you in my last message that it works only under Windows. * Installing *source* package 'BRugs' ... Package 'BRugs' currently only works under Windows.\nIt is supposed to work under Linux in future releases. 
I'd like to stop that check and see what happens! OK, just remove the configure file. The way I read the sourcecode from OpenBUGS and BRugs, I need to replace the windows dll install and instead put in an so file (as in OpenBUGS). Yes, it is already shipped, and the infrastructure in the package is ready (hopefully), but the brugs.so file does not work as expected. If anybody has done this, please let me know of your experience. Yes, several tried, among them Andrew Thomas and Uwe Ligges, and then I invited Andrew Thomas to Dortmund and we tried together (I have to admit that I was clueless all the time and in fact Andrew tried). Andrew's conclusion was that there is some compiler problem on Linux with the BlackBox framework (Component Pascal compiler from Oberon microsystems) in which WinBUGS/OpenBUGS is written in ... If you get it to work, please let us know! Uwe
Re: [R] Current state of support for BUGS access for Linux users?
Before I forget, I found the working recipe for rbugs. My mistake before was not realizing that the n.iter value is total iterations, including burnin, and so by setting n.iter=1000 and n.burnin=1000, I was leaving 0 iterations for the updates.

### Paul Johnson 2006-01-18. This does work!
### Works in Fedora Core 4 Linux with wine-0.9.5 and WinBUGS-1.4.1
### (that's patched WinBUGS 1.4). The first part is from Andrew Gelman's
### web site http://www.stat.columbia.edu/~gelman/bugsR/, but is basically
### the same as the rbugs example code.
schools <- read.table("schools.dat", header=T)
J <- nrow(schools)
y <- schools$estimate
sigma.y <- schools$sd
data <- list(J, y, sigma.y)
inits <- function() { list(theta=rnorm(J,0,100), mu.theta=rnorm(1,0,100),
                           sigma.theta=runif(1,0,100)) }
parameters <- c("theta", "mu.theta", "sigma.theta")
library(rbugs)
# Note that n.iter - n.burnin = N of observations collected in CODA
schools.sim <- rbugs(data, inits, parameters, "schools.bug",
    n.chains = 1, n.iter = 30000, n.burnin = 1000, n.thin = 1,
    workingDir = "/home/pauljohn/.wine/fake_windows/temp",
    bugs = "/usr/local/share/WinBUGS14/WinBUGS14.exe",
    bugsWorkingDir = "z:/home/pauljohn/.wine/fake_windows/temp",
    useWine = TRUE, wine = "/usr/bin/wine", debug = F)
myResults <- getBugsOutput(workingDir = "/home/pauljohn/.wine/fake_windows/temp",
                           n.chains = 1)
# Creates an ordinary data frame:
str(myResults)
library(coda)
# Another way to retrieve text file data, using the coda library
c1 <- read.coda("coda1.txt",
                "/home/pauljohn/.wine/fake_windows/temp/codaIndex.txt")
# That creates a much more complicated mcmc data structure, however.
# But the output is pretty nice! And there are diagnostic functions
summary(c1)
heidel.diag(c1)
geweke.diag(c1)
### need more chains to run gelman.diag
### Note R2WinBUGS has a read.bugs command very similar.
### I suspect it works like a simple wrapper around the read.coda function
c2 <- read.bugs("/home/pauljohn/.wine/fake_windows/temp/coda1.txt")
heidel.diag(c2)
geweke.diag(c2)

pj

On 1/17/06, Paul Johnson [EMAIL PROTECTED] wrote: Thanks! Let me ask this question again. Clearly, if you experts can't make R talk to WinBUGS, then I can't either. So, if bugs() doesn't work, what is the next best thing? With wine, I can run OpenBUGS and WinBUGS, but I cannot send jobs from R to a BUGS program (still trying; some people say they have seen rbugs work in Linux). I want to follow along with the instructions on Prof. Gelman's site for R2WinBUGS, or the examples in the rbugs package. I just noticed that rbugs works by writing the data, model, inits, and so forth into text files, along with a script that drives WinBUGS. Although there is something wrong with the script file that causes a fatal WinBUGS crash, I have just manually started the GUI and pointed and clicked my way to a working set of BUGS estimates (see details on the script flaw and trap below). That makes me feel a bit confident that I will be able to create working BUGS sims and get results. I need to learn how to bring CODA output into R. In the WinBUGS output, I find a codaIndex.txt file that looks promising. Now, about efforts to run WinBUGS from within R. In case other people are trying this, here is what I learned. 1. bugs() from R2WinBUGS to WinBUGS14 under wine-0.9.5 is not able to start WinBUGS14.exe. I've just tried the newest example from http://www.stat.columbia.edu/~gelman/bugsR/ with WinBUGS14 under wine. 
WinBUGS never shows on the screen before the error appears:

library(R2WinBUGS)
schools <- read.table("schools.dat", header=T)
J <- nrow(schools)
y <- schools$estimate
sigma.y <- schools$sd
data <- list(J, y, sigma.y)
inits <- function() { list(theta=rnorm(J,0,100), mu.theta=rnorm(1,0,100),
                           sigma.theta=runif(1,0,100)) }
parameters <- c("theta", "mu.theta", "sigma.theta")
schools.sim <- bugs(data,
+ inits,
+ parameters,
+ "schools.bug",
+ bugs.directory = "/usr/local/share/WinBUGS14/",
+ n.chains=3,
+ n.iter=1000,
+ useWINE=T,
+ WINE="/usr/bin/wine"
+ )
Loading required package: tools
Error in pmatch(x, table, duplicates.ok) : argument is not of mode character

Humphf! I tried running this under debug, but can't understand the output. I get through about 8 steps, when bugs.script() seems to cause the error:

Browse[1] n
debug: bugs.script(parameters.to.save, n.chains, n.iter, n.burnin, n.thin, bugs.directory, new.model.file, debug = debug, is.inits = !is.null(inits), bin = bin, DIC = DIC, useWINE = useWINE)
Browse[1] n
Error
[R] Current state of support for BUGS access for Linux users?
Greetings: I'm going to encourage some students to try Bayesian ideas for hierarchical models. I want to run the WinBUGS and R examples in Tony Lancaster's An Introduction to Modern Bayesian Econometrics. That features MS Windows and bugs from R2WinBUGS. Today, I want to ask how people are doing this in Linux? I have found a plethora of possibilities, some of which are not quite ready, some of which work only under MS Windows. Right now I just want to know what actually works. Here's where I stand now in Fedora Core 4 Linux. 1. OpenBUGS-2.1.1 runs in Linux. I can run linbugs (the console version similar to the old BUGS) and also I can run--under wine--the newest version of winbugs.exe that is circulated with OpenBUGS. As far as I can tell, the graphical interface in wine/winbugs works in almost all elements. A few things seem not quite right in the GUI (can't initialize more than one chain, difficult to specify variables for monitoring), but it does work. It is easier to install and work with OpenBUGS's version of winbugs.exe than with Winbugs-1.4 because the Open version does not have that annoying license registration and winbugs.exe is not wrapped inside an installation script. I'm a little confused about WinBUGS versions because the BRugs documents http://www.biostat.umn.edu/~brad/software/BRugs/BRugs_install.html refer to WinBUGS-1.5, which refers to http://www.biostat.umn.edu/~brad/software/BRugs/WinBUGS15.zip, which can be downloaded without any of the registration steps, but WinBUGS15 is not mentioned in the WinBUGS site (where 1.4.1 appears to be the newest). Supposing I get the winbugs.exe question settled: 2. How to most dependably send jobs from R to linbugs or winbugs.exe? The BRugs package is preferred? For a long time, R2WinBUGS was Windows-only, but toward the end of last fall I noticed that R2WinBUGS now does compile and install under R in Linux. however, its help still says: SystemRequirements: WinBUGS 1.4 on Windows I'd appreciate any advice. 
Re: [R] multiple lowess line in one plot
It appears to me lowess has no subset or strata option. The brute-force way (which my students like best) is just to create 4 columns, one for each group (with unstack or such) and then run one lines(lowess()) command for each one. I think there is a bit more art in using by, which will create the subsets on the fly for you. Here is a self-contained example that I just worked out to illustrate. grp is the grouping factor; x and y are just illustrative data. The by function creates the data subsets, and they are accessed in the FUN as sub1.

x <- rnorm(100)
grp <- gl(4, 25)
y <- rpois(100, lambda=4)
mydf <- data.frame(x, y, grp)
with(mydf, plot(x, y))
by(mydf, list(grp), function(sub1) lines(lowess(sub1$x, sub1$y)))

Hope that helps. On 1/4/06, Dean Sonneborn [EMAIL PROTECTED] wrote: I'm using this code to plot a smoothed line. These two columns of data really represent 4 groups and I'd like to plot a separate line for each group but have them all in the same plot. The R docs for lowess do not seem to indicate some type of GROUPS=var_name option. What would be the syntax for this?

plot(AWGT ~ lipid)
lines(lowess(lipid, AWGT, f=.8))

-- Dean Sonneborn, MS Programmer Analyst Department of Public Health Sciences University of California, Davis (530) 754-9516
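P.S. A variation on the same idea (a self-contained sketch, base graphics only) that also distinguishes the groups by color and line type, using a plain loop over the factor levels instead of by():

```r
# One lowess() curve per group, distinguished by color and line type.
set.seed(42)
x   <- rnorm(100)
grp <- gl(4, 25)
y   <- rpois(100, lambda = 4)
plot(x, y)
for (g in levels(grp)) {
  idx <- grp == g
  lines(lowess(x[idx], y[idx]), col = as.numeric(g), lty = as.numeric(g))
}
legend("topright", legend = levels(grp), col = 1:4, lty = 1:4)
```

The loop and the by() idiom draw exactly the same curves; the loop just makes it easier to vary col and lty per group.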
Re: [R] Hmisc latex function
Can anybody suggest a work-around for this problem? pj Marc Schwartz (via MN) wrote: On Wed, 2005-10-12 at 08:33 -0500, Charles Dupont wrote: Marc Schwartz (via MN) wrote: On Tue, 2005-10-11 at 10:01 -0400, Rick Bilonick wrote: I'm using R 2.2.0 on an up-to-date version of Fedora Core 4 with the latest version of Hmisc. When I run an example from the latex function I get the following:

x <- matrix(1:6, nrow=2, dimnames=list(c('a','b'), c('c','d','enLine 2')))
x
  c d enLine 2
a 1 3        5
b 2 4        6
latex(x) # creates x.tex in working directory
sh: line 0: cd: “/tmp/Rtmpl10983”: No such file or directory
This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4) entering extended mode
! I can't find file `“/tmp/Rtmpl10983/file643c9869”'.
* “/tmp/Rtmpl10983/file643c9869”
Please type another input file name: q
(/usr/share/texmf/tex/latex/tools/q.tex LaTeX2e 2003/12/01 Babel v3.8d and hyphenation patterns for american, french, german, ngerman, bahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch, esperanto, estonian, finnish, greek, icelandic, irish, italian, latin, magyar, norsk, polish, portuges, romanian, russian, serbian, slovak, slovene, spanish, swedish, turkish, ukrainian, nohyphenation, loaded.
File ignored
xdvi-motif.bin: Fatal error: /tmp/Rtmpl10983/file643c9869.dvi: No such file.

How can I fix this? Rick B. I get the same results, also on FC4 with R 2.2.0. I am cc:ing Frank here for his input, but a quick review of the code and created files suggests that there may be a conflict between the locations of some of the resultant files during the latex system call. Some files appear in a temporary R directory, while others appear in the current R working directory. For example, if I enter the full filename: /tmp/RtmpC12100/file643c9869.tex at the latex prompt, I get: latex(x) sh: line 0: cd: “/tmp/RtmpC12100”: No such file or directory This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4) entering extended mode ! 
I can't find file `“/tmp/RtmpC12100/file643c9869”'.
* “/tmp/RtmpC12100/file643c9869”
Please type another input file name: *** loading the extensions datasource /tmp/RtmpC12100/file643c9869.tex
(/tmp/RtmpC12100/file643c9869.tex LaTeX2e 2003/12/01 Babel v3.8d and hyphenation patterns for american, french, german, ngerman, bahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch, esperanto, estonian, finnish, greek, icelandic, irish, italian, latin, magyar, norsk, polish, portuges, romanian, russian, serbian, slovak, slovene, spanish, swedish, turkish, ukrainian, nohyphenation, loaded.
(/usr/share/texmf/tex/latex/base/report.cls Document Class: report 2004/02/16 v1.4f Standard LaTeX document class (/usr/share/texmf/tex/latex/base/size10.clo)) (/usr/share/texmf/tex/latex/geometry/geometry.sty (/usr/share/texmf/tex/latex/graphics/keyval.sty) (/usr/share/texmf/tex/latex/geometry/geometry.cfg))
No file file643c9869.aux. [1] (./file643c9869.aux) )
Output written on file643c9869.dvi (1 page, 368 bytes).
Transcript written on file643c9869.log.
xdvi-motif.bin: Fatal error: /tmp/RtmpC12100/file643c9869.dvi

Hmm. It works for me. Interesting. It almost looks like the temp dir is not being created, but that's not possible because R does that. It might be a Unicode issue with your system shell. Can you run this statement in R:

sys(paste('cd', dQuote(tempdir()), ';',
    'echo Hello BOB > test.test', ';',
    'cat test.test'))

What version of Hmisc are you using? What locale are you using? Charles Hmisc version 3.0-7, Dated 2005-09-15, which is the latest according to CRAN. 
> sys(paste('cd', dQuote(tempdir()), ';',
+ 'echo Hello BOB > test.test',
+ ';', 'cat test.test'))
sh: line 0: cd: “/tmp/RtmpGY5553”: No such file or directory
[1] "Hello BOB"

From a bash console:
$ cd /tmp/RtmpGY5553
$ pwd
/tmp/RtmpGY5553
$ locale
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

On the creation of the sys() call, it looks like the backquotes are causing the problem:

> paste('cd', dQuote(tempdir()))
[1] "cd “/tmp/RtmpGY5553”"

From a bash shell:
$ cd “/tmp/RtmpGY5553”
bash: cd: “/tmp/RtmpGY5553”: No such file or directory
$ cd /tmp/RtmpGY5553
$ pwd
/tmp/RtmpGY5553

According to ?dQuote: By default, sQuote and dQuote provide undirectional ASCII quotation style. In a UTF-8 locale (see l10n_info), the Unicode directional quotes are used. The See Also points to shQuote for quoting OS commands. HTH, Marc
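To make Marc's point concrete (a small sketch, with an illustrative path of my own choosing): shQuote() produces shell-safe single quotes, while dQuote() in a UTF-8 locale produces the directional quotes that sh rejects, so commands passed to the shell should be built with shQuote().

```r
# shQuote() wraps a path in plain single quotes the shell understands;
# that is what the See Also in ?dQuote recommends for OS commands.
path <- "/tmp/R dir with spaces"   # illustrative path, not a real one
cmd  <- paste("cd", shQuote(path))
cmd   # "cd '/tmp/R dir with spaces'"
```

The same one-character change — shQuote in place of dQuote where the directory name is pasted into the shell command — is the kind of fix the Hmisc latex code would need.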
[R] Predicting and Plotting hypothetical values of factors
Last Friday, I noticed that it is difficult to work with regression models in which there are factors. It is easier to do the old-fashioned thing of coding up dummy variables with 0-1 values. The predict function's newdata argument is not suited to insertion of hypothetical values for the factor, whereas it has no trouble with numeric variables. For example, if one uses a factor as a predictor in a logistic regression, then it is tough to plot the S-shaped curve that describes the probabilities. Here is some code with comments describing the difficulties that we have found. Are there simpler, less frustrating approaches? -

# Paul Johnson pauljohn at ku.edu 2005-11-17
# factorFutzing-1.R
myfac <- factor(c(1.1, 4.5, 1.1, 1.1, 4.5, 1.1, 4.5, 1.1))
y <- c(0,1,0,1,0,0,1,0)
mymod1 <- glm(y ~ myfac, family=binomial)
p1 <- predict(mymod1, type="response")
plot(myfac, p1)
# Wait, that's not the picture I wanted. I want to see the
# proverbial S-shaped curve!
#
# The contrast variable that R creates is coded 0 and 1.
# I'd like to have a sequence from 0 to 1 (seq(0,1,length.out=10)) and
# get predicted values for each. That would give the S-shaped curve.
# But the use of predict does not work because the factor has 2
# values and the predict function won't take my new data:
predict(mymod1, newdata=data.frame(myfac=seq(0,1,length.out=8)))
# Error: variable 'myfac' was fitted with class "factor" but class "numeric" was supplied
# In addition: Warning message:
# variable 'myfac' is not a factor in: model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)
# Isn't there an easier way than this?
c1 <- coef(mymod1)
myseq <- seq(0, 1, length.out=10)
newdf <- data.frame(1, myseq)
linearPredictor <- as.matrix(newdf) %*% c1
p2 <- 1 / (1 + exp(-linearPredictor))
# But there is a big disaster if you just try the obvious thing
# of plotting with
# lines(myseq, p2)
# The line does not show up where you hope in the plot. 
# The plot appears to have the x-axis on the scale
# 1.1 to 4.5, so in order to see the s-shaped curve, it appears we have
# to re-scale. However, this is a big illusion. To see what I mean, do
points(2, .4)
# You expected to see the point appear at (2,.4), but in the plot
# it appears at (4.5,.4). Huh?
# The actual values being plotted are the integer-valued levels that
# R uses for factors, the numbers you get from as.numeric(myfac).
# So it is only necessary to re-scale the sequence by adding one.
myseq2 <- 1 + myseq
lines(myseq2, p2, type="l")
# It's not all that S-shaped, but you have to take what you can get! :)
-
-- Paul E. Johnson email: [EMAIL PROTECTED] Dept. of Political Science http://lark.cc.ku.edu/~pauljohn 1541 Lilac Lane, Rm 504 University of Kansas Office: (785) 864-9086 Lawrence, Kansas 66044-3177 FAX: (785) 864-5700
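P.S. One less frustrating route occurred to me afterward (a sketch; `nd` and `phat` are just names I chose): keep the newdata column a factor carrying the original levels, and let predict() return the fitted probability at each level directly. With a 2-level factor there are really only two points on the curve; the S-shape only exists if one treats the 0-1 contrast as a continuous dose, and for that plogis() replaces the hand-coded 1/(1+exp(-z)).

```r
# Fit the same toy model, then predict at the factor's own levels.
myfac <- factor(c(1.1, 4.5, 1.1, 1.1, 4.5, 1.1, 4.5, 1.1))
y     <- c(0, 1, 0, 1, 0, 0, 1, 0)
mymod1 <- glm(y ~ myfac, family = binomial)

# newdata must carry the factor with its original levels:
nd   <- data.frame(myfac = factor(levels(myfac), levels = levels(myfac)))
phat <- predict(mymod1, newdata = nd, type = "response")
phat  # one fitted probability per factor level

# S-curve over a hypothetical 0-1 'dose' on the contrast scale:
b     <- coef(mymod1)
myseq <- seq(0, 1, length.out = 10)
p2    <- plogis(b[1] + b[2] * myseq)  # same as 1/(1 + exp(-(b1 + b2*x)))
```

Because the model is a one-factor logistic regression, the two fitted probabilities are just the observed proportions of y = 1 within each level, which is a handy sanity check.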
[R] Need advice about models with ordinal input variables
Dear colleagues: I've been storing up this question for a long time and apologize for the length and verbosity of it. I am having trouble in consulting with graduate students on their research projects. They are using surveys to investigate the sources of voter behavior or attitudes. They have predictors that are factors, some ordered, but I am never confident in telling them what they ought to do. Usually, they come in with a regression model fitted as though these variables are numerical, and if one looks about in the social science literature, one finds that many people have published doing the same. I want to ask your advice about some cases. 1. An ordered factor that masquerades as a numerical interval level score. In the research journals, these are the ones most often treated as numerical variables in regressions. For example: Thermometer scores for Presidential candidates range from 0 to 100 in integer units. What's a better idea? In an OLS model with just one input variable, a plot will reveal if there is a significant nonlinearity. One can recode the assigned values to linearize the final model or take the given values and make a nonlinear model. In the R package acepack I found avas, which works like a rubber ruler and recodes variables in order to make relationships as linear and homoskedastic as possible. I've never seen this used in the social science literature. It works like magic. Take an ugly scatterplot and shazam, out come transformed variables that have a beautiful plot. But what do you make of these things? There is so much going on in these transformations that interpretation of the results is very difficult. You can't say a one unit increase in x causes a b increase in y. Furthermore, if the model is a survival model, a logistic regression, or other non-OLS model, I don't see how the avas approach will help. I've tried fiddling about with smoothers, treating the input scores as if they were numerical. I got this idea from Prof. 
Harrell's Regression Modeling Strategies. In his Design package for R, one can include a cubic spline for a variable in a model by replacing x with rcs(x). Very convenient. If the results say the relationship is mostly linear, then we might as well treat the input variable as a numerical score and save some degrees of freedom. But if the higher order terms are statistically significant, it is difficult to know what to do. The best strategy I have found so far is to calculate fitted values for particular inputs and then try to tell a story about them. 2. Ordinal variables with fewer than 10 values. Consider variables like self-reported ideology, where respondents are asked to place themselves on a 7 point scale ranging from very conservative to very liberal. Or Party Identification on a 7 point scale, ranging (in the US) from Strong Democrat to Strong Republican. It has been quite common to see these thrown into regression models as if they were numerical. I've sometimes found it useful to run a regression treating them as unordered factors, and then I attempt to glean a pattern in the coefficients. If the parameter estimates step up by a fixed proportion, then one might think there's no damage from treating them as numerical variables. Yesterday, it occurred to us that there should be a significance test to determine if one loses predictive power by replacing the factor-treatment of x with x itself. Is there a non-nested model test that is most appropriate? 3. Truly numerical variables that are reported as grouped ordinal scales. These variables are awful in many ways. Income is often reported in a form like this:

1) Less than 20000
2) 20000 to 35000
3) 35001 to 50000
4) 50001 to 100000
5) above 100000

Education often appears in a form that has

1) 8 years or less
2) 9 years
3) 10 years
4) 11 years
5) 12 years
6) some college completed
7) undergraduate degree completed
8) graduate degree completed

These predictors pose many problems. 
We have dissimilar people grouped together, so there are errors in variables, and it seems obvious that the scores should be recoded somehow to reflect the substance of the differences among groups. But how? 4. Ordered variables with a small number of scores. For example, has your economic situation been 1) worse 2) same 3) better, or how do you feel when you see the American flag? 1) no effect 2) OK 3) great 4) ecstatic. Anyway, in an R model, I think the right thing to do is to enter them into a regression with as.ordered(x). But I don't know what to say about the results. Has anybody written an idiot's guide to orthogonal polynomials? Aside from calculating fitted values, how do you interpret these things? Is there ever a point when you would say we should treat that as a numerical variable with scores 1-2-3-4 rather than as an ordered factor? If you have advice, I would be delighted to hear it.
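On the test wished for in point 2, one observation (my reading, offered with some hedging): the model that uses the raw 1..7 scores is actually nested inside the model that uses the unordered factor — the factor version merely frees the spacing between categories — so an ordinary F test (or likelihood-ratio test, for a glm) via anova() compares them directly; no non-nested machinery seems to be needed. A sketch with simulated data (all variable names are mine):

```r
# The linear-in-scores model is a special case of the factor model,
# so anova() gives a standard nested-model F test.
set.seed(1)
score <- sample(1:7, 200, replace = TRUE)   # e.g. a 7-point party-ID scale
yy    <- 0.5 * score + rnorm(200)           # data with a truly linear effect
m.lin <- lm(yy ~ score)                     # scores treated as numeric
m.fac <- lm(yy ~ factor(score))             # one free mean per category
a <- anova(m.lin, m.fac)                    # H0: the linear coding suffices
a
```

A large p-value in the second row says the 5 extra parameters of the factor coding buy nothing beyond the linear score; a small one says the spacing between categories matters.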
Re: [R] Select varying LS digits in long numbers?
Maybe this is just the brute force you want to avoid, but it works to coerce the integer into a string, and then back to a number. Could reduce number of lines by nesting functions, of course. y <- 1234131431 n <- 3 ychar <- as.character(y) ydigits <- nchar(ychar) result <- as.numeric( substr(ychar, ydigits - n + 1, ydigits) ) pj (Ted Harding) wrote: Hi Folks, I'm trying to find a neat solution to an apparently simple problem, but one which turns out to be a bit more intricate and tricky than one might expect. Suppose I have numbers given to a large number of digits. For example 1234567021 where (though I don't know this beforehand) only the last 3 digits will be varying (and all 3 will vary). What I want is, given a vector x of such numbers, to extract the minimal set of final digits which will include the varying digits (i.e. in this case the last 3 digits). And there may be a decimal point somewhere along the line (though again I won't know where, nor whether). -- Paul E. Johnson email: [EMAIL PROTECTED] Dept. of Political Science http://lark.cc.ku.edu/~pauljohn 1541 Lilac Lane, Rm 504 University of Kansas Office: (785) 864-9086 Lawrence, Kansas 66044-3177 FAX: (785) 864-5700 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] controlling usage of digits scientific notation in R plots; postscript margins
Dear R users: I assigned students to make some graphs and I'm having trouble answering some questions that they have. We are all working on R 2.1 on Fedora Core Linux 4 systems. 1. In the plot, the axis is not labeled by numbers, but rather scientific notation like -2e+08 or such. We realize that means -200,000,000. We want to beautify the plot. We would rather just print out the whole, big number. But if we can't have that, we would like something more beautiful and mathematical, such as -2.0x10^8. Once the numbers are big, R automagically switches to scientific notation, and turning the values horizontal does not help on the y axis. Example: x <- rnorm(100, mean=10, sd=1) y <- rnorm(100, mean=100, sd=1000) plot(x, y, axes=F) axis(2, las=2) # turns y axis numbers perpendicular to y axis axis(1) # x axis I realize a person could just divide all numbers by 1 million and then have a graph with -200 and an annotation (in millions). On the x axis, we'd even settle to just label the two outermost points with full numerical values, and then have only ticks between. I was looking for some way to use axis() to draw an unlabeled axis and then put in text labels after that. However, I can't understand how to get the values of the tick marks placed by axis from the figure in order to place text by some tick marks. 2. Some students make postscript figures that fit just right into LaTeX documents, while some make figures that have huge, inches-and-inches of margins around the outside. I'm not watching how they make these figures all the time, but I think I'm figuring out the cause of big margins. Is this right: the margins issue relates to the use of the onefile=F option in a dev.copy command? I think the figures turn out properly with this: dev.copy(postscript, file="myfig.eps", height=5, width=5, horizontal=F, onefile=F) dev.off() If you mistakenly omit the onefile=F option, the margins stretch out from the center where the figure is placed to the edges of an entire sheet of paper.
In other words, the default settings for margins in inches that I see in output from par() ... $mai [1] 1.0066821 0.8092935 0.8092935 0.4145162 ... have no effect if a person forgets the onefile=F option. We fiddled a while, trying to force margins down, par(mai=c(0,0,0,0)), but no matter what we did, the figures still had the massive margins. We don't have these margin problems with other devices. Margins in jpeg or png pictures are sized appropriately.
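On question 1 above, one approach (a sketch, not the only way): suppress the default y-axis labels, ask R where it would place the ticks with axTicks(), and supply fixed-notation labels via format(). The data here are simulated placeholders:

```r
## placeholder data with large y values
x <- rnorm(100, mean = 10, sd = 1)
y <- rnorm(100, mean = 1e8, sd = 1e8)

plot(x, y, yaxt = "n")      # draw the plot without y-axis annotation
yt <- axTicks(2)            # the tick positions R would have used
axis(2, at = yt, las = 2,
     labels = format(yt, big.mark = ",", scientific = FALSE))
```

axTicks() answers the "how do I get the values of the tick marks" part directly, so text() or mtext() can also be positioned at those values.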
[R] controlling where *.Rout gets printed. Possible?
OK, my journey to make lab machines automagically install and update all desirable R packages is nearing an end! The only question I have now is this: How can I control where the system prints the *.Rout file that is created automatically when the R batch program runs? In man R I don't find any information about it. When the cron job runs R_installAll.sh (see below), I'd like to re-direct it to someplace that ordinary users can read. Here's the shell script I will schedule with cron R_installAll.sh -- #!/bin/bash R CMD BATCH /usr/local/bin/R_installAll.R -- And here's the R program -R_installAll.R- # Paul Johnson pauljohn _AT_ ku.edu 2005-08-31 # This should update and then install all packages, except for # ones I exclude because they don't work or we don't want them. options(repos = "http://lib.stat.cmu.edu/R/CRAN/") update.packages(ask=F) theNew <- new.packages() failPackages <- c("BRugs","GDD","gtkDevice","gap","gnomeGUI","mimR","ncdf","pathmix","rcdd","rgdal","rpvm","Rmpi","RQuantLib","RMySQL","RNetCDF","RODBC","ROracle","RScaLAPACK","rsprng","RWinEdt","taskPR") shouldFail <- theNew %in% failPackages install.packages(theNew[!shouldFail], dependencies=T) # VGAM is not in CRAN yet, but Zelig wants it. # install.packages("VGAM", CRAN="http://www.stat.auckland.ac.nz/~yee") update.packages(CRAN="http://www.stat.auckland.ac.nz/~yee") --
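One answer to the question above: R CMD BATCH accepts the output file as an optional second argument after the input file, so the cron script can name the destination explicitly (the /var/tmp path here is just an illustration):

```sh
#!/bin/bash
# write the transcript somewhere world-readable instead of the working directory
R CMD BATCH /usr/local/bin/R_installAll.R /var/tmp/R_installAll.Rout
```

Without the second argument, the transcript lands in the current working directory as infile.Rout, which for a cron job is usually root's home directory.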
[R] Advice about system for installing updating all R package in a Linux Lab?
Good day: I'm administering 6 linux systems (FC4) in a student lab and worry that users may want packages that are not installed. I get tired of adding them one by one. Then I happened upon this page http://support.stat.ucla.edu/view.php?supportid=30 about installing all R packages from CRAN. That did not run as it was, but after some fiddling I arrived at the following script, which does run and it builds many packages and reports failures on the rest: # R_installAll.R options(repos = "http://lib.stat.cmu.edu/R/CRAN/") update.packages(ask=F) x <- packageStatus(repositories="http://cran.r-project.org/src/contrib") st <- x$avai["Status"] install.packages(rownames(st)[which(st$Status=="not installed")], dependencies=T) If I run that in batch mode (as root, of course) R CMD BATCH R_installAll.R It produces some informative output. Some packages don't build because they are for Windows. As Prof Ripley mentioned recently, some packages don't build because of gcc-4.0.1. Some fail because I don't have some requisite libraries installed. I try to deduce which FC packages may be used to fix that and iterate the process now and then. But, for the most part, the packages seem to be OK (as far as I can tell). The output of a recent update is posted on the net here, in case you are interested to see (this lists the ones that don't build plus the successful updates): http://lark.cc.ku.edu/~pauljohn/software/R/R_installAll.Rout I can't see how this does any damage, since the packages that don't build are very graceful about erasing themselves, and the ones that do build are automatically available for the users. Can you see any downside to scheduling this process to run as a cron job, say once per week, to keep packages up to date?
Re: [R] LyX and Sweave
I just wanted to point out that I was there first :) on the LyX list (Nov 2004): http://www.mail-archive.com/lyx-users@lists.lyx.org/msg36262.html Perhaps somebody who is trying to put all of this together can benefit from both sets of explanations. pj [EMAIL PROTECTED] wrote: On Mon, 25 Jul 2005 14:12:41 +0200, Gorjanc Gregor (GG) wrote: Hello R-users! I have tried to use Sweave within LyX* and found two ways to accomplish this. I have attached LyX source file for both ways as well as generated PDFs. I have copied Gregor's files at http://www.ci.tuwien.ac.at/~leisch/Sweave/LyX for those who didn't get the attachments. LyX looks actually much better and more stable than when I last had a look a couple of years ago.
Re: [R] Time Series Count Models
Dear Brett: There are books for this topic that are more narrowly tailored to your question: Lindsey's Models for Repeated Measurements and Diggle et al.'s Analysis of Longitudinal Data. Lindsey offers an R package on his web site. If you dig around, you will find many modeling papers on this, although in my mind none coalesced into a completely clear path such as "throw in these variables and you will get the right estimates." The problem, as you will see, is that there are many possible mathematical descriptions of the idea that there is time dependence in a count model. My political science colleagues John Williams and Pat Brandt published 2 articles on time series with counts. My favorite is the second one here. There is R code for the Pests model. http://www.utdallas.edu/~pbrandt/pests/pests.htm Brandt, Patrick T., John T. Williams, Benjamin O. Fordham and Brian Pollins. 2000. Dynamic Modelling For Persistent Event Count Time Series. American Journal of Political Science 44(4): 823-843. Brandt, Patrick T. and John T. Williams. 2001. A Linear Poisson Autoregressive Model: the Poisson AR(p) Model. Political Analysis 9(2): 164-184. I worked really hard on TS counts a while ago because a student was trying that. If you look at J. Lindsey's book Models for Repeated Measurements you will make some progress on understanding his method kalcount. That's in the repeated library you get from his web site. Here are the notes I made a couple of years ago http://lark.cc.ku.edu/~pauljohn/stats/TimeSeries/ Look for files called TSCountData*.pdf. It all boils down to the fact that you can't just act like it is an OLS model and throw Y_t-1 or something like that on the right-hand side. Instead, you have to think in a more delicate way about the process you are modeling and hit it from that other direction. Here are some of the articles for which I kept copies. U. Bokenholt, Mixed INAR(1) Poisson regression models, Journal of Econometrics, 89 (1999): 317-338. A.C. Harvey and C.
Fernandes, Time Series Models for Count or Qualitative Observations, Journal of Business & Economic Statistics, 4 (1989): 407- I recall liking this one a lot: J. E. Kelsall, Scott Zeger and J. M. Samet, Frequency Domain Log-linear Models; air pollution and mortality, Applied Statistics 48 (1999): 331-344. Good luck, let me know what you find out. pj Brett Gordon wrote: Hello, I'm trying to model the entry of certain firms into a larger number of distinct markets over time. I have a short time series, but a large cross section (small T, big N). I have both time varying and non-time varying variables. Additionally, since I'm modeling entry of firms, it seems like the number of existing firms in the market at time t should depend on the number of firms at (t-1), so I would like to include the lagged cumulative count. My basic question is whether it is appropriate (in a statistical sense) to include both the time varying variables and the lagged cumulative count variable. The lagged count aside, I know there are standard extensions to count models to handle time series. However, I'm not sure if anything changes when lagged values of the cumulative dependent variable are added (i.e. are the regular standard errors correct, are estimates consistent, etc). Can I still use one of the time series count models while including this lagged cumulative value? I would greatly appreciate it if anyone can direct me to relevant material on this. As a note, I have already looked at Cameron and Trivedi's book. Many thanks, Brett
[R] shared library configuration; Gnome GUI
Hello, everybody: On a Fedora Core 3 Linux system, I built R-2.1 using an updated version of the spec file that was used to make the RPMs for version 2.0.1 on the CRAN system. The build was fine, and packages update perfectly. Thanks! Then I got curious about the package gnomeGUI. While trying to build that, I see errors = * Installing *Frontend* package 'gnomeGUI' ... Using R Installation in R_HOME=/usr/lib/R R was not built as a shared library Need a shared R library ERROR: configuration failed for package 'gnomeGUI' === So then I look back at re-building R, and in ./configure --help I see these two items that seem to contradict each other. Why is the first defaulted to no and the second one yes? What's the difference? --enable-R-shlib build R as a shared library [no] [...snip...] --enable-shared[=PKGS] build shared libraries [default=yes] I built with --enable-R-shlib and all seemed fine. Anyway, it turns out it was all for nothing, because the Gnome package wants the Gnome-1.4 libraries, whereas I have 2.0X. So, I'm going to forget about gnomeGUI, but I wonder: did I do any harm by building R with the non-default --enable-R-shlib? Can it potentially break something? As far as I can see, new R runs great.
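For anyone following along, the rebuild that produces the shared library looks roughly like this (everything except the --enable-R-shlib flag itself is illustrative; a real rebuild would go through the spec file):

```sh
# rebuild R with libR.so so front-end packages like gnomeGUI can link against it
./configure --enable-R-shlib
make
make install
```

The distinction in the two options: --enable-R-shlib controls whether R itself is built as libR.so, while --enable-shared is the generic libtool-style switch for the shared libraries R builds internally.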
[R] lists: removing elements, iterating over elements,
I'm writing R code to calculate Hierarchical Social Entropy, a diversity index that Tucker Balch proposed. One article on this was published in Autonomous Robots in 2000. You can find that and others through his web page at Georgia Tech. http://www.cc.gatech.edu/~tucker/index2.html While I work on this, I realize (again) that I'm a C programmer masquerading in R, and it's really tricky working with R lists. Here are things that surprise me; I wonder what your experience/advice is. I need to calculate overlapping U-diametric clusters of a given radius. (Again, I apologize this looks so much like C.) ## Returns a list of all U-diametric clusters of a given radius ## Given an R distance matrix ## Clusters may overlap. Clusters may be identical (redundant) getUDClusters <- function(distmat, radius){ mem <- list() nItems <- dim(distmat)[1] for (i in 1:nItems){ mem[[i]] <- c(i) } for (m in 1:nItems){ for (n in 1:nItems){ if (m != n && (distmat[m,n] <= radius)){ ## item is within radius, so add to collection m mem[[m]] <- sort(c(mem[[m]], n)) } } } return(mem) } That generates the list, like this: [[1]] [1] 1 3 4 5 6 7 8 9 10 [[2]] [1] 2 3 4 10 [[3]] [1] 1 2 3 4 5 6 7 8 10 [[4]] [1] 1 2 3 4 10 [[5]] [1] 1 3 5 6 7 8 9 10 [[6]] [1] 1 3 5 6 7 8 9 10 [[7]] [1] 1 3 5 6 7 8 9 10 [[8]] [1] 1 3 5 6 7 8 9 10 [[9]] [1] 1 5 6 7 8 9 10 [[10]] [1] 1 2 3 4 5 6 7 8 9 10 The next task is to eliminate the redundant elements. unique() does not apply to lists, so I have to scan one by one.
cluslist <- getUDClusters(distmat, radius) ## find redundant (same) clusters redundantCluster <- c() for (m in 1:(length(cluslist)-1)) { for (n in (m+1):length(cluslist)) { if (m != n && length(cluslist[[m]]) == length(cluslist[[n]])) { if (all(cluslist[[m]] == cluslist[[n]])) { redundantCluster <- c(redundantCluster, n) } } } } ## make sure they are sorted in reverse order if (length(redundantCluster) > 0) { redundantCluster <- unique(sort(redundantCluster, decreasing=T)) ## remove redundant clusters (must do in reverse order to preserve index of cluslist) for (i in redundantCluster) cluslist[[i]] <- NULL } Question: am I deleting the list elements properly? I do not find explicit documentation for R on how to remove elements from lists, but trial and error tells me myList[[5]] <- NULL will remove the 5th element and then close up the hole caused by deletion of that element. That shuffles the index values, so I have to be careful in dropping elements. I must work from the back of the list to the front. Is there an easier or faster way to remove the redundant clusters? Now, the next question. After eliminating the redundant sets from the list, I need to calculate the total number of items present in the whole list, figure how many are in each subset--each list item--and do some calculations. I expected this would iterate over the members of the list--one step for each subcollection for (i in cluslist){ } but it does not. It iterates over the items within the subsets of the list cluslist. I mean, if cluslist has 5 sets, each with 10 elements, this for loop takes 50 steps, one for each individual item. I find this does what I want for (i in 1:length(cluslist)) But I found out the hard way :) Oh, one more quirk that fooled me. Why does unique() applied to a distance matrix throw away the 0's? I think that's really bad!
x <- rnorm(5) myDist <- dist(x, diag=T, upper=T) myDist 1 2 3 4 5 1 0.000 1.2929976 1.6658710 2.6648003 0.5494918 2 1.2929976 0.000 0.3728735 1.3718027 0.7435058 3 1.6658710 0.3728735 0.000 0.9989292 1.1163793 4 2.6648003 1.3718027 0.9989292 0.000 2.1153085 5 0.5494918 0.7435058 1.1163793 2.1153085 0.000 unique(myDist) [1] 1.2929976 1.6658710 2.6648003 0.5494918 0.3728735 1.3718027 0.7435058 [8] 0.9989292 1.1163793 2.1153085
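On the redundancy question: while unique() may balk, duplicated() does accept a list (it compares the list elements to each other), so the double loop and the reverse-order NULL deletion can be replaced by one subscripting step. A sketch with made-up clusters:

```r
## three clusters, one an exact repeat of the first
cluslist <- list(c(1, 3, 4), c(2, 3, 4, 10), c(1, 3, 4))

## keep only the first occurrence of each distinct cluster
cluslist <- cluslist[!duplicated(cluslist)]
length(cluslist)   # 2
```

Subscripting with a logical vector also sidesteps the index-shuffling problem, since all the keep/drop decisions are made before anything is removed.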
[R] working with pairlists imported from HDF5, converting to data frames?
I've used the HDF5 library to bring some data into R. The verbose output looks like this: hdf5load("hdfGraphWed_Mar_16_13_33_37_2005.hdf", load=T, verbosity=1, tidy=T) Processing object: cprSeats .. which is a Group Processing object: Seats 0 .. its a dataset..Finished dataset Processing object: Seats 1 .. its a dataset..Finished dataset Processing object: Seats 2 .. its a dataset..Finished dataset Processing object: Seats 3 .. its a dataset..Finished dataset Processing object: Seats 4 .. its a dataset..Finished dataset ... Done group cprSeats Processing object: effective .. which is a Group Processing object: AggComp .. its a dataset..Finished dataset Processing object: AggNonComp .. its a dataset..Finished dataset Processing object: CPR .. its a dataset..Finished dataset Processing object: PR .. its a dataset..Finished dataset Processing object: SMD .. its a dataset..Finished dataset ... Done group effective Each item inside the group effective is a vector of numbers. I want to convert effective into a data frame or matrix for use with matplot. However, R sees effective not as a collection of vectors, but as a pairlist. I'm not a Lisp programmer, so I don't understand the significance of the list help page's comment about dotted lists.
class(effective) [1] "pairlist" attributes(effective) $names [1] "SMD" "PR" "CPR" "AggNonComp" "AggComp" I can access the elements in effective as list elements, either with effective$SMD or effective[[1]], as in: effective[[1]] [1] 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 [9] 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 [17] 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 [25] 1.00 1.00 1.00 1.00 1.00 1.00 4.761905 4.668534 [33] 4.743833 4.743833 4.694836 4.672897 4.612546 4.612546 4.612546 4.950495 [41] 4.766444 4.761905 4.329004 4.990020 4.930966 4.906771 4.378284 4.686036 [49] 4.935834 4.793864 4.793864 4.541326 4.849661 4.730369 4.960317 4.159734 But I can't force it into a data frame as.data.frame(effective) Error in as.data.frame.default(effective) : can't coerce pairlist into a data.frame But I can manually build the dataframe data.frame(effective$SMD, effective$PR, effective$CPR, effective$AggNonComp, effective$AggComp) But that is a bit frustrating for every individual dataset coming in. Is there a short cut?
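One shortcut worth trying (a sketch): as.list() converts a pairlist into an ordinary named list, and the data.frame coercion then works, provided the component vectors all have the same length. Shown here on a small pairlist built from scratch, since the HDF5 object itself isn't reproducible:

```r
## stand-in for the imported group: a named pairlist of equal-length vectors
p <- pairlist(SMD = c(1, 2, 3), PR = c(4, 5, 6))

class(p)                      # "pairlist"
eff.df <- as.data.frame(as.list(p))
eff.df$SMD                    # 1 2 3 -- columns keep their names
```

So for the imported data, as.data.frame(as.list(effective)) should replace the manual data.frame(effective$SMD, ...) construction.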
[R] here's why it makes sense
If you go x[i] you are giving x an index vector, which we had mistakenly thought was an integer. Rather, it is a vector of indices for observations. Here's data: x <- c(1, 4, 3, 2, 5) x[1] would be 1 x[2] would be 4 but if you put an index vector in the brackets, you have x[c(1,2,1,2)] it means take the first, the second, the first, the second, so x[c(1,2,1,2)] is 1, 4, 1, 4 So in the procedure diff.ttest, when we have x[i] and y[i], we mean take the vectors x and y after grabbing the index values selected for each resample. That page I sent you a minute ago is from R by Example, which I think is quite great. http://www.mayin.org/ajayshah/KB/R/index.html pj
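The example above, runnable as-is:

```r
x <- c(1, 4, 3, 2, 5)
x[c(1, 2, 1, 2)]   # returns 1 4 1 4: each entry of the index vector picks an element
```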
[R] Users in Ukraine cyrillic support
Hello, everybody: My friends in Ukraine are starting a research lab at a national university and they asked what programs to use. I said R, of course, and they then asked me 'what support does it have for Cyrillic?' I've done some snooping in the R website and all the references I find to foreign languages concern C and Fortran, not Ukrainian or Russian. Since I'm an English-only user, I have no idea what is involved here, or what difficulties might be caused with non-English character sets and file systems. Not to mention the problem that the documentation/manuals are in English. If you have any advice that I can collect up for my friends, I would appreciate it.
[R] dotplot lattice problems: y axis values and bg color output in jpg
I have a linux system with Fedora Core 2 and R-2.0. I was comparing plots made with plot() and dotplot() and discovered a problem. Although the dots are positioned correctly, the numerical labels in the dotplot y axis are not correct. I put copies here: http://lark.cc.ku.edu/~pauljohn/R/plotTrouble1.jpg That is the correct one from plot, with the highest value on y showing at 18. http://lark.cc.ku.edu/~pauljohn/R/plotTrouble2.jpg That is the dotplot one. The picture is basically the same, except the numbers on the y axis only go up to 8. But the dots are in the correct spots and the x axis is labeled correctly. On the screen, the plots have white backgrounds, but the picture from the dotplot turns out gray. Why is that? Sometimes, if I run another lattice plot in the same session, the colors change to the dark background, and I suppose the same trouble is happening. Isn't trellis.par.set() going to remain in place for all following trellis plots? Here's the code I used to make these 2 graphs. x11(height=6.5, width=6.5) data2003 <- subset(elaine1, POLCYEAR==2003) plot(data2003$OLDCRASH, data2003$RENUCYC) modall <- lm(RENUCYC~OLDCRASH, data=data2003) abline(modall) dev.copy(device=jpeg, file="plotTrouble1.jpg") dev.off() dev.off() trellis.par.set(theme=col.whitebg(), height=9, width=6.5) dotplot(RENUCYC~OLDCRASH, data=elaine1, xlab="ECR", as.table=T, subset=POLCYEAR==2003, panel=function(x,y) { panel.dotplot(x,y,cex=0.2); panel.lmline(x,y); }) jpeg(file="plotTrouble2.jpg", bg="white", height=480, width=480) trellis.last.object() dev.off() dev.off() I tried to force the ylim on the dotplot up to 18, but it just produced this even uglier result. http://lark.cc.ku.edu/~pauljohn/R/plotTrouble3.jpg
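A likely culprit for the gray background, offered as a guess rather than a diagnosis of this exact session: trellis settings belong to a device, so a theme set before the jpeg device exists does not apply to it, and a lattice object written to a file device must be print()ed explicitly. Roughly:

```r
library(lattice)

jpeg(file = "plotTrouble2.jpg", bg = "white", height = 480, width = 480)
trellis.par.set(theme = col.whitebg())   # set the theme on the device now open
print(dotplot(~ rnorm(20)))              # print() is required on non-screen devices
dev.off()
```

That is, opening the jpeg device first and then setting the theme and printing the stored plot should give the same white background seen on screen.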
[R] Diagnosing trouble with R-2.0, Fedora Core 2, and Rcmdr
Greetings, R-help! On 2 Fedora Core 2 Linux systems, I've completely erased the previous R and all packages and then installed R-2.0 and installed fresh packages. In using Rcmdr, I see some trouble and I wonder if other people see this and if it is due to the tcl/tk, or R, or Rcmdr. (If readers have not yet tried Rcmdr, I recommend it not just because of the GUI it provides, but also because Prof. Fox has created several very handy commands, like scatter3d, which make it possible to use advanced packages like rgl. Rcmdr provides several very handy wrapper commands that just work and users don't have to fuss over so many details! And, if you are new at R, you can use pull down menus and then Rcmdr displays the commands that correspond with those menu options. Very educational!) In no particular order, here are the issues I see with R-2.0 and Rcmdr 1. In the panel to select a dataset from a package (menu: Data/Data in packages), I can type in the name of a dataset, but the GUI will not let me choose the name of a package from which to select a dataset. Nothing happens when I pick a package name. (With previous R/Rcmdr, I did not have this trouble.) 2. When using Rcmdr, some of the output from commands shows in the Rcmdr bottom window, but some is sent to the xterm in which R was started. For example, with a scatter3d() command, if I add the option model.summary=T, the model summary shows in the xterm. But some other models show results in the bottom pane of Rcmdr. And, it seems to me, there are some commands in which I've noticed some of the output going to the Rcmdr bottom buffer and some to the xterm. 3. When I want to exit Rcmdr and choose the pull down exit from Commander and R, I get a Segmentation Fault that closes Rcmdr and R. 4. 3d plots do not display on the screen until I click in the rgl window that displays the graph. This might be a video card/AGP driver problem, I suppose. These systems have the NVidia FX5200 cards and the NVidia proprietary X driver.
If other users don't see the same on other kinds of systems, I guess that will tell me where the problem lies. 5. At random times, I get an error tcl/tk window popping up with a message about MASS and categorical variables. It says Error in data(package=parse(text=packageName)): 'MASS' must be character string or NULL It says that over and over; sometimes it has other package names, as in: Error in data(package=parse(text=packageName)): 'relimp' must be character string or NULL These seem to be harmless?
[R] polr (MASS) and lrm (Design) differences in tests of statistical significance
Greetings: I'm running R-1.9.1 on Fedora Core 2 Linux. I tested a proportional odds logistic regression with MASS's polr and Design's lrm. Parameter estimates between the 2 are consistent, but the standard errors are quite different, and the conclusions from the t and Wald tests are dramatically different. I cranked the abstol argument up quite a bit in the polr method and it did not make the differences go away. So 1. Can you help me see why the std. errors in the polr are so much smaller, and 2. Can I hear more opinions on the question of t vs. Wald in making these signif tests? So far, I understand the t is based on the asymptotic Normality of the estimate of b, and for finite samples b/se is not exactly distributed as a t. But I also had the impression that the Wald value was an approximation as well. summary(polr(as.factor(RENUCYC) ~ DOCS + PCT65PLS*RANNEY2 + OLDCRASH + FISCAL2 + PCTMETRO + ADMLICEN, data=elaine1)) Re-fitting to get Hessian Call: polr(formula = as.factor(RENUCYC) ~ DOCS + PCT65PLS * RANNEY2 + OLDCRASH + FISCAL2 + PCTMETRO + ADMLICEN, data = elaine1) Coefficients: Value Std. Error t value DOCS 0.004942217 0.002952001 1.674192 PCT65PLS 0.454638558 0.113504288 4.005475 RANNEY2 0.110473483 0.010829826 10.200855 OLDCRASH 0.139808663 0.042245692 3.309418 FISCAL2 0.025592117 0.011465812 2.232037 PCTMETRO 0.018184093 0.007792680 2.333484 ADMLICEN -0.028490387 0.011470999 -2.483688 PCT65PLS:RANNEY2 -0.008559228 0.001456543 -5.876400 Intercepts: Value Std. Error t value 2|3 6.6177 0.3019 21.9216 3|4 7.1524 0.2773 25.7938 4|5 10.5856 0.2149 49.2691 5|6 12.2132 0.1858 65.7424 6|8 12.2704 0.1856 66.1063 8|10 13.0345 0.2184 59.6707 10|12 13.9801 0.3517 39.7519 12|18 14.6806 0.5587 26.2782 Residual Deviance: 587.0995 AIC: 619.0995 lrm(RENUCYC ~ DOCS + PCT65PLS*RANNEY2 + OLDCRASH + FISCAL2 + PCTMETRO + ADMLICEN, data=elaine1) Logistic Regression Model lrm(formula = RENUCYC ~ DOCS + PCT65PLS * RANNEY2 + OLDCRASH + FISCAL2 + PCTMETRO + ADMLICEN, data = elaine1) Frequencies of Responses 2 3 4 5 6 8 10 12 18 21 12 149 46 1 10 6 2 2 Frequencies of Missing Values Due to Each Variable RENUCYC DOCS PCT65PLS RANNEY2 OLDCRASH FISCAL2 PCTMETRO ADMLICEN 5 0 0 6 0 5 0 5 Obs Max Deriv Model L.R. d.f. P C Dxy 249 7e-05 56.58 8 0 0.733 0.465 Gamma Tau-a R2 Brier 0.47 0.278 0.22 0.073 Coef S.E. Wald Z P y=3 -6.617857 6.716688 -0.99 0.3245 y=4 -7.152561 6.716571 -1.06 0.2869 y=5 -10.585705 6.74 -1.57 0.1164 y=6 -12.213340 6.755656 -1.81 0.0706 y=8 -12.270506 6.755571 -1.82 0.0693 y=10 -13.034584 6.756829 -1.93 0.0537 y=12 -13.980235 6.767724 -2.07 0.0389 y=18 -14.680760 6.786639 -2.16 0.0305 DOCS 0.004942 0.002932 1.69 0.0918 PCT65PLS 0.454653 0.552430 0.82 0.4105 RANNEY2 0.110475 0.076438 1.45 0.1484 OLDCRASH 0.139805 0.042104 3.32 0.0009 FISCAL2 0.025592 0.011374 2.25 0.0245 PCTMETRO 0.018184 0.007823 2.32 0.0201 ADMLICEN -0.028490 0.011576 -2.46 0.0138 PCT65PLS * RANNEY2 -0.008559 0.006417 -1.33 0.1822
followup: Re: [R] Issue with predict() for glm models
I have a follow-up question that fits with this thread. Can you force an overlaid plot of predicted values to follow the scaling of the axes of the plot over which it is laid? Here is an example based on linear regression, just for clarity. I have followed the procedure described below to create predictions and now want to plot the predicted values on top of a small section of the x-y scatterplot.

x <- rnorm(100, 10, 10)
e <- rnorm(100, 0, 5)
y <- 5 + 10*x + e
myReg1 <- lm(y ~ x)
plot(x, y)
newX <- seq(1, 10, 1)
myPred <- predict(myReg1, data.frame(x=newX))

Now, if I do this, I get 2 graphs overlaid but their axes do not line up:

par(new=T)
plot(newX, myPred)

The problem is that the second one uses the whole width of the graph space, when I'd rather just have it go over the small subset where its x is defined, from 1 to 10. It is stretching the range (1,10) for newX to use the same scale that goes from (-15, 35) where it plots x. I know abline() can do this for lm, but for some other kinds of models no lines() method is provided, and so I am doing this the old-fashioned way.

pj

John Fox wrote:

Dear Uwe,

-----Original Message-----
From: Uwe Ligges [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 23, 2004 8:06 AM
To: John Fox
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] Issue with predict() for glm models

John Fox wrote: Dear Uwe, Unless I've somehow messed this up, as I mentioned yesterday, what you suggest doesn't seem to work when the predictor is a matrix. Here's a simplified example:

X <- matrix(rnorm(200), 100, 2)
y <- (X %*% c(1,2) + rnorm(100)) > 0
dat <- data.frame(y=y, X=X)
mod <- glm(y ~ X, family=binomial, data=dat)
new <- data.frame(X = matrix(rnorm(20),2))
predict(mod, new)

Dear John, the questioner had a 2-column matrix with 40 and one with 50 observations (not a 100-column matrix with 2 observations), and for those matrices it works ...
Indeed, and in my example the matrix predictor X has 2 columns and 100 rows; I did screw up the matrix for the new data to be used for predictions (in the example I sent today but not yesterday), but even when this is done right -- where the new data has 10 rows and 2 columns -- there are 100 (not 10) predicted values:

X <- matrix(rnorm(200), 100, 2)  # original predictor matrix with 100 rows
y <- (X %*% c(1,2) + rnorm(100)) > 0
dat <- data.frame(y=y, X=X)
mod <- glm(y ~ X, family=binomial, data=dat)
new <- data.frame(X = matrix(rnorm(20), 10, 2))  # corrected -- note 10 rows
predict(mod, new)  # note 100 predicted values

          1           2           3           4           5           6
 5.75238091  0.31874587 -3.00515893 -3.77282121 -1.97511221  0.54712914
          7           8           9          10          11          12
 1.85091226  4.38465524 -0.41028694 -1.53942869  0.57613555 -1.82761518
. . .
         91          92          93          94          95          96
 0.36210780  1.71358713 -9.63612775 -4.54257576 -5.29740468  2.64363405
         97          98          99         100
-4.45478627 -2.44973209  2.51587537 -4.09584837

Actually, I now see the source of the problem: The data frames dat and new don't contain a matrix named X; rather the matrix is split columnwise:

names(dat)
[1] "y"   "X.1" "X.2"
names(new)
[1] "X.1" "X.2"

Consequently, both glm and predict pick up the X in the global environment (since there is none in dat or new), which accounts for why there are 100 predicted values. Using list() rather than data.frame() produces the originally expected behaviour:

new <- list(X = matrix(rnorm(20), 10, 2))
predict(mod, new)
         1          2          3          4          5          6          7
 5.9373064  0.3687360 -8.3793045  0.7645584 -2.6773842  2.4130547  0.7387318
         8          9         10
-0.4347916  8.4678728 -0.8976054

Regards, John

Best, Uwe

          1            2            3             4           5            6
 1.81224443  -5.92955128   1.98718051  -10.05331521  2.6506      -2.50635812
          7            8            9            10          11           12
 5.63728698  -0.94845276  -3.61657377   -1.63141320  5.03417372   1.80400271
         13           14           15            16          17           18
 9.32876273  -5.32723406   5.29373023   -3.90822713 -10.95065186  4.90038016
. . .
         97           98           99           100
-6.92509812   0.59357486  -1.17205723    0.04209578

Note that there are 100 rather than 10 predicted values.
But with individual predictors (rather than a matrix),

x1 <- X[,1]
x2 <- X[,2]
dat.2 <- data.frame(y=y, x1=x1, x2=x2)
mod.2 <- glm(y ~ x1 + x2, family=binomial, data=dat.2)
new.2 <- data.frame(x1=rnorm(10), x2=rnorm(10))
predict(mod.2, new.2)
         1          2          3          4          5          6          7
 6.5723823  0.6356392  4.0291018 -4.7914650  2.1435485 -3.1738096 -2.8261585
         8          9         10
-1.5255329 -4.7087592  4.0619290

it works as expected (?). Regards, John -----Original Message----- From: [EMAIL
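Back to the axis question at the top of this thread: points() and lines() draw into the coordinate system of the plot that is already open, so no par(new=TRUE) trickery is needed; and if you do use par(new=TRUE), you can pin the second plot to the first one's axes explicitly. A sketch reusing the variable names from the example above:

```r
set.seed(1)
x <- rnorm(100, 10, 10)
e <- rnorm(100, 0, 5)
y <- 5 + 10 * x + e
myReg1 <- lm(y ~ x)

plot(x, y)
newX <- seq(1, 10, 1)
myPred <- predict(myReg1, data.frame(x = newX))

# lines() reuses the open plot's coordinate system, so the predictions
# span only x in [1, 10] instead of being stretched across the device
lines(newX, myPred, lwd = 2)

# the par(new=TRUE) route also works if the axes are fixed by hand:
# par(new = TRUE)
# plot(newX, myPred, xlim = range(x), ylim = range(y),
#      axes = FALSE, xlab = "", ylab = "")
```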
Re: [R] R glm
No! See ?family:

    The 'gaussian' family accepts the links 'identity', 'log' and 'inverse';

Kahra Hannu wrote:

In Venables & Ripley, Modern Applied Statistics with S (MASS), 4th edition, on page 184 there is a table "Families and link functions" that gives the available links for the different families. The default and only link for the gaussian family is identity.

ciao,
Hannu Kahra

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Shuangge Ma
Sent: Thursday, September 23, 2004 6:26 PM
To: [EMAIL PROTECTED]
Subject: [R] R glm

Hello: would you please help me with the following glm question? For the R function glm, what I understand is: once you specify the family, then the link function is fixed. My question is: is it possible to use, for example, a log link function, but the estimation approach for the gaussian family?

Thanks,
Shuangge Ma, Ph.D.
CHSCC, Department of Biostatistics
University of Washington
Building 29, Suite 310, 6200 NE 74th ST.
Seattle, WA 98115
Tel: 206-685-7123  Fax: 206-616-4075
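To confirm the point for the original questioner: gaussian(link = "log") is accepted directly, and it is not the same model as regressing a log-transformed response. A small sketch with simulated data (the variable names are invented):

```r
set.seed(7)
x <- runif(200, 1, 3)
# mean exp(0.5 + 0.8*x), with additive roughly constant-variance errors
y <- exp(0.5 + 0.8 * x) + rnorm(200, sd = 0.5)

# gaussian errors with a log link: E(y) = exp(a + b*x)
fit <- glm(y ~ x, family = gaussian(link = "log"))
coef(fit)

# contrast with lm(log(y) ~ x), which instead models E(log y)
# and assumes multiplicative errors
```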
[R] Ever see a stata import problem like this?
Greetings Everybody: I generated a 1.2MB dta file based on the General Social Survey with Stata 8 for Linux. The file can be re-opened with Stata, but when I bring it into R, it says all the values are missing for most of the variables. This dataset is called morgen.dta and I dropped a copy online in case you are interested: http://www.ku.edu/~pauljohn/R/morgen.dta

It looks like this to R (I tried various options on the read.dta command):

myDat <- read.dta("morgen.dta")
summary(myDat)

     CASEID              year           id          hrs1          hrs2
 Min.   :   19721   Min.   :1972   Min.   :   1   NAP :    0   NAP :    0
 1st Qu.: 1983475   1st Qu.:1978   1st Qu.: 445   DK  :    0   DK  :    0
 Median : 1996808   Median :1987   Median : 905   NA  :    0   NA  :    0
 Mean   : 9963040   Mean   :1986   Mean   : 990   NA's:40933   NA's:40933
 3rd Qu.:19872187   3rd Qu.:1994   3rd Qu.:1358
 Max.   :20002817   Max.   :2000   Max.   :3247

      prestige        agewed        age           educ         paeduc
 DK,NA,NAP:    0   NAP :    0   DK  :    0   NAP :    0   NAP :    0
 NA's     :40933   DK  :    0   NA  :    0   DK  :    0   DK  :    0
                   NA  :    0   NA's:40933   NA  :    0   NA  :    0
                   NA's:40933                NA's:40933   NA's:40933

   maeduc        speduc                 income
 NAP :    0   NAP :    0   $25000 OR MORE:14525
 DK  :    0   DK  :    0   $1 - 14999    : 5022
 NA  :    0   NA  :    0   $15000 - 1    : 3869
 NA's:40933   NA's:40933   $2 - 24999    : 3664
                           REFUSED       : 1877
                           (Other)       : 8523
                           NA's          : 3453

Here's what Stata sees when I load the same thing:

summarize, detail

Case identification number
    Percentiles    Smallest
 1%      197432       19721
 5%      199649       19722
10%     1974116       19723    Obs           40933
25%     1983475       19724    Sum of Wgt.   40933
50%     1996808                Mean        9963040
                    Largest    Std. Dev.   9006352
75%    1.99e+07    2.00e+07
90%    2.00e+07    2.00e+07    Variance   8.11e+13
95%    2.00e+07    2.00e+07    Skewness     .18931
99%    2.00e+07    2.00e+07    Kurtosis   1.045409

GSS YEAR FOR THIS RESPONDENT
    Percentiles    Smallest
 1%     1972          1972
 5%     1973          1972
10%     1974          1972     Obs           40933
25%     1978          1972     Sum of Wgt.   40933
50%     1987                   Mean       1986.421
                   Largest     Std. Dev.   8.61136
75%     1994          2000
90%     1998          2000     Variance   74.15552
95%     2000          2000     Skewness  -.0789223
99%     2000          2000     Kurtosis   1.799939

RESPONDENT ID NUMBER
    Percentiles    Smallest
 1%       18             1
 5%       89             1
10%      178             1     Obs           40933
25%      445             1     Sum of Wgt.   40933
50%      905                   Mean       989.9129
                   Largest     Std. Dev.  689.0596
75%     1358          3244
90%     2027          3245     Variance   474803.2
95%     2437          3246     Skewness   .8359211
99%     2867          3247     Kurtosis   3.311248

NUMBER OF HOURS WORKED LAST WEEK
    Percentiles    Smallest
 1%        6             0
 5%       15             0
10%       21             0     Obs           23279
25%       37             0     Sum of Wgt.   23279
50%       40                   Mean       41.05206
                   Largest     Std. Dev.  13.95931
75%       48            89
90%       60            89     Variance   194.8624
95%       65            89     Skewness    .195045
99%       82            89     Kurtosis   4.448998

NUMBER OF HOURS USUALLY WORK A WEEK
    Percentiles    Smallest
 1%        4             0
 5%       15             0
10%       20             1     Obs             774
25%       38             2     Sum of Wgt.     774
50%       40                   Mean       39.79199
                   Largest     Std. Dev.  13.43383
75%       45            89
90%       55            89     Variance   180.4677
95%       60            89
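A guess about the morgen.dta problem, based on which columns failed: the variables that came through as all-NA are exactly the ones carrying Stata value labels (NAP/DK/NA), so the factor conversion looks like the culprit. foreign's read.dta has a convert.factors switch worth trying; here is a hedged round-trip sketch (the real test would of course be on morgen.dta itself):

```r
library(foreign)

# small data frame round-tripped through the Stata format
df <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))
tmp <- tempfile(fileext = ".dta")
write.dta(df, tmp)

# convert.factors = FALSE keeps the raw numeric codes rather than
# pushing values through (possibly incomplete) Stata value labels
raw <- read.dta(tmp, convert.factors = FALSE)
str(raw)
```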
[R] R-release.diff + R-1.9 - success on Fedora Core 2, R RPM available; ess-emacs-5.1.20 also available
Dear Everybody: I have Fedora Core 2, and R-1.9.0 does not build on it out of the box. After applying the daily patch file R-release.diff, I find it does build, and I've made RPMs and an SRPM. In case you want to save yourself a recompile, you can find Fedora Core RPMs here:

http://lark.cc.ku.edu/~pauljohn/software/R

These are based on the standard R distribution spec file; only the patch was added. Also, I have created an RPM for ESS for Emacs on Fedora Core 2 (not XEmacs), and you can find that here:

http://lark.cc.ku.edu/~pauljohn/software/favoriteEmacsFiles

While I'm on the RPM subject, can I ask an R RPM packaging question? I want the R RPM to install so that, post-install, R starts and runs update.packages() as well as

install.packages(c("Design", "Hmisc", "lmtest", "car", "rgl", "effects"))

-- well, you get the idea. I want to add in more packages, of course. Can I ask what might be the best way to package this?

pj

ps: Here's the error that results if you do not patch R-1.9.0 on FC2:

gcc -I. -I../../../src/include -I../../../src/include -I/usr/X11R6/include -I/usr/local/include -DHAVE_CONFIG_H -D__NO_MATH_INLINES -mieee-fp -fPIC -O2 -g -pipe -march=i386 -mcpu=i686 -c dataentry.c -o dataentry.lo
In file included from dataentry.c:31:
/usr/X11R6/include/X11/Xlib.h:1400: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1488: error: syntax error before char
/usr/X11R6/include/X11/Xlib.h:1516: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1520: error: syntax error before char
/usr/X11R6/include/X11/Xlib.h:1542: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1577: error: syntax error before '*' token
/usr/X11R6/include/X11/Xlib.h:1586: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1611: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1661: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1667: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1714: error: syntax error before char
/usr/X11R6/include/X11/Xlib.h:1753: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1994: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2078: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2331: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2341: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2413: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2423: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2581: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2596: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2789: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2856: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2861: error: syntax error before char
/usr/X11R6/include/X11/Xlib.h:2975: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3001: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3012: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3037: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3046: error: syntax error before char
/usr/X11R6/include/X11/Xlib.h:3059: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3202: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3251: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3283: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3374: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3381: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3401: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3407: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3419: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3429: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3770: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3781: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3792: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3803: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3814: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3825: error: syntax error before _Xconst
In file included from dataentry.c:32:
/usr/X11R6/include/X11/Xutil.h:566: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xutil.h:606: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xutil.h:666: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xutil.h:678: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xutil.h:801: error: syntax error before _Xconst
dataentry.c: In function `GetKey':
dataentry.c:1272: warning: passing arg 4 of `XLookupString' from incompatible pointer type
dataentry.c: In function `GetCharP':
dataentry.c:1281: warning: passing arg 4 of
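On the packaging question: one common approach is a %post scriptlet in the spec file that runs R non-interactively. This is only a sketch -- it assumes network access at install time and a hard-coded CRAN mirror, and many packagers would instead ship the add-on packages in a separate RPM rather than downloading in %post:

```shell
%post
# run R in batch mode after installation; assumes the network is up
R --no-save <<'EOF'
options(repos = "http://cran.r-project.org")
update.packages(ask = FALSE)
install.packages(c("Design", "Hmisc", "lmtest", "car", "rgl", "effects"))
EOF
```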
Re: [R] Explaining Survival difference between Stata and R
Dear Goran (and others): I did not know about the eha package, but reading the docs I see many things I've been looking for, including the parametric hazard model with a Weibull baseline. Thanks for the tip, and the package. I still don't quite understand your point about the reason that coxph crashes. Why does the difference of scale between the variables cause trouble? I understand that a redundant set of variables would cause singularity, but I do not see why differing scales for variables would cause that. Does the explanation lie in the optimization algorithm itself? I'll read the eha package coxreg source code; I bet that will show me what's going on in your fix, anyway. Still, I wish coxph just handled all that stuff. Just for fun, here are the Stata results for the same small model that you report. If I ask Stata to use the Efron method of breaking ties, then the results almost exactly match what you found.

stset yrs2, failure (ratify)
stcox haz_wst pol_free , robust nohr efron

Iteration 0: log pseudo-likelihood = -45.380139
Iteration 1: log pseudo-likelihood = -45.00981
Iteration 2: log pseudo-likelihood = -45.001196
Iteration 3: log pseudo-likelihood = -45.001193
Refining estimates:
Iteration 0: log pseudo-likelihood = -45.001193

Cox regression -- Efron method for ties

No. of subjects =      21            Number of obs =      21
No. of failures =      21
Time at risk    =      78
                                     Wald chi2(2)  =    1.09
Log pseudo-likelihood = -45.001193   Prob > chi2   =  0.5785

------------------------------------------------------------------------
             |             Robust
          _t |     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------
     haz_wst |  8.48e-08   8.63e-08    0.98   0.326   -8.43e-08  2.54e-07
    pol_free |  .0089626   .1047656    0.09   0.932   -.1963741  .2142994
------------------------------------------------------------------------

Göran Broström wrote:

Paul and Thomas, 'coxph' or the data (I got it from Paul) indeed has a problem:

Call: coxph(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)

              coef  exp(coef)  se(coef)      z     p
haz.wst   8.53e-08          1  9.47e-08  0.901  0.37
pol.free        NA         NA  0.00e+00     NA    NA

Likelihood ratio test=0.76 on 1 df, p=0.385  n= 21
Warning message:
X matrix deemed to be singular; variable 2 in:
 coxph(Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)

Is it the data? Let's try 'coxreg' (eha):

Call: coxreg(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)

Covariate      Mean     Coef    RR      Wald p
haz.wst     2054901    0.000    1.000   0.372
pol.free      2.090    0.009    1.009   0.958

Events                  21
Total time at risk      78
Max. log. likelihood   -45.001
LR test statistic        0.76
Degrees of freedom        2
Overall p-value          0.684583

This worked just fine (Paul, same results as in Stata?). But we seem to have a scaling problem; look at the means of the covariates! Some rescaling gives:

Call: coxph(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free, data = dat)

                       coef  exp(coef)  se(coef)       z     p
I(haz.wst * 1e-06)  0.08479       1.09     0.095  0.8920  0.37
pol.free            0.00896       1.01     0.170  0.0526  0.96

Likelihood ratio test=0.76 on 2 df, p=0.685  n= 21

and now 'coxph' gets the same results as 'coxreg'. I don't know about coxph for sure, but I do know that coxreg centers all covariates before the NR procedure starts. Maybe we also should rescale to unit variance? And of course scale back the coefficients and se:s at the end? BTW, Paul's data is heavily tied. Could be an idea to use a discrete-time version of the PH model; you can do that with 'mlreg' (eha):

Call: mlreg(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free, data = dat)

Covariate             Mean    Coef    RR      Wald p
I(haz.wst * 1e-06)    2.055   0.097   1.102   0.320
pol.free              2.090   0.003   1.003   0.985

Events                  21
Total time at risk      78
Max. log. likelihood   -36.324
LR test statistic        0.93
Degrees of freedom        2
Overall p-value          0.627056
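To see that it really is the optimizer and not the data, one can provoke the same kind of trouble with simulated survival data containing one covariate on a scale of millions, then condition the problem by rescaling, exactly as Göran did with I(haz.wst * 1e-06). A sketch (the variable names are invented; the ratification data are not reproduced here):

```r
library(survival)

set.seed(2)
n <- 50
big <- runif(n, 0, 5e6)    # covariate on the scale of millions
small <- rnorm(n)          # covariate on an ordinary scale
time <- rexp(n, rate = exp(2e-7 * big + 0.2 * small))
status <- rep(1, n)

# rescaling the large covariate to order 1 conditions the Hessian;
# its coefficient is simply multiplied by the inverse factor
fit <- coxph(Surv(time, status) ~ I(big * 1e-6) + small)
coef(fit)
```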
[R] Explaining Survival difference between Stata and R
Dear Everybody: I'm doing my usual "how does that work in R" thing with some Stata projects. I find a gross gap between Stata and R in Cox PH models, and I hope you can give me some pointers about what goes wrong. I'm getting signals from R/survival that the model just can't be estimated, but Stata spits out numbers just fine. I wonder if I should specify initial values for coxph? I got a dataset from a student who uses Stata and am trying to replicate in R. I will share the data with you in case you want to see for yourself; let me know if you want a text or Stata data file. In R, I try this:

cox2 <- coxph(Surv(yrs2, ratify) ~ accession + haz.wst + haz.in + haz.out + wefgov + rle + rqe + pol.free + tai.2001 + ny.gdp.pcap.pp.cd + eio, data=dat3, control=coxph.control(iter.max=1000), singular.ok=T)

Warning message:
Ran out of iterations and did not converge in: fitter(X, Y, strats, offset, init, control, weights = weights,

So I wrote out the file exactly as it was in R into a Stata dataset:

write.dta(dat3, "cleanBasel.dta")

Warning message:
Abbreviating variable names in: write.dta(dat3, "cleanBasel.dta")

Here's the Stata output:

. use /home/pauljohn/ps/ps909/AdvancedRegression/duration_2/cleanBasel.dta
(Written by R. )

. stset yrs2, failure (ratify)

     failure event:  ratify != 0 & ratify < .
obs. time interval:  (0, yrs2]
 exit on or before:  failure

       21  total obs.
        0  exclusions

       21  obs. remaining, representing
       21  failures in single record/single failure data
       78  total analysis time at risk, at risk from t = 0
           earliest observed entry t = 0
           last observed exit t = 7

. stcox accessin haz_wst haz_in haz_out wefgov rle rqe pol_free tai_2001 ny_gd eio, robust nohr

        failure _d:  ratify
  analysis time _t:  yrs2

Iteration 0: log pseudo-likelihood = -49.054959
Iteration 1: log pseudo-likelihood = -45.021682
Iteration 2: log pseudo-likelihood = -44.525187
Iteration 3: log pseudo-likelihood = -44.521588
Iteration 4: log pseudo-likelihood = -44.521586
Refining estimates:
Iteration 0: log pseudo-likelihood = -44.521586

Cox regression -- Breslow method for ties

No. of subjects =      21            Number of obs  =      21
No. of failures =      21
Time at risk    =      78
                                     Wald chi2(11)  =   81.64
Log pseudo-likelihood = -44.521586   Prob > chi2    =      0.

--------------------------------------------------
             |             Robust
          _t |     Coef.   Std. Err.      z   P>|z|
-------------+------------------------------------
    accessin | -1.114101   .6343663   -1.76   0.079
     haz_wst |  2.32e-08   1.08e-07    0.22   0.829
      haz_in |  3.78e-06   2.46e-06    1.54   0.124
     haz_out | -3.80e-07   3.76e-07   -1.01   0.312
      wefgov |  2.139127   .9136992    2.34   0.019
         rle |  1.827482   1.500878    1.22   0.223
         rqe | -3.126696   1.332069   -2.35   0.019
    pol_free | -.4498276    .291764   -1.54   0.123
    tai_2001 | -2.895922   2.577401   -1.12   0.261
    ny_gd___ | -.0003223   .0002194   -1.47   0.142
         eio |     -.053    .0726064  -0.80   0.426
--------------------------------------------------

--
Paul Johnson
Dept. of Political Science
University of Kansas
Re: [R] One inflated Poisson or Negative Binomal regression
Dear Peter: I notice there is R code for a zero-inflated Poisson/NB process on the Stanford Political Science Computational Lab (Prof. Simon Jackman) web page. If I were wanting to do a one-inflated model, I would start with that because, at least to my eye, it is very easy to follow. Mind you, I did not try this myself, but I bet you could make it go. In the file zeroinfl.r, look at the function:

zeroinflNegBin <- function(parms){

It is pretty clear you'd have to supply a probability model for the outcomes valued 1 and then fit them into the overall likelihood.

pj

http://pscl.stanford.edu/content.html

Peter Flom wrote:

Hello. I am interested in Poisson or (ideally) negative binomial regression with an inflated number of 1 responses. I have seen JK Lindsey's fmr function in the gnlm library, which fits zero-inflated Poisson (ZIP) or zero-inflated negative binomial regression, but the help file states that for 'Poisson or related distributions the mixture involves the zero category'. I had thought of perhaps subtracting 1 from all the counts, fitting the ZIP or ZINB models, and then adding 1 back, but am not sure if this is legitimate, or if there is some better method.

Contextual details: The dependent variable is number of primary sexual partners in the last year. The independent variables include a) being married or in a committed relationship, b) using hard drugs, c) sex, d) age. N is c. 500. Not surprisingly, there are a large number of 1 responses, especially for those who are married or in a relationship. More surprisingly, the mean number of partners is the same (1.05 vs. 1.02) for people in and not in relationships, but the variances are very different, mostly because those in a relationship are much more likely to say exactly 1.

Thanks in advance
Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
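To make the one-inflation idea concrete, here is a hedged sketch of the likelihood described above: a mixing probability for "structural ones" combined with a Poisson count for everyone else, maximized with optim(). All variable names and the simulated data are invented for illustration; a real analysis would use the negative binomial and covariates in the mixing part too:

```r
## one-inflated Poisson: with probability p the response is exactly 1,
## otherwise it is drawn from Poisson(lambda)
set.seed(3)
n <- 500
x <- rnorm(n)
lambda <- exp(0.2 + 0.5 * x)
one <- rbinom(n, 1, 0.3)                  # structural ones
y <- ifelse(one == 1, 1, rpois(n, lambda))

negll <- function(parms) {
  p <- plogis(parms[1])                   # mixing probability
  mu <- exp(parms[2] + parms[3] * x)      # Poisson mean
  f1 <- dpois(1, mu)                      # Poisson mass at 1
  ll <- ifelse(y == 1,
               log(p + (1 - p) * f1),
               log(1 - p) + dpois(y, mu, log = TRUE))
  -sum(ll)
}

fit <- optim(c(0, 0, 0), negll, hessian = TRUE,
             control = list(maxit = 2000))
fit$par   # mixing logit, intercept, slope
```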
[R] solved mystery of difference between glmmPQL and lme
I asked a few days ago about the difference in results I saw between the MASS function glmmPQL (due to Venables and Ripley) and the lme function from the package nlme (due to Pinheiro and Bates). When the two tools apply to the same model (gaussian, link=identity, correlation=AR1), I was getting different results and wondered if there was an argument in favor of one or the other. Several list readers emailed me to point out that glmmPQL repeatedly calls lme, so if a model really can be estimated by lme, then lme is the more appropriate tool because it is maximum likelihood, rather than quasi-likelihood. That did not explain the difference in results, so I read the source code for glmmPQL and learned that it sets the method for the lme fitting to "ML". In contrast, lme defaults to REML. The estimates from glmmPQL and lme(method="ML") are identical in my test case.

pj
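The ML-versus-REML point is easy to verify directly: refit the same lme model both ways and compare. A hedged sketch on a grouped dataset shipped with nlme (Orthodont), standing in for the original data:

```r
library(nlme)

## same random-intercept model, two estimation criteria
fit.reml <- lme(distance ~ age, random = ~ 1 | Subject,
                data = Orthodont)                  # default: REML
fit.ml   <- lme(distance ~ age, random = ~ 1 | Subject,
                data = Orthodont, method = "ML")

## fixed effects agree closely; the variance components differ,
## because REML adjusts for the degrees of freedom absorbed by
## the fixed effects while ML does not
VarCorr(fit.reml)
VarCorr(fit.ml)
```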
[R] contrast lme and glmmPQL and getting additional results...
I have a longitudinal data analysis project. There are 10 observations on each of 15 units, and I'm estimating this with randomly varying intercepts along with an AR1 correction for the error terms within units. There is no correlation across units. Blundering around in R for a long time, I found that for linear/gaussian models I can use either the MASS method glmmPQL (thanks to Venables and Ripley) or lme from nlme (thanks to Pinheiro and Bates). (I also find that the package lme4 has GLMM, but I can't get the correlation structure to work with that, so I gave up on it.) The glmmPQL and lme results are quite similar, but not identical. Here are my questions.

1. I believe that both of these offer consistent estimates. Does one have preferable small-sample properties? Is lme the preferred method in this case because it is more narrowly designed for this gaussian-family model with an identity link? If there's an argument in favor of PQL, I'd like to know it, because a couple of the hypothesis tests based on t-statistics are affected.

2. Is there a pre-made method for calculating robust standard errors? I notice that the model.matrix() command does not work for either lme or glmmPQL results, and so I wonder how people calculate sandwich estimators of the standard errors.

3. Are the AIC (or BIC) statistics comparable across models? Can one argue in favor of the glmmPQL results (with, say, a log link) if the AIC is more favorable than the AIC from an lme fit? In JK Lindsey's Models for Repeated Measurements, the AIC is sometimes used to make model selections, but I don't know where the limits of this application might lie.
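On question 2, a partial workaround: the fixed-effects design matrix can be rebuilt from the fitted model's formula and data, which is usually the main ingredient a sandwich calculation needs. A hedged sketch using nlme's Orthodont data in place of the longitudinal project data, and assuming formula() and getData() behave for lme fits as documented:

```r
library(nlme)

fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

## model.matrix() has no lme method, but the same X can be rebuilt
## from the fixed-effects formula and the data stored with the fit
X <- model.matrix(formula(fit), data = getData(fit))
dim(X)     # rows = observations, cols = fixed-effect terms
head(X)
```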
[R] using unstack inside my function: that old scope problem again
I've been reading the R mail archives and I've found a lot of messages with this same kind of problem, but I can't understand the answers. Can one of you try to explain this to me? Here's my example. Given a regression model and a grouping variable, I want to use unstack() on the vector of residuals and make some magic with the result. But unstack hates me.

PCSE <- function(tmodel, groupVar) {
  myres1 <- resid(tmodel)
  resUnstacked <- unstack(myres1, form = myres1 ~ groupVar)
  E <- as.matrix(resUnstacked)
  SIGMA <- (1/nrow(E)) * (t(E) %*% E)
  OMEGA <- diag(x=1, nrow=nrow(E), ncol=nrow(E)) %x% SIGMA
  X <- model.matrix(tmodel)
  XPRIMEXINV <- solve(t(X) %*% X)
  PCSECOVB <- XPRIMEXINV %*% (t(X) %*% OMEGA %*% X) %*% XPRIMEXINV
}

The error is:

PCSE(eld.ols1, dat2$STATEID)
Error in eval(expr, envir, enclos) : Object "groupVar" not found

Here's what I don't understand the most. If I hack this so that resUnstacked is created by a matrix() command, then the function works. Why would matrix() work when unstack() does not? And why does model.matrix() work if unstack() does not? Thanks in advance, as usual.
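One way around the scoping trouble is to avoid the formula interface altogether: split() takes the grouping variable as an ordinary argument, so nothing has to be looked up by name in another evaluation frame, which is what trips up unstack()'s formula. A hedged sketch of the same panel-corrected-covariance computation (it assumes a balanced panel stacked time-major, as the I %x% SIGMA construction above does):

```r
# panel-corrected covariance matrix, avoiding unstack()'s formula scoping
PCSE <- function(tmodel, groupVar) {
  myres1 <- resid(tmodel)
  # split() evaluates its arguments normally, unlike unstack(x, form=...)
  E <- do.call(cbind, split(myres1, groupVar))   # T x N residual matrix
  SIGMA <- crossprod(E) / nrow(E)
  OMEGA <- diag(nrow(E)) %x% SIGMA
  X <- model.matrix(tmodel)
  XPRIMEXINV <- solve(crossprod(X))
  XPRIMEXINV %*% (t(X) %*% OMEGA %*% X) %*% XPRIMEXINV
}

## toy balanced panel: 4 units, 10 time points, stacked time-major
## (all units at time 1, then all units at time 2, ...)
set.seed(4)
g <- rep(1:4, times = 10)
x <- rnorm(40)
y <- 1 + 2 * x + rnorm(40)
fit <- lm(y ~ x)
sqrt(diag(PCSE(fit, g)))   # panel-corrected SEs for (Intercept), x
```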
Re: [R] glm questions --- saturated model
I'm confused going back and forth between the textbooks and these emails. Please pardon me for seeming so pedantic. I am pretty certain that -2lnL(saturated) is not 0 by definition. In the binomial model with groups of size 1, the observed scores will be in {0,1} but the predicted mean will be some number in [0,1], and so -2lnL will not be 0. I'm reading, for example, Annette Dobson, An Introduction to Generalized Linear Models, 2nd ed. (2002, p. 77), where she gives the formula one can use for -2lnL(saturated) in the binomial model. For the Normal distribution, Dobson says

-2lnL(saturated) = N log(2 pi sigma^2)

She gives -2lnL(saturated) for lots of distributions, actually. I thought the point in the first note from Prof. Firth was that the deviance is defined up to an additive constant in lnL, because you can add to or subtract from lnL in the deviance formula

D = -2[lnL(fitted) - lnL(saturated)]

and the deviance is unaffected. But I don't think that means there is a completely free quantity in lnL(saturated). I agree that the deviance of the saturated model is 0 by definition, if by that one means -2[lnL(saturated) - lnL(saturated)], but of course that's just a tautology.

Respectfully yours,
pj

Peter Dalgaard wrote:

BXC (Bendix Carstensen) [EMAIL PROTECTED] writes: It's important to remember that lnL is defined only up to an additive constant. For example a Poisson model has lnL contributions -mu + y*log(mu) + constant, and the constant is arbitrary. The differencing in the deviance calculation eliminates it. What constant would you like to use?? I have always been under the impression that the constant chosen by glm is the one that makes the deviance of the saturated model 0, the saturated model being the one with one parameter per observation in the dataset. As David pointed out, the deviance of a saturated model is zero by definition.
However, there's nothing arbitrary about the constant in a likelihood either, since it is supposed to be a density if seen as a function of y (well, if you *really* want to quibble, it's a density with respect to an arbitrary measure, so you could get an arbitrary constant in if you insist, I suppose). The point is that the constant is *uninformative*, since it depends on y only, not mu, and hence people tend to throw some bits of the likelihood away, and not always the same bits.
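Dobson's point can be checked numerically: for grouped binomial data the saturated log-likelihood plugs the observed proportions in as fitted values, it is not zero, and glm's residual deviance is exactly -2[lnL(fitted) - lnL(saturated)]. A small sketch with invented grouped data:

```r
## grouped binomial data: s successes out of m trials per covariate value
x <- c(-1, 0, 1, 2)
m <- rep(20, 4)
set.seed(5)
s <- rbinom(4, m, plogis(-0.5 + 0.8 * x))

fit <- glm(cbind(s, m - s) ~ x, family = binomial)

## saturated model: fitted proportions equal the observed proportions
p.hat <- s / m
ll.sat <- sum(dbinom(s, m, p.hat, log = TRUE))        # not zero in general
ll.fit <- sum(dbinom(s, m, fitted(fit), log = TRUE))

## the deviance reported by glm equals -2*(ll.fit - ll.sat)
all.equal(deviance(fit), -2 * (ll.fit - ll.sat))
```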
[R] glm questions
Greetings, everybody. Can I ask some glm questions?

1. How do you find out -2*lnL(saturated model)? In the output from glm, I find:

Null deviance: which I think is -2[lnL(null) - lnL(saturated)]
Residual deviance: -2[lnL(fitted) - lnL(saturated)]

The null model is the one that includes the constant only (plus offset, if specified), right? I can use the null and residual deviance to calculate the usual model chi-squared statistic -2[lnL(null) - lnL(fitted)]. But, just for curiosity's sake, what's the saturated model's -2lnL?

2. Why no 'scaled deviance' in the output? Or, how are you supposed to tell if there is over-dispersion? I just checked, and SAS gives us the scaled and unscaled deviance. I have read the Venables & Ripley (MASS, 4th ed.) chapter on GLMs. I believe I understand the cautionary point about overdispersion toward the end (p. 408). Since I'm comparing lots of other books at the moment, I believe I see people using the practice that is being criticized. The Pearson chi-square based estimate of dispersion is recommended, and one uses an F test to decide if the fitted model is significantly worse than the saturated model. But don't we still assess over-dispersion by looking at the scaled deviance (after it is calculated properly)? Can I make a guess why glm does not report scaled deviance? Are the glm authors trying to discourage us from making the lazy assessment in which one concludes over-dispersion is present if the scaled deviance exceeds the degrees of freedom?

3. When I run example(glm), at the end there's a Gamma model and the printout says:

(Dispersion parameter for Gamma family taken to be 0.001813340)

I don't find an estimate of the Gamma distribution's shape parameter in the output. I'm uncertain what the reported dispersion parameter refers to. It's the denominator phi in the exponential family formula, isn't it?

f(y) = exp[ (y*theta - c(theta)) / phi ] h(y, phi)

4.
For GLM teaching purposes, can anybody point me at some good examples of GLM that do not use the Normal, Poisson, Negative Binomial, and/or Logistic regression? I want to justify the effort to understand the GLM as a framework. In previous semesters we have followed the usual econometric approach, in which OLS, Poisson/count, and logistic regression are treated as special cases. Some of the students don't see any benefit from tackling the GLM's new notation/terminology. What I'm lacking is some persuasive evidence that the effort to master the details of the GLM is worthwhile. I could really use some data and reference articles that have applications of Gamma distributed (or exponential) variables, say, or Weibull, or whatever. I've been dropping my course notes in this directory: http://lark.cc.ku.edu/~pauljohn/ps909/AdvancedRegression. The documents GLM1 and GLM2 are pretty good theoretical surveys (patting self on back), but I need to work harder to justify the effort by providing examples. I'd appreciate any feedback, if you have any. And, of course, if you want to take these documents and use them for your own purposes, be my guest.

5. Is it possible to find all methods that an object inherits? I found out by reading the source code for J. Fox's car package that model.matrix() returns the X matrix of coded input variables, so one can do fun things like calculate robust standard errors and such. That's really useful, because before I found that, I was recoding up a storm to re-create the X matrix used in a model. Is there a direct way to find a list of all the other methods that would apply to an object?
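A follow-up sketch of what I mean in items 1, 2, and 5, using the Dobson data from example(glm) as a stand-in (the variable names phi.hat and m2lnL.sat are just illustrative):

```r
# the Poisson example from example(glm)
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

# deviance = -2[lnL(fitted) - lnL(saturated)], so the saturated
# model's -2lnL can be backed out from logLik() and deviance():
m2lnL.sat <- -2 * as.numeric(logLik(fit)) - deviance(fit)

# Pearson chi-square based estimate of the dispersion phi
phi.hat <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# item 5: list the S3 methods that apply to a fitted glm object
methods(class = "glm")
```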
Re: [R] R: Including R plots in a Microsoft Word document
I have wrestled with this problem a lot. I use Linux, my coauthors use Windows, and the eps files I make from R don't work well with MS Word. They never have previews, and they sometimes won't print at all when I use CrossOver Office with MS Office 2000 on Linux. My coauthor says he can often wrestle my eps files into Word on his system with Office 2003. People keep telling me to use gsview to insert the preview panes into eps files, and that does work, but more than half of the time my system creates eps files that look significantly worse than the originals. Sometimes it inserts a blank page at the top of the eps, or it reshapes a figure. I don't care enough about MS to try to track that down. It just pisses me off. As a result, I think the answer is more complicated than other people make it seem. It does no good to output a pdf file because, as I learned yesterday, MS Word users can't import a pdf file into a doc. Clearly, if you are an MS Windows user of R, you can save graphics in the Windows metafile format (wmf) (or is it enhanced metafile, emf?). That will go more or less seamlessly into Word. If you have a chance to boot into Windows, and you really must make an image that works well with Word, then you should boot into Windows, run your R there, and make the wmf file. If you are a Linux/Unix user, and you are too proud to use Windows, the problem is much more difficult. If you are ABSOLUTELY SURE that your image does not need to be resized in any way, you could output from R into a bitmap format, such as png. As long as the image is not resized, that will be fine; if it is resized, then all bets are off. I find that R's output to the xfig format is quite good, and I can edit those files in xfig: you can add text and so on, so it's very handy. So right now I'm looking for a good bridge from the xfig format to Word. But I just started investigating that.
[EMAIL PROTECTED] wrote:
On Fri, Feb 20, 2004 at 05:54:33PM +0200, Mahmoud K. Okasha wrote:
Greetings List, I am conducting some large simulations using R. As a result, I get many plots, but I'm having some trouble with including some of them in a Microsoft Word document. Can anyone tell me the easiest method of getting copies of the R graphs into the Word documents?
R can produce at least PostScript, PDF, png, and jpeg/jpg; see:
help(postscript)
help(pdf)
help(png)
help(jpeg)
I don't use Word; for me the PostScript format (more precisely, Encapsulated PostScript/.eps) is the best/easiest/most powerful format, if you don't have thousands of points or lines :-) For instance, to print a simple plot:
postscript(file="somefile.eps"); plot(whatever); dev.off()
Other important formats are similar.
regards Ulisses
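One footnote on the "bitmap that must not be resized" option: a high-resolution png keeps the fonts in proportion when it is printed at a smaller physical size, so it survives the trip into Word a bit better. A minimal sketch (the filename and dimensions are just for illustration):

```r
# roughly an 8in x 6in figure at 300 dpi; res= scales the text accordingly
png("fig-for-word.png", width = 2400, height = 1800, res = 300)
plot(rnorm(100), main = "Example figure")
dev.off()
```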
[R] diagnostic information in glm. How about N of missing observations?
I handed out some results from glm(), and the students asked how many observations were dropped due to missing values. How would I know? In other stat programs, the results typically include N and the number of cases dropped because of missingness. Without going back to R and fiddling about to find the total number of rows in the data frame, there is no way to tell. Somewhat inconvenient. Do you agree?
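For what it's worth, here is the fiddling I mean. With the default na.action = na.omit, the fitted object does record which rows were dropped, it just isn't printed (dat, y, and x here are hypothetical stand-ins):

```r
# hypothetical data frame with one missing predictor value
dat <- data.frame(y = c(1, 0, 1, 0), x = c(2.1, NA, 3.5, 1.2))
fit <- glm(y ~ x, data = dat, family = binomial)

nrow(dat)               # rows in the data frame: 4
length(na.action(fit))  # rows dropped by na.omit: 1
```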
Re: [R] Basic question on function identical
I hope I am not telling you things you already know. If so, I apologize in advance. There are several C library add-ons available to deal with the problem that comparisons of floating point numbers can be unpredictable. I think your example with the greater-than sign would not be a source of trouble, but if you ended it with an equality comparison like if (xmax == 10.7), then you would be in trouble, because the internal representation of the float xmax might not be precisely equal to 10.7. Until I hear otherwise, I am thinking that ordinal comparisons like (x > xmax) are accurate, but that equality comparisons like (x == xmax) are not. Here's one of the C library projects dealing with the subject: http://fcmp.sourceforge.net/ The author of that library, Ted Belding, wrote a paper, Numerical Replication of Computer Simulations: Some Pitfalls..., which is informative (IMHO): http://alife.ccp14.ac.uk/ftp-mirror/alife/zooland/pub/research/ci/EC/GA/papers/gecco2000.ps.gz Also check this web page, which I bookmarked: http://vision.eng.shu.ac.uk/C++/c/c-faq/cfaq14.html#r14.6 I know I've seen more, but can't remember where.

Ted Harding wrote: For instance, in C-speak, to find the maximum of an array of double x[], of length n, something like the following code could be written:

    xmax = x[1];
    for (i = 1; i < n; i++)
        if (x[i+1] > x[i]) xmax = x[i+1];

Regardless of the accuracy of the comparison, each assignment xmax = ... should make xmax an internally exact copy of the thing on the right-hand side. However, your reply suggests that this may not happen, as a result of unpredictable use of 10-byte-wide floating point registers on Intel chips.
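The same equality hazard is easy to demonstrate in R itself, and all.equal(), which compares within a numerical tolerance, is the usual way around it:

```r
x <- 0.1 + 0.2
x == 0.3                   # FALSE: the binary representations differ slightly
isTRUE(all.equal(x, 0.3))  # TRUE: equal within a tolerance
```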
[R] packaging standards for rda files?
Dear everybody: We used the fine foreign library to bring in an SPSS dataset that was about 9 megabytes, and I can squeeze it into a much smaller R object using compression with save(ndat, file="NatAnnES2000.rda", compress=TRUE). I can use load() to get the ndat data frame back; that's all good as far as I can see. If I put that file in the data subdirectory, then the data() command finds it and I can load it. Seems fine, but then I started wondering if there is some more sophisticated way of packaging these things. For example, how do people put in the meta information that appears on the right side of the data() output, as in:

Data sets in package '.':
NatAnnES2000
Data sets in package 'base':
Formaldehyde    Determination of Formaldehyde
HairEyeColor    Hair and Eye Color of Statistics Students
...

Are there other attributes that I should specify if I want to package an .rda file for other users? An rda file created in this way will translate across platforms, won't it?
pj
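If I understand the packaging docs correctly, the one-line description in the data() listing is not an attribute of the .rda file at all; it comes from the \title of the dataset's help page in the package's man directory. A minimal sketch of how one might set that up (the package name and stand-in data frame are hypothetical):

```r
# stand-in for the real survey data frame
ndat <- data.frame(id = 1:3, vote = c(1, 0, 1))

# generates a package skeleton with data/ and man/ subdirectories;
# editing man/ndat.Rd's \title supplies the data() description line
package.skeleton(name = "NatAnnES", list = "ndat")
```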
Re: [R] read SAS format file from R
Gabor Grothendieck wrote:
kan_liu2 wrote: Can you please point me to how to read a SAS format file from R (is it possible?)?
There was a thread on this last month. Check out the replies to: http://maths.newcastle.edu.au/~rking/R/help/03b/5450.html

I just worked on this. Care to check my example? http://lark.cc.ku.edu/~pauljohn/ANES/2002/ This shows how we took the big American National Election Study dataset, in either SPSS or SAS format, and got it over to R.
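To round out the example, a sketch of the foreign package calls involved (the file and directory names are hypothetical; read.xport() reads SAS transport files directly, while read.ssd() needs a local SAS installation to convert native datasets):

```r
library(foreign)

# SAS transport (.xpt) files can be read without SAS itself
dat <- read.xport("anes2002.xpt")

# native SAS datasets require SAS; read.ssd() shells out to it
# dat <- read.ssd(libname = "path/to/dir", sectionnames = "anes2002")
```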