[R] mgcv::gam() scale parameter estimates for quasibinomial error models
For both glm() and mgcv::gam() quasibinomial error models, the summary object has a dispersion value that plays the same role as sigma^2 in the summary object for lm() model fits. Where some fitted probabilities are small, the gam() default scale parameter estimates, returned as `scale` (and `sig2`) in the gam object and as "dispersion" in the summary object, can differ wildly from the Pearson values that glm() works with, and that can be obtained by calling gam() with a suitable control setting (see the code that follows). The following demonstrates the issue:

## mgcv version 1.8-34
library(mgcv)
Cholera <- HistData::Cholera
form <- cbind(cholera_deaths, popn-cholera_deaths) ~ water + elevation + poor_rate
default.gam <- gam(form, data=Cholera, family=quasibinomial)
pearson.gam <- update(default.gam, control=gam.control(scale.est="pearson"))
c(Pearson=pearson.gam$scale, Default=default.gam$scale)
##   Pearson   Default
## 33.545829  2.919535

My own calculation (from either fitted model) returns 30.07 for the Fletcher (2012) version of the dispersion that was, according to Wood's "Generalized Additive Models" (2nd edn, 2017, p.111), returned as the GAM scale estimate at the time when the book was written. In data that are simulated to have comparable dispersion, the default scale estimates returned by gam() vary wildly, relative to the relatively stable "pearson" estimates.

For the Cholera data, it would make good sense to fit a model with quasipoisson errors to the death counts, using log(popn) as an offset. With the default setting for scale.est, the GAM model then uses a scale parameter that is close to that returned by pearson.gam. SEs (as well as coefficients) are similar to those returned by pearson.gam. The detailed calculations that are documented in the following documents may be of some general interest.
https://www.dropbox.com/s/vl9usat07urbgel/quasibin-gam.pdf?dl=0 or
https://www.dropbox.com/s/s83mh1mut5xc3gk/quasibin-gam.html?dl=0

I am posting this here now, before posting a bug report, in case I have missed something important. It would be useful to be directed to the mgcv code used for calculation of what is returned as the Fletcher statistic.

Rmd file: https://www.dropbox.com/s/ghmsdcvgxp068bs/quasibin-gam.Rmd?dl=0
.bib file: https://www.dropbox.com/s/r1yjqx0sni2pzjy/quasi.bib?dl=0

John Maindonald email: john.maindon...@anu.edu.au

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
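As a cross-check on the numbers discussed in this thread, the Pearson estimate that scale.est="pearson" targets can be computed directly from any quasibinomial fit, as the sum of squared Pearson residuals divided by the residual degrees of freedom. A minimal self-contained sketch (simulated overdispersed binomial data, not the Cholera data, so all names and numbers here are my own and illustrative only):

```r
## Hand calculation of the Pearson dispersion estimate for a
## quasibinomial glm(), on simulated overdispersed data.
set.seed(42)
n <- 200; size <- 30
x <- rnorm(n)
p <- plogis(-1 + 0.8*x)
## Beta-binomial style simulation induces overdispersion
y <- rbinom(n, size, rbeta(n, p*8, (1-p)*8))
fit <- glm(cbind(y, size - y) ~ x, family = quasibinomial)

## Pearson estimate: sum of squared Pearson residuals / residual df
phi.pearson <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
## summary.glm() reports exactly this statistic for quasi families
c(byHand = phi.pearson, summary = summary(fit)$dispersion)
```

The two values agree to rounding error; a Fletcher (2012) style estimate would start from this same Pearson statistic and apply the adjustment given in that paper.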
Re: [R] R-help Digest, Vol 217, Issue 20
No, it is not distribution-free. Independent random sampling is assumed. That is a non-trivial assumption, and one that is very often not true, or not strictly true.

John Maindonald email: john.maindon...@anu.edu.au

On 21/03/2021, at 00:00, r-help-requ...@r-project.org wrote:

From: Jiefei Wang <szwj...@gmail.com>
Subject: Re: [R] about a p-value < 2.2e-16
Date: 20 March 2021 at 04:41:33 NZDT
To: Spencer Graves <spencer.gra...@effectivedefense.org>
Cc: Bogdan Tanasa <tan...@gmail.com>, Vivek Das <vd4mm...@gmail.com>, r-help <r-help@r-project.org>

Hi Spencer,

Thanks for your test results. I do not know the answer, as I haven't used wilcox.test for many years. I do not know whether it is possible to compute the exact distribution of the Wilcoxon rank sum statistic, but I think it is very likely, as the documentation for `Wilcoxon` says:

This distribution is obtained as follows. Let x and y be two random, independent samples of size m and n. Then the Wilcoxon rank sum statistic is the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i]. This statistic takes values between 0 and m * n, and its mean and variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.

As a nice feature of the non-parametric statistic, it is usually distribution-free, so you can pick any distribution you like to compute the same statistic. I wonder if this is the case, but I might be wrong.

Cheers,
Jiefei
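On the question of whether the exact distribution can be computed: base R exposes it as dwilcox()/pwilcox(), and for small tie-free samples wilcox.test() uses it. The sketch below (my own illustrative data; the tail-doubling rule is my reading of how wilcox.test computes the exact two-sided p-value) reproduces that p-value by hand:

```r
## Exact two-sided Wilcoxon rank-sum p-value by hand, via pwilcox().
## Tie-free illustrative data (all 14 values distinct).
x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 0.21, 0.79)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)
m <- length(x); n <- length(y)

## The statistic: number of pairs (x[i], y[j]) with y[j] < x[i]
W <- sum(outer(x, y, ">"))

## Double the smaller tail, capping at 1 (as I understand wilcox.test does)
pw <- if (W > m * n / 2) 2 * pwilcox(W - 1, m, n, lower.tail = FALSE) else
      2 * pwilcox(W, m, n)
p.exact <- min(pw, 1)

c(byHand = p.exact, wilcox.test = wilcox.test(x, y, exact = TRUE)$p.value)
```

The exactness of this null distribution still rests on the independent-random-sampling assumption noted above; "distribution-free" does not mean "assumption-free".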
Re: [R] Regarding R licensing usage guidance
I think this is unfair, certainly the 'odious' comment. Very many of us have been able to contribute because we were supported to do such work while occupying relatively comfortable academic positions. In the process of getting there, we have benefited enormously from the work of those who have gone before. I do not know what Anamika's position is, but he may well be trying to build a business from a much less privileged starting point than others of us may have enjoyed. In any case, the comment strikes a tone that is out of place in respectful discourse.

John Maindonald email: john.maindon...@anu.edu.au

On 23/07/2019, at 22:00, r-help-requ...@r-project.org wrote:

From: Jeff Newmiller <jdnew...@dcn.davis.ca.us>
Subject: Re: [R] Regarding R licensing usage guidance
Date: 23 July 2019 at 07:00:26 NZST
To: <r-help@r-project.org>, ANAMIKA KUMARI <anamika1...@gmail.com>

Your internet skills are pathetic. Search Google for "proprietary use gpl" and the first hit is

https://opensource.stackexchange.com/questions/7078/is-it-legal-to-use-gpl-code-in-a-proprietary-closed-source-program-by-putting-i

Note that there are (at least) three obvious alternatives if there is any question in your case: release the code under GPL but also sell it with support (a la RStudio); only use it yourself (don't distribute it at all); or only use R for setting up your models but re-engineer implementations of the run-time prediction calculations yourself (often much easier than the initial algorithm development). I think your desperation to steal the hard work of the various R contributors seems quite odious.

On July 22, 2019 6:55:31 AM PDT, ANAMIKA KUMARI <anamika1...@gmail.com> wrote:

Hello Team,

This mail is in reference to understanding the R license, and also usage of the R language to develop a commercialised product.
I am working on one product of mine that uses the R and Python languages. I am trying to understand the licensing issues, if any, related to R, if I am going to commercialise my product and keep my work proprietary. I need your help to understand it. R comes under GNU-GPL-2.0. Now, do I need to share the source code of my product if I am planning to move it to production, or can I keep my code proprietary? Please note that I am just using R and its packages to develop my own statistical tool and API, and have not made any changes to existing R code. Please refer below for all R packages used in my code:

1. R-3.4.4
2. spacyr
3. jsonlite
4. lubridate
5. data.table
6. png
7. maps
8. countrycode
9. humaniformat
10. ngram
11. stringr
12. slam
13. tm
14. lsa
15. RTextTools
16. stringi
17. plumber
18. Rook
19. pdftools
20. tokenizers
21. zoo
22. tidyr
23. reqres
24. rJava
25. tiff
26. splitstackshape
27. stringdist
28. RJSONIO
29. ropensci/tabulizer
30. staplr
31. SparseM
32. randomForest
33. e1071
34. ipred
35. caTools
36. R CMD INSTALL maxent_1.3.3.1.tar.gz
37. R CMD INSTALL tree_1.0-39.tar.gz
38. R CMD INSTALL RTextTools_1.4.2.tar.gz

Any help from you will be highly appreciated, as I am literally stuck at a dead end.

Regards
Anamika Kumari

--
Sent from my phone. Please excuse my brevity.
Re: [R] p values from GLM
How small does a p-value need to be to warrant attention, however? Witness Fisher's comment:

"... we may, if we prefer it, draw the line at one in fifty (the 2 per cent point), or one in a hundred (the 1 per cent point). Personally, the writer prefers to set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance." [Fisher RA (1926), "The Arrangement of Field Experiments," Journal of the Ministry of Agriculture of Great Britain, 33, 503-513.]

See the selection of Fisher quotes at http://www.jerrydallal.com/lhsp/p05.htm .

In contexts where p <= 0.05 becomes more likely than it is under the null (not the case if the experiment might just as well have been a random number generator), small p-values shift the weight of evidence. An alternative that is a priori highly unlikely takes a lot of shifting.

John Maindonald email: john.maindon...@anu.edu.au

On 3/04/2016, at 22:00, r-help-requ...@r-project.org wrote:

From: Heinz Tuechler <tuech...@gmx.at>
Subject: Re: [R] p values from GLM
Date: 3 April 2016 11:00:50 NZST
To: Bert Gunter <bgunter.4...@gmail.com>, Duncan Murdoch <murdoch.dun...@gmail.com>
Cc: r-help <R-help@r-project.org>

Bert Gunter wrote on 01.04.2016 23:46:

... of course, whether one **should** get them is questionable...
http://www.nature.com/news/statisticians-issue-warning-over-misuse-of-p-values-1.19503#/ref-link-1

This paper repeats the commonplace statement that a small p-value does not necessarily indicate an important finding. Agreed, but maybe I overlooked examples of important findings with large p-values. If there are some, I would be happy to get to know some of them.
Otherwise, a small p-value is no guarantee of importance, but a prerequisite.

best regards,
Heinz

Cheers,
Bert

Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Apr 1, 2016 at 3:26 PM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

On 01/04/2016 6:14 PM, John Sorkin wrote:

How can I get the p values from a glm? I want to get the p values so I can add them to a custom report.

fitwean <- glm(data[,"JWean"]~data[,"Group"], data=data,
               family=binomial(link="logit"))
summary(fitwean)         # Lists the coefficients, SEs, z and p values,
                         # but I can't isolate the p values.
names(summary(fitwean))  # I see the coefficients, but not the p values.
names(fitwean)           # p values are not found here.

Doesn't summary(fitwean) give a matrix? Then it's colnames(summary(fitwean)$coefficients) you want, not names(fitwean).

Duncan Murdoch

P.S. If you had given a reproducible example, I'd try it myself.

Thank you!
John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement: This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
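Following up on Duncan's point: the p-values live in the coefficient matrix that summary() returns, so they can be pulled out by column name. A minimal self-contained sketch (simulated data; the variable names Group and JWean mimic the original post but the data here are my own):

```r
## Extracting p-values from a glm fit: index the coefficient matrix
## returned by summary() by its "Pr(>|z|)" column.
set.seed(1)
df <- data.frame(Group = gl(2, 25))
df$JWean <- rbinom(50, 1, ifelse(df$Group == "1", 0.3, 0.6))
fitwean <- glm(JWean ~ Group, data = df, family = binomial(link = "logit"))

ctab <- coef(summary(fitwean))   ## matrix: Estimate, Std. Error, z, p
pvals <- ctab[, "Pr(>|z|)"]      ## named numeric vector of p-values
pvals
```

Indexing by the column name rather than by position guards against any future change in column order.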
Re: [R] SWEAVE - a gentle introduction
What's more, for pdf output one can use R Markdown and judiciously sneak in HTML and/or LaTeX (consider, however, what the processing steps might do to such markup).

John Maindonald email: john.maindon...@anu.edu.au

On 19/11/2015, at 00:00, r-help-requ...@r-project.org wrote:

From: Duncan Murdoch <murdoch.dun...@gmail.com>
Subject: Re: [R] SWEAVE - a gentle introduction
Date: 18 November 2015 08:09:34 NZDT
To: Marc Schwartz <marc_schwa...@me.com>, John Sorkin <jsor...@grecc.umaryland.edu>
Cc: R-help <r-help@r-project.org>

On 17/11/2015 10:42 AM, Marc Schwartz wrote:

On Nov 17, 2015, at 9:21 AM, John Sorkin <jsor...@grecc.umaryland.edu> wrote:

I am looking for a gentle introduction to Sweave, and would appreciate recommendations. I have an R program that I want to run and have the output and plots in one document. I believe this can be accomplished with Sweave. Unfortunately I don't know HTML, but am willing to learn . . . as I said, I need a gentle introduction to Sweave.

Thank you,
John

John,

A couple of initial comments. First, you will likely get some recommendations to also consider using knitr:

http://yihui.name/knitr/

which I do not use myself (I use Sweave), but, to be fair, it is worth considering as an alternative.

He did, and I'd agree with them. I've switched to knitr for all new projects and some old ones. knitr should be thought of as Sweave version 2.

Duncan Murdoch
Re: [R] R wont accept my zero count values in the GLM with quasi_poisson dsitribution
A further point to note is that with a log link, SEs for comparisons with any factor level where counts are all zero will be huge and meaningless. This phenomenon has the name Hauck-Donner effect, though it is more commonly so identified for comparisons with categories that have very low or very high estimated proportions in binomial data. For the poisson or quasipoisson family, use of the sqrt link avoids this problem.

John Maindonald email: john.maindon...@anu.edu.au

On 28/07/2015, at 22:00, r-help-requ...@r-project.org wrote:

From: Andrew Robinson <a.robin...@ms.unimelb.edu.au>
Subject: Re: [R] R wont accept my zero count values in the GLM with quasi_poisson dsitribution
Date: 28 July 2015 18:59:51 NZST
To: Charlotte <charlotte.hu...@griffithuni.edu.au>
Cc: "R help (r-help@r-project.org)" <r-help@r-project.org>

You have selected the binomial family in the call to glm. You should instead try something like

family=quasipoisson(link = "log")

I hope this helps

Andrew

On Tue, Jul 28, 2015 at 4:33 PM, Charlotte <charlotte.hu...@griffithuni.edu.au> wrote:

Hello

I have count values for abundance which follow a pattern of over-dispersion, with many zero values. I have read a number of documents which suggest that I don't use data-transforming methods, but rather that I run the GLM with the quasipoisson distribution. So I have written my script, and R is telling me that Y should be more than 0. Everything I read tells me to do it this way, but I can't get R to agree. Do I need to add something else to my script to get it to work and keep my data untransformed? The script I wrote is as follows:

fit <- glm(abundance~Gender,data=teminfest,family=binomial())

then I get this error

Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

I don't use R a lot, so I am having trouble figuring out what to do next.
I would appreciate some help.

Many Thanks
Charlotte

--
View this message in context: http://r.789695.n4.nabble.com/R-wont-accept-my-zero-count-values-in-the-GLM-with-quasi-poisson-dsitribution-tp4710462.html
Sent from the R help mailing list archive at Nabble.com.

--
Andrew Robinson
Deputy Director, CEBRA, School of Biosciences
Reader & Associate Professor in Applied Statistics    Tel: (+61) 0403 138 955
School of Mathematics and Statistics                  Fax: +61-3-8344 4599
University of Melbourne, VIC 3010 Australia
Email: a.robin...@ms.unimelb.edu.au
Website: http://www.ms.unimelb.edu.au/~andrewpr
MSME: http://www.crcpress.com/product/isbn/9781439858028
FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
SPuR: http://www.ms.unimelb.edu.au/spuRs/
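The Hauck-Donner point above can be seen directly in a small simulation. The sketch below (my own illustrative data, not Charlotte's) fits a quasipoisson model to counts where one factor level is all zeros, first with the log link and then with the sqrt link; the log-link SE for the between-level comparison is enormous, while the sqrt-link SE stays well behaved:

```r
## With a log link, the SE for a comparison against an all-zero factor
## level blows up; a sqrt link keeps it finite and interpretable.
set.seed(7)
dat <- data.frame(grp = gl(2, 20, labels = c("allzero", "other")),
                  y = c(rep(0, 20), rpois(20, 4)))
fit.log  <- glm(y ~ grp, data = dat, family = quasipoisson(link = "log"))
fit.sqrt <- glm(y ~ grp, data = dat, family = quasipoisson(link = "sqrt"))

se.log  <- coef(summary(fit.log))["grpother", "Std. Error"]
se.sqrt <- coef(summary(fit.sqrt))["grpother", "Std. Error"]
c(log = se.log, sqrt = se.sqrt)   ## log-link SE is orders of magnitude larger
```

(The log-link fit may warn about fitted rates numerically zero; that warning is itself a symptom of the boundary problem being described.)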
[R] R: Re: Differences in output of lme() when introducing interactions
Do you have legal advice that suing the university (if that is the right context) would actually be a fruitful way forward, that it would achieve anything useful within reasonable time and without exposing the student to severe financial risk? What may work in that context is for students to complain collectively that important aspects of their training and support are being neglected. With the rapidity of recent technological change, the issue is widespread. To an extent, able post-docs and PhD students have to lead the charge in getting training and support updated and brought into the modern world.

John Maindonald email: john.maindon...@anu.edu.au

On 22/07/2015, at 22:00, r-help-requ...@r-project.org wrote:

From: li...@dewey.myzen.co.uk
Date: 21-Jul-2015 11:58
To: "angelo.arc...@virgilio.it" <angelo.arc...@virgilio.it>, <bgunter.4...@gmail.com>
Cc: <r-help@r-project.org>
Subject: Re: R: Re: [R] R: Re: Differences in output of lme() when introducing interactions

Dear Angelo

I suggest you do an online search for marginality, which may help to explain the relationship between main effects and interactions. As I said in my original email, this is a complicated subject which we are not going to retype for you. If you are doing this as a student, I suggest you sue your university for failing to train you appropriately, and if it is part of your employment, I suggest you find a better employer.

On 21/07/2015 10:04, angelo.arc...@virgilio.it wrote:

Dear Bert,

thank you for your feedback. Can you please provide some references online so I can improve "my ignorance"? Anyway, please note that it is not true that I do not know statistics and regression at all, and I am strongly convinced that my question can be of interest to someone else in the future. This is what forums are for, isn't it?
This is why people help each other, isn't it? Moreover, don't you think that I would not have asked this R forum if I had the possibility to ask or pay a statistician? Don't you think I have already done my best to study and learn before posting this message? Trust me, I have read different online tutorials on lme and lmer, and I am confident that I have got the basic concepts. Still, I have not found the answer to my problem, so if you know the answer, can you please give me some suggestions that can help me? I do not have a book from which to learn, and unfortunately I have to analyze the results soon. Any help? Any to-the-point online reference that can help me in solving this problem?

Thank you in advance

Best regards
Angelo
Re: [R] Error in lm() with very small (close to zero) regressor
There are two issues:

1) Damage was done to accuracy in the calculation of t(X) %*% X. Where the mean to SD ratio is large, maybe even of the order of 100 or so, centre that column.

2) pseudoinverse() goes awry when columns of X that are of the order of large negative (or, I expect, positive) powers of ten are supplied to it. I'd guess that this has to do with the way that a check for singularity is implemented.

Ben's example:

set.seed(101)
n_obs <- 1000
y <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, mean=1.235657e-14, sd=4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)
X <- cbind(rep(1,n_obs), x1, x2)
coef(lm(y~x1+x2))
##   (Intercept)            x1            x2
##  1.155959e+01 -1.658420e+14  3.797342e-02

library(corpcor)
t(cbind(coef(lm(y~x1+x2)),
        as.vector(pseudoinverse(t(X) %*% X) %*% t(X) %*% y)))
##      (Intercept)            x1         x2
## [1,]   11.559587 -1.658420e+14 0.03797342
## [2,]    9.511348  1.151908e-13 0.03788714

## Examine mean to SD ratios
round(c(mean(x1)/sd(x1), mean(x2)/sd(x2)), 2)
## [1] 263.65   3.28

## Notice that the coefficients for x2, where the mean/sd ratio
## is smaller, roughly agree.
## Damage was done to accuracy in the calculation of t(X) %*% X
## (but there is more to it than that, as we will see).
## Try
xc1 <- scale(x1, center=TRUE, scale=FALSE)
xc2 <- scale(x2, center=TRUE, scale=FALSE)
Xc <- cbind(rep(1,n_obs), xc1, xc2)
as.vector(pseudoinverse(t(Xc) %*% Xc) %*% t(Xc) %*% y)
## [1] 9.511348e+00 1.151908e-13 3.788714e-02

## Note now, however, that one should be able to dispense with
## the column of 1s, with no change to the coefficients of x1 & x2
Xc0 <- cbind(xc1, xc2)
as.vector(pseudoinverse(t(Xc0) %*% Xc0) %*% t(Xc0) %*% y)
## [1] 1.971167e-20 3.788714e-02
##
## Now try a more sensible scaling for xc1
Xcs <- cbind(rep(1,n_obs), xc1*1e14, xc2)
Xcs0 <- cbind(xc1*1e14, xc2)
t(cbind(coef(lm(y~I(1e14*xc1)+xc2)),
        as.vector(pseudoinverse(t(Xcs) %*% Xcs) %*% t(Xcs) %*% y)))
##      (Intercept) I(1e+14 * xc1)        xc2
## [1,]    9.899249       -1.65842 0.03797342
## [2,]    9.899249       -1.65842 0.03797342
## Eureka!
## cf also
as.vector(pseudoinverse(t(Xcs0) %*% Xcs0) %*% t(Xcs0) %*% y)
## [1] -1.65842037  0.03797342

John Maindonald email: john.maindon...@anu.edu.au
Centre for Mathematics & Its Applications,
Australian National University, Canberra ACT 0200.

On 29/03/2015, at 23:00, r-help-requ...@r-project.org wrote:

From: Ben Bolker <bbol...@gmail.com>
Subject: Re: [R] Error in lm() with very small (close to zero) regressor
Date: 29 March 2015 6:28:22 NZDT
To: <r-h...@stat.math.ethz.ch>

peter dalgaard (gmail.com) writes:

On 28 Mar 2015, at 00:32, RiGui (business.uzh.ch) wrote:

Hello everybody,

I have encountered the following problem with lm(): When running lm() with a regressor close to zero - of the order 1e-10 - the value of the estimate is of huge absolute value, of order millions. However, if I write the formula of the OLS estimator in matrix notation, pseudoinverse(t(X) %*% X) %*% t(X) %*% y, the results are correct, meaning the estimate has value 0.

How do you know this answer is "correct"?

here is the code:

y <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.01235657, 0.45)
x2 <- rnorm(n_obs, 10, 3.21)
X <- cbind(x1, x2)

Eh? You want X <- cbind(rep(1,n_obs), x1, x2)

bFE <- lm(y ~ x1 + x2)
bFE
bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS

Note: I am applying a deviation-from-the-mean projection to the data; that is why I have some regressors with such small values.

Thank you for any help!

Raluca

Is there a reason you can't scale your regressors?

Example not reproducible:

I agree that the OP's question was not reproducible, but it's not too hard to make it reproducible.
I bothered to use library("sos"); findFn("pseudoinverse") to find pseudoinverse() in corpcor:

set.seed(101)
n_obs <- 1000
y <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, mean=1.235657e-14, sd=4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)
X <- cbind(x1, x2)
bFE <- lm(y ~ x1 + x2)
bFE
coef(summary(bFE))
##                  Estimate   Std. Error     t value  Pr(>|t|)
## (Intercept)  1.155959e+01 2.312956e+01  0.49977541 0.6173435
## x1          -1.658420e+14 1.872598e+15 -0.08856254 0.9294474
## x2           3.797342e-02 2.813000e-02  1.34992593 0.1773461

It is true that we get estimates with very large magnitudes, but their standard errors are very large too:

library("corpcor")
bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS
##              [,1]
## [1,] 9.807664e-16
## [2,] 8.880273e-01

And if we scale the predictor:

bFE2 <- lm(y ~ I(1e14*x1) + x2)
coef(summary(bFE2))
##                  Estimate   Std. Error     t value  Pr(>
Re: [R] the making of _R_ eBooks
Thanks for that. Useful to have that question asked and to get that information.

John Maindonald email: john.maindon...@anu.edu.au

On 25/03/2015, at 0:00, r-help-requ...@r-project.org wrote:

From: John McKown <john.archie.mck...@gmail.com>
Subject: Re: [R] the making of _R_ eBooks
Date: 24 March 2015 1:36:38 NZDT
To: "Dr. Wolfgang Lindner" <lindn...@t-online.de>
Cc: Help R <r-help@r-project.org>

On Mon, Mar 23, 2015 at 3:50 AM, Dr. Wolfgang Lindner <lindn...@t-online.de> wrote:

Dear list members,

I like the look and feel of the eBook versions of the R manuals very much. So I would like to generate eBooks (teaching material etc.) in that look.

I am not an expert, but I have looked at the source, so I can give you some information.

Q1: is there a description of how the _R_ ebooks have been produced?

Looking at the source, it appears that the source manuals are in a document markup language called "GNU Texinfo": https://www.gnu.org/software/texinfo/ . You can think of this as something akin to, but different from, HTML or "markdown" encoding. Texinfo is an evolution of the system first designed by Richard Stallman of MIT. He is the driving force behind the GPL and most of the GNU software which forms the basis of the user-space commands for Linux and the *BSD operating systems. Texinfo is then converted to TeX, the typesetting language designed by Dr. Donald Knuth. TeX, nominally, is converted into a DVI (DeVice Independent) printer control language. But in the case of creating a PDF file, there is a processor called "pdftex", http://en.wikipedia.org/wiki/PdfTeX, which produces a PDF file as output. A good site for TeX is https://tug.org/ . Texinfo has the plus of also having processors which will convert it to UNIX "man" (manual) pages and HTML web pages. So one "source" document can generate three different types of output document.
Most people use an enhanced TeX called LaTeX instead of "plain TeX" when using TeX. LaTeX can be read up on here: http://www.latex-project.org/ . A good TeX document processor is TeXstudio at http://texstudio.sourceforge.net/ . I use this one myself (which is not necessarily a strong endorsement, because I'm nobody special). I feel the need to warn you that TeX is very powerful and, at least to me, quite difficult, with a fairly steep learning curve. Which may be why the R project uses Texinfo: it is quite a bit easier to learn.

Q2: which (free) software was used for them?

See the links above. On Fedora Linux, I get the TeX-oriented software from a bunch of packages whose names start with "texlive". More information, including the processors for Linux, Windows, and Mac, is at https://www.tug.org/texlive/

Q3: any other recommendations?

You might consider LyX: http://www.lyx.org/ . LyX is a document processor. It would likely be easier to use than the above if you are used to MS Word or another word-processing system. It is cross-platform: Linux, Windows, and Mac. It stores files in its own textual format, which is somewhat human readable. LyX, like Texinfo, translates its format into TeX as an intermediate on its way to its ultimate destination. I am still learning LyX, but I personally like it. Your mention of LibreOffice is also a fairly good one. I, personally, use LibreOffice, but I don't use it for big documents. I have a learned aversion to word processors because it is so easy for them to be misused. In my opinion, a good document needs good metadata in it as well as just "looking pretty". Word processor users tend to focus on the format and not the content. That's just my opinion, based on what I've seen where I work.

Searching the internet gives me e.g.
[1] https://sites.google.com/site/richardbyrnepdsite/ebooks-and-audiobooks/create-your-own-ebooks
[2] http://opensource.com/life/13/8/how-create-ebook-open-source-way
[3] http://scottnesbitt.net/ubuntublog/creating-a-ebook-with-libreoffice-writer/

but I'm not sure if there are better possibilities. Thanks for any hint or link from expert R users.

Oh well, that excludes me. I'm not an expert. But maybe it was helpful anyway.

Wolfgang Lindner
Leichlingen, Germany

--
If you sent twitter messages while exploring, are you on a textpedition?
He's about as useful as a wax frying pan.
10 to the 12th power microphones = 1 Megaphone
Maranatha! <><
John McKown
Re: [R] rpart and randomforest results
The function rpart may well overfit if the value of the CP statistic is left at its default. Use the functions printcp() and plotcp() to check how the cross-validation estimate of relative error (xerror) changes with the number of splits (NB: the CP that leads to a further split changes monotonically with the number of splits). The 'rel error' column from printcp() can be hopelessly optimistic.

John Maindonald email: john.maindon...@anu.edu.au
phone: +61 2 (6125) 3473    fax: +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 8/04/2014, at 8:00 pm, r-help-requ...@r-project.org wrote:

From: r-help-boun...@r-project.org On Behalf Of Schillo, Sonja
Sent: Thursday, April 03, 2014 3:58 PM
To: Mitchell Maltenfort
Cc: r-help@r-project.org
Subject: Re: [R] rpart and randomforest results

Hi,

the random forest should do that, you're totally right. As far as I know, it does so by randomly selecting the variables considered for a split (but here we set the option for how many variables to consider at each split to the number of variables available, so that the random forest does not have the chance to randomly select the variables). The next thing that randomForest does is bootstrapping. But here again we set the option to the number of cases we have in the data set, so that no bootstrapping should be done. We tried to take all the "randomness" away from the randomForest. Is that plausible, and does anyone have another idea?
Thanks Sonja
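The CP check suggested above can be sketched as follows; this is a minimal illustration on the kyphosis data set that ships with rpart, not on the poster's data:

```r
library(rpart)

## xerror comes from cross-validation, so fix the seed for repeatability
set.seed(29)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

printcp(fit)  # compare 'rel error' (resubstitution) with 'xerror' (CV)
plotcp(fit)   # plots xerror against cp, with a 1-SE reference line

## Prune at the CP value that minimises the cross-validated error
cpbest <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = cpbest)
```

The 'rel error' column always decreases as splits are added; it is xerror that indicates when further splits stop paying their way.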
Re: [R] Drawing a line in xyplot
PS: The following shows possibilities that are available using latticeExtra layering:

## Best make type and attend factors
xdat = data.frame(
  mortality = c(5,8,7,5,8,10,11,6,4,5,20,25,27,30,35,32,28,21,20,34,
                11,15,18,12,15,12,10,15,19,20),
  type = factor(c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,
                  3,3,3,3,3,3,3,3,3,3)),
  attend = factor(c(1,0,1,1,1,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,
                    1,1,1,0,1,0,1,0,0,0)))
gph <- xyplot(mortality ~ attend|type, pch=16, data=xdat, aspect=2:1, layout=c(3,1))
one4all <- layer(avy <- median(xdat$mortality),
                 panel.segments(0.1, avy, 0.3, avy, col="red", lwd=4),
                 panel.segments(0.7, avy, 1, avy, col="red", lwd=4))
medbytype <- layer(avy <- median(y),
                   panel.segments(0.1, avy, 0.3, avy, col="red", lwd=4),
                   panel.segments(0.7, avy, 1, avy, col="red", lwd=4))
interact <- layer(panel.average(x, y, fun=median, col="red", lwd=4))

Compare:
gph + one4all     ## shows overall median lines in all 3 panels
gph + medbytype   ## shows separate median lines for the separate panels
gph + interact    ## gives a form of interaction plot

NB: x (if its values are used) and y are local to the individual panel.
NB also that layer() accepts a data argument. The following is an alternative way to calculate one4all:

one4all <- layer(data=xdat, avy <- median(mortality),
                 panel.segments(0.1, avy, 0.3, avy, col="red", lwd=4),
                 panel.segments(0.7, avy, 1, avy, col="red", lwd=4))

John Maindonald.

> This can be simplified by using the layering abilities that Felix Andrews
> made available in latticeExtra. These are too little known. These pretty
> much make it unnecessary to resort to trellis.focus(), at least in such
> cases as this.
These layering abilities are too little known: > > library(latticeExtra) > x11(height=8,width=11) > xdat = data.frame(mortality =c(5, > 8,7,5,8,10,11,6,4,5,20,25,27,30,35,32,28,21,20,34,11,15,18,12,15,12,10,15,19,20), > type= > c(1, 1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3), attend = > c(1, 0,1,1,1,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,1,1,0,1,0,1,0,0,0)) > gph <- xyplot ( mortality ~ attend|type, > panel=function(x,y) > { > panel.grid(lty=5) > panel.xyplot(x,y,pch=16,jitter.x=TRUE,col=1) > for(i in 1:3) > { > panel.segments(x0=c(0.7, 1.7), > x1=c(1.3, 2.3), > y0=with(xdat, c(tapply(mortality[type==i], attend[type==i], median))), > y1=with(xdat, c(tapply(mortality[type==i], attend[type==i], > median))),col="red",lwd=4) > } > }, > data = xdat, aspect=2:1,layout=c(3,1)) > > ## Now create a layer object that will add the further segments. > addlayer <- layer(panel.segments(x0=1,y0=20,x1=1.2, y1=20), > panel.segments(x0=2,y0=30,x1=2.2, y1=30)) > > gph+addlayer > > The code that produces the object gph would also be simpler > and easier to follow if some relevant part was separated out into > a separate layer. > > See also my notices on layering of lattice objects at: > http://www.maths.anu.edu.au/%7Ejohnm/r-book/add-graphics.html > > John Maindonald email: john.maindon...@anu.edu.au > phone : +61 2 (6125)3473fax : +61 2(6125)5549 > Centre for Mathematics & Its Applications, Room 1194, > John Dedman Mathematical Sciences Building (Building 27) > Australian National University, Canberra ACT 0200. 
> http://www.maths.anu.edu.au/~johnm > > On 08/04/2012, at 8:00 PM, r-help-requ...@r-project.org wrote: > >> From: David Winsemius >> Subject: Re: [R] Drawing a line in xyplot >> Date: 8 April 2012 2:17:22 PM AEST >> To: wcheckle >> Cc: r-help@r-project.org >> >> >> >> On Apr 7, 2012, at 10:29 PM, wcheckle wrote: >> >>> Thank you David, the bwplot option does what I need: >>> >>> x11(height=8,width=11) >>> bwplot ( mortality~ attend|type, >>> pch=95,cex=5,col=2, >>> par.settings=list( >>> box.rectangle = list(col = "transparent"), >>> box.umbrella = list(col = "transparent"), >>> plot.symbol = list(col = "transparent") >>> ), >>> panel=function(x,y,...){ >>> panel.grid(lty=5) >>> panel.xyplot(x,y,pch=16,jitter.x=TRUE,col=1) >>> panel.bwplot(x,y,...) >>> }, >>> data = x,aspect=2:1,layout=c(3,1)) >>> >>> >>> However, I am interested in also learning how to do it in xyplot as well. I >>> wasnt able to follow the last two set of instructions (condition on >>> packet.number and loop over segments), wondering if I can ask for your help >>> for the correct code (my attempt resulted in all three mean lines within >>> each p
Re: [R] Drawing a line in xyplot
This can be simplified by using the layering abilities that Felix Andrews made available in latticeExtra. These layering abilities are too little known; they pretty much make it unnecessary to resort to trellis.focus(), at least in such cases as this:

library(latticeExtra)
x11(height=8, width=11)
xdat = data.frame(
  mortality = c(5,8,7,5,8,10,11,6,4,5,20,25,27,30,35,32,28,21,20,34,
                11,15,18,12,15,12,10,15,19,20),
  type = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3),
  attend = c(1,0,1,1,1,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,1,1,0,1,0,1,0,0,0))
gph <- xyplot(mortality ~ attend|type,
              panel=function(x, y) {
                panel.grid(lty=5)
                panel.xyplot(x, y, pch=16, jitter.x=TRUE, col=1)
                for(i in 1:3) {
                  panel.segments(x0=c(0.7, 1.7), x1=c(1.3, 2.3),
                                 y0=with(xdat, tapply(mortality[type==i], attend[type==i], median)),
                                 y1=with(xdat, tapply(mortality[type==i], attend[type==i], median)),
                                 col="red", lwd=4)
                }
              },
              data=xdat, aspect=2:1, layout=c(3,1))

## Now create a layer object that will add the further segments.
addlayer <- layer(panel.segments(x0=1, y0=20, x1=1.2, y1=20),
                  panel.segments(x0=2, y0=30, x1=2.2, y1=30))

gph + addlayer

The code that produces the object gph would also be simpler and easier to follow if some relevant part was separated out into a separate layer. See also my notes on layering of lattice objects at: http://www.maths.anu.edu.au/%7Ejohnm/r-book/add-graphics.html

John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473; fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm On 08/04/2012, at 8:00 PM, r-help-requ...@r-project.org wrote: > From: David Winsemius > Subject: Re: [R] Drawing a line in xyplot > Date: 8 April 2012 2:17:22 PM AEST > To: wcheckle > Cc: r-help@r-project.org > > > > On Apr 7, 2012, at 10:29 PM, wcheckle wrote: > >> Thank you David, the bwplot option does what I need: >> >> x11(height=8,width=11) >> bwplot ( mortality~ attend|type, >> pch=95,cex=5,col=2, >> par.settings=list( >> box.rectangle = list(col = "transparent"), >> box.umbrella = list(col = "transparent"), >> plot.symbol = list(col = "transparent") >> ), >> panel=function(x,y,...){ >> panel.grid(lty=5) >> panel.xyplot(x,y,pch=16,jitter.x=TRUE,col=1) >> panel.bwplot(x,y,...) >> }, >> data = x,aspect=2:1,layout=c(3,1)) >> >> >> However, I am interested in also learning how to do it in xyplot as well. I >> wasnt able to follow the last two set of instructions (condition on >> packet.number and loop over segments), wondering if I can ask for your help >> for the correct code (my attempt resulted in all three mean lines within >> each panel): >> >> x11(height=8,width=11) >> xyplot ( mortality ~ attend|type, >> panel=function(x,y) >> { >> panel.grid(lty=5) >> panel.xyplot(x,y,pch=16,jitter.x=TRUE,col=1) >> for(i in 1:3) >> { >> panel.segments(x0=c(0.7, 1.7), >> x1=c(1.3, 2.3), >> y0=c(tapply(x$mortality[x$type==i], x$attend[x$type==i], median)), >> y1=c(tapply(x$mortality[x$type==i], x$attend[x$type==i], >> median)),col="red",lwd=4) >> } >> }, >> data = x,aspect=2:1,layout=c(3,1)) >> >> thank you again. I also found your info on data.frame useful. > > I haven't figured out how to do it with the interaction of 'attend' and > ''type' from inside xyplot. I'm thinking I might be able to succeed using > trellis.focus() to address separate "columns" within a particular panel. > > This will draw segments at (1,20) and (2,30) without resorting to low level > grid/viewport stuff. 
> > trellis.unfocus(); trellis.focus("panel", 1, 1) > do.call("panel.segments", list(x0=1,y0=20,x1=1.2, y1=20)) > trellis.unfocus() > > trellis.unfocus(); trellis.focus("panel", 1, 1) > do.call("panel.segments", list(x0=2,y0=30,x1=2.2, y1=30)) > trellis.unfocus() > > -- > David > > > David Winsemius, MD > West Hartford, CT
[R] RC33 8th Int Conf on Social Science Methodology -- The R System ...
I wish to draw attention to an R-related session that is planned for the RC33 Eighth International Conference on Social Science Methodology, to be held over July 9 - July 13 2012, at the University of Sydney. " The focus of the conference is on innovations and current best practice in all aspects of social science research methodology. It provides an opportunity to reflect on contemporary methods, as applied in a range of settings and disciplinary contexts, to hear about emerging methods, tools, techniques and technologies, and to discover what resources are available to social science researchers and users of research. " The title for the planned session is: "The R System as a Platform for Analysis and Development of Analysis Methodology" http://conference.acspri.org.au/index.php/rc33/2012/schedConf/trackPolicies John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
Re: [R] cv.lm syntax error
It means that dat$v1, dat$v2, ... are not columns in the data frame that was supplied as df. (A call such as data.frame(dat$v1, dat$v2, ...) creates columns named dat.v1, dat.v2, ...; name them explicitly, e.g. data.frame(v1=dat$v1, v2=dat$v2, ...).) Specify the formula as v1 ~ v2+v3+v4+v5+factor. Then (assuming that factor is a column object of the right length) you should be fine. John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473; fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm > From: agent dunham > Date: 9 March 2011 3:47:10 AM AEDT > To: r-help@r-project.org > Subject: Re: [R] cv.lm syntax error > > > Thanks for your answer, but I type the following: > >> dfmod.1 <- data.frame(dat$v1,dat$v2,dat$v3,dat$v4, dat$v5,factor) > >> CVlm(df= dfmod.1, form.lm = formula(dat$v1 ~ dat$v2+dat$v3+ dat$v4+ >> dat$v5+ factor), m=3, seed=29, plotit=TRUE, printit =TRUE) > > Error en `[.data.frame`(df, , ynam) : undefined columns selected > > What does it mean? > > u...@host.com > > -- > View this message in context: > http://r.789695.n4.nabble.com/cv-lm-syntax-error-tp3334889p3341767.html > Sent from the R help mailing list archive at Nabble.com.
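To illustrate the point about formulas, a minimal sketch using the houseprices data frame that ships with the DAAG package; note that the first argument of CVlm() has been spelt df in older DAAG releases and data in more recent ones:

```r
library(DAAG)

## The formula names columns of the supplied data frame directly;
## terms of the form dat$v1 defeat CVlm()'s internal column lookup.
houseprices.CV <- CVlm(data = houseprices,
                       form.lm = sale.price ~ area,
                       m = 3, seed = 29, printit = TRUE)
```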
Re: [R] R-help Digest, Vol 97, Issue 11
You might look at the function confusion() in the DAAGxtras package. John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473; fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 09/03/2011, at 10:00 PM, r-help-requ...@r-project.org wrote:
> From: "Shi, Tao"
> Date: 9 March 2011 7:22:49 AM AEDT
> To: r-help@r-project.org
> Subject: [R] confusion matrix
>
> Hi list,
>
> Is there already a function somewhere to output the confusion matrix from two
> input vectors? "table" always automatically deletes rows or columns with all
> 0's. For example, I would like the columns for "10" and "30" added back.
>
> Thanks!
>
> ...Tao
>
>        20  40  50  60  70  80  90 100
>  10     0   0   1   0   0   0   1   0
>  20     1   2   0   4   0   0   0   1
>  30     0   0   0   0   0   0   1   1
>  40     0   0   1   1   2   0   0   0
>  50     0   1   0   0   0   1   0   0
>  60     0   0   0   0   0   0   1   1
>  70     0   0   1   1   1   1   1   1
>  80     0   1   0   0   0   0   0   1
>  90     0   0   0   0   0   0   0   3
> 100     0   0   0   0   1   0   0   3
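In answer to the quoted question, a small sketch with toy vectors (not Tao's data): supply both inputs as factors carrying the full set of levels, and table() will keep the all-zero rows and columns:

```r
## Toy vectors; classes 10 and 30 never occur in either vector
obs  <- c(20, 40, 40, 50)
pred <- c(20, 40, 50, 50)
lev  <- c(10, 20, 30, 40, 50)   # the full set of classes to display

confmat <- table(factor(obs, levels = lev),
                 factor(pred, levels = lev))
confmat   # rows/columns for 10 and 30 appear, filled with zeros
```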
Re: [R] \Sweaveopts error
Actually, until I created this file I had everything on one line, or in separate \SweaveOpts commands. Putting everything on one line ensures that the graphics file goes to the subdirectory snArt, but the file test1.tex is unchanged. Note however the difference between the files test1.tex and test.tex, irrespective of the one-line-or-two issue. The first of the \SweaveOpts lines had an effect, unless I am missing something. John. John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473; fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 28/11/2010, at 12:40 PM, Duncan Murdoch wrote: > On 27/11/2010 7:57 PM, John Maindonald wrote: >> Actually, I spoke too soon. The files process without obvious error, >> but keep.source=TRUE is ignored. I have posted small files >> test1.Rnw and test2.Rnw that can be used to demonstrate the >> problems at: >> http://wwwmaths.anu.edu.au/~johnm/r/issues/ > > In the test1.Rnw file, your \SweaveOpts statement spans two lines. Sweave() > doesn't recognize it, so it's completely ignored. > > I don't think this is new behaviour; it's a consequence of the way Sweave > looks for \SweaveOpts via regular expression. > > Not quite sure what's going on in the second one yet... > > Duncan Murdoch > >> >> Sweave("test1") ## includes SweaveOpts settings >> >> Sweave("test2", keep.source=TRUE) >> ## SweaveOpts settings have been removed. >> >> Comments do not appear in the LaTeX file that results, the code >> is reformatted and the graph from test1 goes into the working >> directory. >> >> Notice also the NA that mysteriously appears in the second >> line of code in output .tex file test2.tex >> >> John.
>> >> John Maindonald email: john.maindon...@anu.edu.au >> phone : +61 2 (6125)3473fax : +61 2(6125)5549 >> Centre for Mathematics& Its Applications, Room 1194, >> John Dedman Mathematical Sciences Building (Building 27) >> Australian National University, Canberra ACT 0200. >> http://www.maths.anu.edu.au/~johnm >> >> On 26/11/2010, at 4:26 PM, John Maindonald wrote: >> >>> Yes, that has fixed the problem. (2010-11-24 r53659) >>> >>> Thanks. >>> >>> John Maindonald email: john.maindon...@anu.edu.au >>> phone : +61 2 (6125)3473fax : +61 2(6125)5549 >>> Centre for Mathematics& Its Applications, Room 1194, >>> John Dedman Mathematical Sciences Building (Building 27) >>> Australian National University, Canberra ACT 0200. >>> http://www.maths.anu.edu.au/~johnm >>> >>> On 25/11/2010, at 10:56 PM, Duncan Murdoch wrote: >>> >>>> On 25/11/2010 6:34 AM, John Maindonald wrote: >>>>> I have a file 4lmetc.Rnw, intended for inclusion in a LaTeX document, >>>>> that starts: >>>> >>>> I think this may have been fixed in the patched version. Could you give >>>> it a try to confirm? If not, please send me a simplified version of the >>>> file, and I'll see what's going wrong. >>>> >>>> Duncan Murdoch >>>> >>>> >>>>> >>>>> \SweaveOpts{engine=R, keep.source=TRUE} >>>>> \SweaveOpts{eps=FALSE, prefix.string=snArt/4lmetc} >>>>> >>>>> The attempt to process the file through Sweave generates the error: >>>>> >>>>>> Sweave("4lmetc") >>>>> Writing to file 4lmetc.tex >>>>> Processing code chunks ... >>>>> 1 : keep.source term verbatim >>>>> Error in file(srcfile$filename, open = "rt", encoding = encoding) : >>>>> cannot open the connection >>>>> In addition: Warning message: >>>>> In file(srcfile$filename, open = "rt", encoding = encoding) : >>>>> cannot open file '4lmetc': No such file or directory >>>>> >>>>> The same file processes through Stangle() without problems. >>>>> If I comment out the \Sweaveopts lines, there is no problem, >>>>> except that I do not get the options that I want. 
>>>>> >>>>> This processed fine in R-2.11.1 >>>>> >>>>>> sessionInfo() >>>>> R version 2.12.0 (201
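Since, as noted in this thread, Sweave matches \SweaveOpts with a line-based regular expression, a declaration that spans two physical lines is silently ignored. The workaround is to keep all options in a single one-line declaration, for example (combining the options quoted above):

```latex
% One-line form: Sweave's \SweaveOpts scan does not span lines
\SweaveOpts{engine=R, keep.source=TRUE, eps=FALSE, prefix.string=snArt/4lmetc}
```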
Re: [R] \Sweaveopts error
Actually, I spoke too soon. The files process without obvious error, but keep.source=TRUE is ignored. I have posted small files test1.Rnw and test2.Rnw that can be used to demonstrate the problems at: http://wwwmaths.anu.edu.au/~johnm/r/issues/ Sweave("test1") ## includes SweaveOpts settings Sweave("test2", keep.source=TRUE) ## SweaveOpts settings have been removed. Comments do not appear in the LaTeX file that results, the code is reformatted and the graph from test1 goes into the working directory. Notice also the NA that mysteriously appears in the second line of code in output .tex file test2.tex John. John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 26/11/2010, at 4:26 PM, John Maindonald wrote: > Yes, that has fixed the problem. (2010-11-24 r53659) > > Thanks. > > John Maindonald email: john.maindon...@anu.edu.au > phone : +61 2 (6125)3473fax : +61 2(6125)5549 > Centre for Mathematics & Its Applications, Room 1194, > John Dedman Mathematical Sciences Building (Building 27) > Australian National University, Canberra ACT 0200. > http://www.maths.anu.edu.au/~johnm > > On 25/11/2010, at 10:56 PM, Duncan Murdoch wrote: > >> On 25/11/2010 6:34 AM, John Maindonald wrote: >>> I have a file 4lmetc.Rnw, intended for inclusion in a LaTeX document, >>> that starts: >> >> I think this may have been fixed in the patched version. Could you give it >> a try to confirm? If not, please send me a simplified version of the file, >> and I'll see what's going wrong. 
>> >> Duncan Murdoch >> >> >>> >>> \SweaveOpts{engine=R, keep.source=TRUE} >>> \SweaveOpts{eps=FALSE, prefix.string=snArt/4lmetc} >>> >>> The attempt to process the file through Sweave generates the error: >>> >>>> Sweave("4lmetc") >>> Writing to file 4lmetc.tex >>> Processing code chunks ... >>> 1 : keep.source term verbatim >>> Error in file(srcfile$filename, open = "rt", encoding = encoding) : >>> cannot open the connection >>> In addition: Warning message: >>> In file(srcfile$filename, open = "rt", encoding = encoding) : >>> cannot open file '4lmetc': No such file or directory >>> >>> The same file processes through Stangle() without problems. >>> If I comment out the \Sweaveopts lines, there is no problem, >>> except that I do not get the options that I want. >>> >>> This processed fine in R-2.11.1 >>> >>>> sessionInfo() >>> R version 2.12.0 (2010-10-15) >>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>> >>> locale: >>> [1] C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets grid methods >>> [8] base >>> >>> other attached packages: >>> [1] lattice_0.19-13 DAAG_1.02 randomForest_4.5-36 >>> [4] rpart_3.1-46MASS_7.3-8 reshape_0.8.3 >>> [7] plyr_1.2.1 proto_0.3-8 >>> >>> loaded via a namespace (and not attached): >>> [1] ggplot2_0.8.8 latticeExtra_0.6-14 >>> >>> Is there a workaround? >>> >>> John Maindonald email: john.maindon...@anu.edu.au >>> phone : +61 2 (6125)3473fax : +61 2(6125)5549 >>> Centre for Mathematics& Its Applications, Room 1194, >>> John Dedman Mathematical Sciences Building (Building 27) >>> Australian National University, Canberra ACT 0200. >>> http://www.maths.anu.edu.au/~johnm >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. 
Re: [R] \Sweaveopts error
Yes, that has fixed the problem. (2010-11-24 r53659) Thanks. John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 25/11/2010, at 10:56 PM, Duncan Murdoch wrote: > On 25/11/2010 6:34 AM, John Maindonald wrote: >> I have a file 4lmetc.Rnw, intended for inclusion in a LaTeX document, >> that starts: > > I think this may have been fixed in the patched version. Could you give it a > try to confirm? If not, please send me a simplified version of the file, and > I'll see what's going wrong. > > Duncan Murdoch > > >> >> \SweaveOpts{engine=R, keep.source=TRUE} >> \SweaveOpts{eps=FALSE, prefix.string=snArt/4lmetc} >> >> The attempt to process the file through Sweave generates the error: >> >>> Sweave("4lmetc") >> Writing to file 4lmetc.tex >> Processing code chunks ... >> 1 : keep.source term verbatim >> Error in file(srcfile$filename, open = "rt", encoding = encoding) : >> cannot open the connection >> In addition: Warning message: >> In file(srcfile$filename, open = "rt", encoding = encoding) : >> cannot open file '4lmetc': No such file or directory >> >> The same file processes through Stangle() without problems. >> If I comment out the \Sweaveopts lines, there is no problem, >> except that I do not get the options that I want. 
>> >> This processed fine in R-2.11.1 >> >>> sessionInfo() >> R version 2.12.0 (2010-10-15) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets grid methods >> [8] base >> >> other attached packages: >> [1] lattice_0.19-13 DAAG_1.02 randomForest_4.5-36 >> [4] rpart_3.1-46MASS_7.3-8 reshape_0.8.3 >> [7] plyr_1.2.1 proto_0.3-8 >> >> loaded via a namespace (and not attached): >> [1] ggplot2_0.8.8 latticeExtra_0.6-14 >> >> Is there a workaround? >> >> John Maindonald email: john.maindon...@anu.edu.au >> phone : +61 2 (6125)3473fax : +61 2(6125)5549 >> Centre for Mathematics& Its Applications, Room 1194, >> John Dedman Mathematical Sciences Building (Building 27) >> Australian National University, Canberra ACT 0200. >> http://www.maths.anu.edu.au/~johnm >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code.
[R] \Sweaveopts error
I have a file 4lmetc.Rnw, intended for inclusion in a LaTeX document, that starts: \SweaveOpts{engine=R, keep.source=TRUE} \SweaveOpts{eps=FALSE, prefix.string=snArt/4lmetc} The attempt to process the file through Sweave generates the error: > Sweave("4lmetc") Writing to file 4lmetc.tex Processing code chunks ... 1 : keep.source term verbatim Error in file(srcfile$filename, open = "rt", encoding = encoding) : cannot open the connection In addition: Warning message: In file(srcfile$filename, open = "rt", encoding = encoding) : cannot open file '4lmetc': No such file or directory The same file processes through Stangle() without problems. If I comment out the \Sweaveopts lines, there is no problem, except that I do not get the options that I want. This processed fine in R-2.11.1 > sessionInfo() R version 2.12.0 (2010-10-15) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets grid methods [8] base other attached packages: [1] lattice_0.19-13 DAAG_1.02 randomForest_4.5-36 [4] rpart_3.1-46MASS_7.3-8 reshape_0.8.3 [7] plyr_1.2.1 proto_0.3-8 loaded via a namespace (and not attached): [1] ggplot2_0.8.8 latticeExtra_0.6-14 Is there a workaround? John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
[R] L-shaped boxes with lattice graphs?
Can anyone suggest an equivalent, for lattice graphs, of the base graphics argument bty="l"? NB that I am leaving off the box around the strip, with a strip function: stripfun <- function(which.given,which.panel, factor.levels=as.expression(levlist), ...){ panel.text(x=0, y=0.5, lab = as.expression(levlist[which.panel[which.given]]), adj=0) } e.g. levlist <- list("A", "B") xyplot(11:14 ~ 1:4 | rep(1:2,2), scales=list(x=list(alternating=c(1,1), relation="sliced")), strip=stripfun, layout=c(1,2)) John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
[R] Why software fails in scientific research
I came across this notice of an upcoming webinar. The issues identified in the first paragraph below seem to me exactly those that the R project is designed to address. The claim that "most research software is barely fit for purpose compared to equivalent systems in the commercial world" seems to me not quite accurate! Comments? WEBINAR SERIES A Crack in the Code: Why software fails in scientific research, and how to fix it. Thursday, March 25, 2010, 3:00 PM GMT http://physicsworld.com/cws/go/webinar9 " In the 60 years since the invention of the digital computer, millions of lines of code have been developed to support scientific research. Although an increasingly important part of almost all research projects, most research software is barely fit for purpose compared to equivalent systems in the commercial world. The code is hard to understand or maintain, lacking documentation and version control, and is continually ‘re-invented’ as the code writers move on to new jobs. This represents a tremendous waste of the already inadequate resources that are put into its development. We will investigate how this situation has come about, why it is important to the future of research, and what can be done about it. Robert McGreevy will draw on his extensive experience at the STFC ISIS Facility, and explain how these issues are being addressed for the benefit of research science globally. Nicholas Draper, consultant at Tessella, will then expand on this, using the example of the Mantid project at ISIS. " Tessella (www.tessella.com) is a technology and consultancy firm, based in Oxford. ISIS here is the STFC's pulsed neutron and muon source at the Rutherford Appleton Laboratory in Oxfordshire (not to be confused with the International Species Information System).
John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
Re: [R] parsing pdf files
Oblivious to the problems that Barry notes, I have used pdftotext, from Xpdf at http://www.foolabs.com/xpdf/download.html without apparent problem; this under MacOS X. For my purposes, I need to retain the form-feed characters ("\014") that indicate page throws. Other converters that I have investigated seem not to retain such information. I use it with command line options thus: pdftotext -layout -eol unix rnotes.pdf rnotes.txt Given a pdf for a manuscript, I have an R function that can then create an index of functions that appear with their opening and closing parentheses. (It may pick up a few strays, which can be weeded out.) The only issue I've found has been for use with the listings package. In the LaTeX source, I set the option 'columns' to 'fullflexible', in order to generate a pdf file that does not have the unwanted hidden spaces, which would otherwise be carried across to the text file. {\lstset{language=R, xleftmargin=2pt, basicstyle=\ttfamily, columns=fullflexible, % Omit for final manuscript showstringspaces=false}} {} The setting 'columns=fullflexible' messes up the formatting in places where I use tabbing, so I need to change 'columns' back to the default for the final pdf. Here is the function that does most of the work.
Like the function that follows, it could no doubt be greatly tidied up:

locatefun <- function(txlines){
    idtxt <- "\\.?[a-zA-Z][a-zA-Z0-9]*(\\.[a-zA-Z]+)*\\("
    z <- regexpr("\014", txlines)
    z[z>0] <- 1
    z[z<=0] <- 0
    page <- cumsum(z)+1
    k <- 0
    findfun <- function(tx){
        mn <- t(sapply(tx, function(x){
            m <- regexpr(idtxt, x)
            c(m, attr(m, "match.length"))}))
        mn[,2] <- mn[,1]+mn[,2]
        rownames(mn) <- paste(1:dim(mn)[1])
        mn
    }
    for(i in 1:100){
        mn <- findfun(txlines)
        if(all(mn[,1]==-1)) break
        here <- mn[,1]>0
        page <- page[here]
        txlines <- txlines[here]
        mn <- mn[here, , drop=FALSE]
        m1 <- regexpr("\\(", txlines)-1
        tx1 <- substring(txlines, mn[,1], m1)
        if(i==1){
            xy <- data.frame(nam=I(tx1), page=page)
        } else {
            xy <- rbind(xy, data.frame(nam=I(tx1), page=page))
        }
        txlines <- substring(txlines, mn[,2])
        here2 <- nchar(txlines)>0
        txlines <- txlines[here2]
        page <- page[here2]
        if(length(txlines)==0) break
    }
    zz <- !xy[,1] %in% c("","al","Pr","T","F","n","P","y","A",
                         "transformation","left","f","site.","a","b",
                         "II","ARCH","ARMA","MA")
    xy <- xy[zz,]
    nam <- xy$nam
    ch <- substring(nam,1,1)
    nam[ch %in% c("="," ",",")] <- substring(nam[ch %in% c("="," ",",")], 2)
    xy$nam <- nam
    ord <- order(xy[,2])
    xy[ord,]
}

Here is the function that calls locatefun:

makeFunIndex <- function(sourceFile="rnotes.txt", frompath="~/_notes/rnotes/",
                         fileout=NULL, availfun=funpack, offset=0){
    ## pdftotext -layout -eol unix rnotes.pdf rnotes.txt
    len <- nchar(sourceFile)
    lfrom <- nchar(frompath)
    if(substring(frompath, lfrom, lfrom)=="/")
        frompath <- substring(frompath, 1, lfrom-1)
    if(is.null(fileout)){
        if(substring(sourceFile, len-3, len-3)=="."){
            fnam <- substring(sourceFile, 1, len-4)
        } else fnam <- sourceFile
        fileout <- paste(fnam, ".fdx", sep="")
        fdxfile <- paste(fileout, sep="/")
        fndfile <- paste(fnam, ".fnd", sep="")
    }
    sourceFile <- paste(frompath, sourceFile, sep="/")
    print(paste("Send output to", fndfile))
    tx <- readLines(sourceFile, warn=FALSE)
    entrymat <- locatefun(tx)
    backn <- regexpr("\\n", entrymat[,1], fixed=TRUE)
    entrymat <- entrymat[backn < 0,]
    entrymat[,2] <- entrymat[,2] - offset
    entrymat[,1] <- gsub("_","\\_", entrymat[,1], fixed=TRUE)
    nmatch <- match(entrymat[,1], availfun[,2], nomatch=0)
    use <- nmatch > 0
    print("Unmatched functions:")
    print(unique(entrymat[!use,1]))
    entrymat[use,1] <- paste(entrymat[use
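The page-tracking step at the start of locatefun (counting form feeds to assign each line of the text file to its pdf page) can be illustrated in isolation. A minimal sketch, with a made-up 'txlines' standing in for the result of readLines() on a pdftotext output file:

```r
## pdftotext inserts a form feed ("\f", octal "\014") at each page throw.
## Counting form feeds assigns each line of text to its pdf page.
txlines <- c("page one text",
             "\fpage two text", "more page two",
             "\fpage three text")
page <- cumsum(grepl("\f", txlines, fixed=TRUE)) + 1
page   # 1 2 2 3
```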
Re: [R] R on Linux, and R on Windows , any difference in maturity+stability?
I had a large job some time ago that ran fine under MacOS X. I'd expect the same to be true under Linux. It would run under Windows XP only if XP had been freshly rebooted.

John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473 fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200.

On 06/10/2009, at 9:00 PM, r-help-requ...@r-project.org wrote: > From: Liviu Andronic > Date: 6 October 2009 6:46:33 PM AEDT > To: Robert Wilkins > Cc: r-help@r-project.org > Subject: Re: [R] R on Linux, and R on Windows, any difference in > maturity+stability? > > > On 10/6/09, Robert Wilkins wrote: >> Will R have more glitches on one operating system as opposed to >> another, >> > Probably not. > >> or is it pretty much the same? >> > Depending on the complexity of the code, it is pretty much the same. I > recently had a (relatively simple) group project, with the three of us > on different OSs: Win, Mac and Linux. We did not encounter one > platform specific issue. > Liviu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Use of R in Schools
I am looking for information on experimentation with the use of R in the teaching of statistics and science in schools. Any leads would be very welcome. I am certain that there is such experimentation. I've made this inquiry on r-sig-teaching, with no response. John. John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R User Group in Canberra
Please add AUSTRALIA Canberra: Canberra R Users Group (CANRUG) http://canrug.togaware.com/ Cheers john. John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. On 31/07/2009, at 8:00 PM, r-help-requ...@r-project.org wrote: > listings [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Strip labels: use xyplot() to plot columns in parallel with outer=TRUE
The following tinkers with the strip labels, where the different panels are for different levels of a conditioning factor.

library(lattice)
tau <- (0:5)/2.5; m <- length(tau); n <- 200; SD <- 2
x0 <- rnorm(n, mean=12.5, sd=SD)
matdf <- data.frame(
    x = as.vector(sapply((0:5)/2.5, function(s) x0+rnorm(n, sd=2*s))),
    y = rep(15+2.5*x0, m),
    taugp = factor(rep(tau, rep(n, m))))
names(matdf) <- c("x","y","taugp")
lab <- c(list("0 (No error in x)"),
         lapply(tau[-1], function(x) substitute(A*s[z], list(A=x))))
xyplot(y ~ x | taugp, data=matdf,
       strip=strip.custom(strip.names=TRUE,
                          var.name="Add error with SD",
                          sep=expression(" = "),
                          factor.levels=as.expression(lab)))

Is there any way to get custom labels when the same is done by plotting variables in parallel?

df <- unstack(matdf, x ~ taugp)
df$y <- 15+2.5*x0
lab2 <- c(list("0 (No error in x)"),
          lapply(tau[-1], function(x)
              substitute("Add error with SD" == A*s[z], list(A=x))))
form <- formula(paste("y ~ ", paste(paste("X", tau, sep=""), collapse="+")))
xyplot(form, data=df, outer=TRUE)

I'd hoped that the following would do the trick, but the first label is repeated in each panel, and the variable names are added:

xyplot(form, data=df, outer=TRUE,
       strip=strip.custom(strip.names=TRUE, var.name=as.expression(lab2)))

John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473 fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Accuracy of R and other platforms
These comparisons are very simplistic. In most contexts, it would make much better sense to measure "accuracy" in standard error units, rather than in number of digits. There doubtless are specialist applications where the 15th digit (or even the 10th!) is important. But the check of accuracy really has then to be tuned to that application. Results in cases where the "accuracy" beyond (optimistically) the sixth digit is of no consequence are unlikely to give much clue on performance in such specialist applications.

John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473 fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200.

On 23/03/2009, at 10:00 PM, r-help-requ...@r-project.org wrote: > From: Karl Ove Hufthammer > Date: 23 March 2009 2:58:23 AM > To: r-h...@stat.math.ethz.ch > Subject: Re: [R] Accuracy of R and other platforms > > > Alejandro C. Frery: > >> @ARTICLE{AlmironSilvaMM:2009, >> author = {Almiron, M. and Almeida, E. S. and Miranda, M.}, >> title = {The Reliability of Statistical Functions in Four Software >> Packages Freely used in Numerical Computation}, >> journal = {Brazilian Journal of Probability and Statistics}, >> year = {in press}, >> volume = {Special Issue on Statistical Image and Signal Processing}, >> url = {http://www.imstat.org/bjps/}} >> >> is freely available under the "Future Papers" link. It makes a nice >> comparison of the numerical properties of R, Ox, Octave and Python. > > Thanks for posting this. I'm happy to see that the results for R were > generally excellent, and almost always better than for the three other > software packages. > > But there were a few cases where R did not turn out to be the winner. > Rather surprising that Ox was better than R for computing the > autocorrelation coefficient for two of the datasets, given its > terrible > results for the standard deviation.
Anybody have any ideas why? > > -- > Karl Ove Hufthammer [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
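A small sketch (my own, not from the paper under discussion) of the point about standard error units: the same number of digits of agreement can represent a negligible or a substantial discrepancy, depending on the standard error that attaches to the estimate. All values below are invented for illustration.

```r
## Two hypothetical analyses give an estimate of 100.001 where the
## reference value is 100.000, i.e. agreement to roughly 5 digits.
ref <- 100.000
est <- 100.001
discrep <- est - ref
se.noisy <- 0.5       # SE from a small, noisy sample
se.huge  <- 2e-05     # SE from a very large, low-noise sample
c(se.units.noisy = discrep/se.noisy,   # ~0.002 SEs: of no consequence
  se.units.huge  = discrep/se.huge)    # ~50 SEs: a serious discrepancy
```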
Re: [R] using a noisy variable in regression (not an R question)
One can "just include it in the regression", but the potential problems for interpretation are surely greater than those indicated. Inclusion of X1 = T1+E1 may cause X2 to appear significant when in fact it is having no effect at all. Or the true effect can be reversed in sign. This happens because X1 and X2 are correlated. Maybe this is implicit in what Jon is saying. See Carroll, Ruppert and Stefanski: Measurement Error in Nonlinear Models (2004, pp.52-55). The SD of E1 may need to be fairly large relative to SD(T1) for this to be an issue. My notes at http://www.maths.anu.edu.au/%7Ejohnm/r-book/2edn/xtras/xtras.pdf have brief comments, and code that can be used to illustrate the point. I support Stephen Kolassa's suggestions re using simulation for sensitivity analysis, though I think this can also be done analytically.

John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473 fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200.

On 08/03/2009, at 10:00 PM, r-help-requ...@r-project.org wrote: > From: Jonathan Baron > Date: 8 March 2009 5:21:55 AM > To: Juliet Hannah > Cc: r-help@r-project.org > Subject: Re: [R] using a noisy variable in regression (not an R > question) > > > If you form categories, you add even more error, specifically, the > variation in the distance between each number and the category > boundary. > > What's wrong with just including it in the regression? > > Yes, the measure X1 will account for less variance than the underlying > variable of real interest (T1, each individual's mean, perhaps), but > X1 could still be useful in two ways. One, it might be a significant > predictor of the dependent variable Y despite the error. Two, it > might increase the sensitivity of the model to other predictors (X2, > X3...) by accounting for what would otherwise be error.
> > What you cannot conclude in this case (when you measure a predictor > with error) is that the effect of (say) X2 is not accounted for by its > correlation with T1. Some people try to conclude this when X2 remains > a significant predictor of Y when X1 is included in the model. The > trouble is that X1 is an error-prone measure of T1, so the full effect > of T1 is not removed by inclusion of X1. > > Jon > > On 03/07/09 12:49, Juliet Hannah wrote: >> Hi, This is not an R question, but I've seen opinions given on non R >> topics, so I wanted >> to give it a try. :) >> >> How would one treat a variable that was measured once, but is known >> to >> fluctuate a lot? >> For example, I want to include a hormone in my regression as an >> explanatory variable. However, this >> hormone varies in its levels throughout a day. Nevertheless, its >> levels differ >> substantially between individuals so that there is information >> there to use. >> >> One simple thing to try would be to form categories, but I assume >> there are better ways to handle this. Has anyone worked with such >> data, or could >> anyone suggest some keywords that may be helpful in searching for >> this >> topic. Thanks >> for your input. >> >> Regards, >> >> Juliet >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- > Jonathan Baron, Professor of Psychology, University of Pennsylvania > Home page: http://www.sas.upenn.edu/~baron > Editor: Judgment and Decision Making (http://journal.sjdm.org) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
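The spurious-significance point is easy to demonstrate by simulation. The following is my own minimal sketch (not the code from the notes linked above): y depends only on T1, X2 is correlated with T1 but has no effect, and X1 observes T1 with substantial error.

```r
set.seed(42)
n  <- 500
T1 <- rnorm(n)
X2 <- 0.9*T1 + rnorm(n, sd=sqrt(1 - 0.9^2))   # corr(T1, X2) ~ 0.9
y  <- 2*T1 + rnorm(n)                         # X2 has no effect on y
X1 <- T1 + rnorm(n, sd=1.5)                   # heavy measurement error
## With T1 itself available, X2 is correctly found to do nothing;
## with the error-prone X1 in its place, X2 soaks up part of T1's effect.
rbind(trueT1  = coef(summary(lm(y ~ T1 + X2)))["X2", ],
      noisyX1 = coef(summary(lm(y ~ X1 + X2)))["X2", ])
```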
[R] The Origins of R
In another thread on this list, various wild allegations have been made, relating to the New York Times article on R. I object both to the subject line and to the content of several of the messages, and will not repeat or quote any of that content. It smacks to me of mischief making.

Discussion has centered around the following quote from the NY Times article: “According to them, the notion of devising something like R sprang up during a hallway conversation. They both wanted technology better suited for their statistics students, who needed to analyze data and produce graphical models of the information. Most comparable software had been designed by computer scientists and proved hard to use.”

The comment that "the notion of devising something like R sprang up during a hallway conversation" is strictly true. Certainly, this seems like a very plausible account. I'd have more difficulty believing that the notion was communicated to them in separate dreams. Part of the wanted technology was freedom for students to take the software home, or copy it down from the web.

There was a further story to be told, about the origins of the language that Ross and Robert implemented and adapted. The NY writer pretty much left out that part of the story (S did get a mention, but its connection with R did not), but did remedy this omission in a follow-up. Nor did the article do much to acknowledge the workers and work that has gone into R's continuing development.

Getting the attributions "right" is difficult. Even if "right" according to common conventions (and one can argue as to just what the conventions are, especially in the matter of computer language development), they are unlikely to be totally fair. Stigler's Law of Eponymy has wide sway!
In the preface to the first and second edition of "Data Analysis and Graphics Using R", we have: "The R system implements a dialect of the S language that was developed at AT&T Bell Laboratories by Rick Becker, John Chambers and Allan Wilks". The only 1st edition attribution to Ihaka and Gentleman was in Chapter 12: "For citing R in a publication, use Ihaka and Gentleman (1996)". [NB: Type citation() to see the form of citation that should now be used.] That was, as it now strikes me, unfair to Ross and Robert, but no-one complained. Perhaps no-one ever read that far through the preface!

There's an excellent brief summary of the history of R, and its connections with S, in Section 1.4 of John Chambers' "Software for Data Analysis". Appendix A has further details on the development of S, a kind of pre-history of R.

John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473 fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Sweave style file
It would, I consider, be useful to note, on the help pages for Sweave and RweaveLatex, that the path to Sweave.sty in the R installation can be extracted as paste(R.home(),"share/texmf/", sep="/") or its equivalent. John Maindonald email: john.maindon...@anu.edu.au phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
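For checking interactively, file.path() avoids the hand-assembled separator, and R.home() accepts a component argument. A sketch of both forms (note that the exact subdirectory holding Sweave.sty has varied across R versions, so confirm with file.exists() before pointing TeX at it):

```r
## Two ways of building the path to R's texmf tree, where Sweave.sty lives.
path1 <- paste(R.home(), "share/texmf/", sep="/")
path2 <- file.path(R.home("share"), "texmf")
file.exists(path2)   # confirm the directory exists in this installation
```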
Re: [R] adding a new dataset to the default R distribution
Making data, especially data that have been the subject of published papers, widely available, can be a useful spinoff from the R project, another gift to the scientific community beyond the provision of computing and analytic tools. Nowadays, in a complete publication of a scientific result, there is every reason for the data to be part of that publication. The Gentleman and Lang 2004 paper "Statistical Analyses and Reproducible Research" takes this further still, making a compelling case for opening the analysis to ready view. (http://www.bepress.com/bioconductor/paper2/ ) How else can critics know what analysis was done, and whether the data do really support the claimed conclusions?

As I see it, the first recourse should be use of archives that individual communities may establish. Instructions on how to input the data into R would be a useful small item of ancillary information. Links to such archives (under Data Archives, maybe) might be included on CRAN. The Open Archaeology project would seem a good umbrella for the archiving of archaeology data.

Where there is no available repository, or there are reasons for putting the data into an R package, one possibility is to advertise on this list: "Orphan dataset, looking for a good home". In this case I have offered to include the data in the DAAGxtras package, and I am open to further such requests. Perhaps however, there should be a "miscdata" or suchlike package to which such datasets can be submitted? All it would require is for someone to offer to act as "Keeper of the Miscellaneous Data".

John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473 fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200.
On 04/12/2008, at 10:00 PM, [EMAIL PROTECTED] wrote: > From: Stefano Costa <[EMAIL PROTECTED]> > Date: 3 December 2008 9:29:13 PM > To: r-help@r-project.org > Subject: [R] adding a new dataset to the default R distribution > > > Hi, > I am a student in archaeology with some interest in statistics and R. > > Recently I've obtained the permission to distribute in the public > domain > a small dataset (named "spearheads" with 40 obs. of 14 variables) > that > was used in an introductory statistics book for archaeologists > (published in 1994). > > I've rewritten most of the exercises of that book in R and made them > available at http://wiki.iosa.it/diggingnumbers:start along with the > original dataset, but I was wondering if there is a standard procedure > for getting a new dataset included in the default R distribution, like > "cars" and others. > > Please add me to CC if replying because I'm not subscribed to the > list. > > Best regards, > Stefano > > -- > Stefano Costa > http://www.iosa.it/ Open Archaeology [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Congratulations, Ross!
"Associate Professor Ihaka, of the Department of Statistics, was recognised with the Royal Society’s Pickering medal for his part in developing the statistical computing software R. R, created over 15 years ago by Dr Ihaka and colleague Robert Gentleman (now at the Fred Hutchinson Cancer Research Center, Seattle), is free software used by academics, industry and government worldwide to analyse numerical data and present it in graphical forms. Computer programmers globally have contributed to R, with over 1000 industry-specific add-ons created." from http://www.auckland.ac.nz/uoa/about/news/articles/2008/11/rsnz_honours.cfm 12 November 2008. John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R seven years ago
So what is the answer to the question: "Can success continue"? I suspect that R is now so firmly entrenched that it will inevitably continue, in one or other incarnation, for a long time to come. The negative factors that John Fox lists will surely, in time, make some changes inevitable. Will these come from force of circumstance rather than from conscious planning?

In an August 12 message I posted details of R citation rates that I had gleaned, following a lead from Simon Blomberg, from Web of Science. This, or some such measure, seems to me important as giving a handle on the penetration of R into statistical application areas. The numbers I obtained [I&G = Ihaka & Gentleman 1996; RSTAT is the citation suggested by citation()] were:

I&G: 1998=4, 1999=15, 2000=17, 2001=39, 2002=119, 2003=276
RSTAT+I&G:
  2004: 68+455   = 523
  2005: 433+512  = 945
  2006: 1049+426 = 1475
  2007: 1605+410 = 2015
  2008 (to ~Aug 10): 1389+255 = 1644

cit <- c("1998" = 4, "1999" = 15, "2000" = 17, "2001" = 39, "2002" = 119,
         "2003" = 276, "2004" = 523, "2005" = 945, "2006" = 1475,
         "2007" = 2015, "2008" = 1644)

These will not be all that accurate; there will be omissions and duplications. Growth is close to exponential.

John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473 fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200.

On 09/10/2008, at 9:00 PM, [EMAIL PROTECTED] wrote: > From: Peter Dalgaard <[EMAIL PROTECTED]> > Date: 9 October 2008 5:42:19 AM > To: [EMAIL PROTECTED] > Cc: "r-help@R-project.org" > Subject: Re: [R] R seven years ago > > > (Ted Harding) wrote: >> On 08-Oct-08 18:00:27, Liviu Andronic wrote: >> >>> Hello everyone, >>> >>> As some may know, today Google unveiled its 2001 search index [1]. I >>> was curious to see how was R like at that time, and was not >>> disappointed.
Compared to today's main page [2], seven years ago the >>> page looked [3] a bit rudimentary, especially the graphic. (It is >>> wort >>> noting that structurally the pages are very similar.) What >>> definitely >>> changed is the `Contributed packages' section. Then R featured 29 >>> contributed packages [4], while now it features 1500+ [5]. It was >>> surprising to realize the growth of R during the past seven years. >>> >>> Regards, >>> Liviu >>> >>> [1] http://www.google.com/search2001.html >>> [2] http://www.r-project.org/ >>> [3] http://web.archive.org/web/20010722202756/www.r-project.org/ >>> [4] >>> http://web.archive.org/web/20010525004023/cran.r-project.org/bin/macos/c >>> ontrib/src/ >>> [5] http://cran.at.r-project.org/web/packages/ >>> >> >> Many thanks for this, Liviu! One might also compare the mailing list >> usage: >> >> [R-help 1997]: 484 messages >> [R-help 2001]: 4309 messages >> [R-help 2007]: 26250 >> 1721+1909+2196+2145+2210+2309+ >> 2142+2246+2028+2711+2602+2031 >> >> So we now get more posts in a week than we did in the whole of 1997! >> >> > Those not present at the useR in Dortmund might want to skim John > Fox's talk > > http://www.statistik.uni-dortmund.de/useR-2008/slides/Fox.pdf > > (Actually, he did something at the end to avoid ending on a negative > note. Flipped back to one of the increasing graphs, I suppose.) > > -- > O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B > c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K > (*) \(*) -- University of Copenhagen Denmark Ph: (+45) > 35327918 > ~~ - ([EMAIL PROTECTED]) FAX: (+45) > 35327907 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R Citation rates
Following some discussion with Simon Blomberg, I've done a few Web of Science citation searches. A topic search for R LANG* STAT* seems to turn up most of the references to "R: A Language and Environment for Statistical Computing". "R Development Core Team" gets transformed into an astonishing variety of variations. Searching for citations of the 1996 Ihaka and Gentleman paper (most references up to and including 2004) turns up many fewer quirks. What other forms of reference should be investigated?

Anyway, here are the numbers by year (there may be some duplication):

1998-2003, I&G:       4, 15, 17, 39, 119, 276
2004-2008, RSTAT+I&G: 68+455=523, 433+512=945, 1049+426=1475,
                      1605+410=2015, 1389+255=1644

cit <- c("1998" = 4, "1999" = 15, "2000" = 17, "2001" = 39, "2002" = 119,
         "2003" = 276, "2004" = 523, "2005" = 945, "2006" = 1475,
         "2007" = 2015, "2008" = 1644)

(~4550 references to R LANG* STAT*; ~2530 to I&G.) On a rate per year basis, the 2008 figure scales up to 2691. This does not however allow for growth over the course of the year. The number of references grew by 37% from 2006 to 2007. On current trends, the 2007-2008 increase seems likely to be much larger than that. The figures probably underestimate the contribution from Bioconductor-related work. A direct search for Bioconductor-related papers did not however turn up enough papers to make too much difference to the numbers.

Here are some other summary figures, for graphing using whatever form of presentation appeals most (the second number is for the I&G paper):

country <- c(usa=1540+903, germany=539+304, england=507+328, france=468+337,
             canada=345+147, australia=329+169, switzerland=279+121)
subj <- c(ecology=924+349, statsANDprob=488+270, geneticsANDheredity=488+279,
          envScience=298+119, CSapplications=269+108, zoology=267+111,
          plantSciences=250+108, biochemANDmolbio=229+200,
          mathANDcompBIO=224+143, biotechANDappliedmicrobiology=223+159,
          evolutionaryBIO=210+117)

There's a great deal more summary information that might be extracted.
What is a good way, with readily available data, to standardize the country data? Environmental Science no doubt comes up tops because it is a coarser grouping than many other areas. John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473 fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
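The pro-rating of the part-year 2008 count, and the "close to exponential" growth, can both be checked from the cit vector above. A sketch (the 2008 adjustment assumes counts up to roughly August 10, about day 222 of the year, which gives a figure close to the 2691 quoted):

```r
cit <- c("1998"=4, "1999"=15, "2000"=17, "2001"=39, "2002"=119,
         "2003"=276, "2004"=523, "2005"=945, "2006"=1475,
         "2007"=2015, "2008"=1644)
cit["2008"] <- round(cit["2008"]*365/222)   # pro-rate the part-year count
## Near-exponential growth appears as near-linearity on a log scale
year <- as.numeric(names(cit))
fit <- lm(log(cit) ~ year)
exp(coef(fit)[["year"]])   # approximate annual multiplication factor
```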
Re: [R] Dividing by 0
I suspect you should be smoothing the series in a manner that replaces zeros by some suitably small positive number before you start. Without more details on what you are trying to do, it is impossible to know what is sensible. You are proposing to leave all smoothing ("rolling"?) till later; why not do some smoothing at the start?

John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473 fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200.

On 26 Jul 2008, at 8:00 PM, [EMAIL PROTECTED] wrote: From: nmarti <[EMAIL PROTECTED]> Date: 26 July 2008 1:42:09 AM To: r-help@r-project.org Subject: Re: [R] Dividing by 0 I'm well aware these are not errors, I guess I mis-wrote. I understand your concern. Thanks for passionately looking out for my well being, you saved my life. My variable has about 10,000 elements and sometimes for the first 100 to 500 elements there is lots of 0's, so I end up with lots of NA/NaN/Inf's. However, when I try to use "Rolling" calculations I receive error messages because the "Rolling" functions reject the NA/NaN/Inf's. So, I need 0's in place of the NA/NaN/Inf's so I can run the "Rolling" calculations. I can't just delete these observations, because it messes up lots of other things within these dataframes. I'm well aware these "Rolling" calculations will be wrong in the beginning of the dataframe, so I just throw these out. The rolling window is only about 50 observations, so out of 10,000, I still end up with ample correct data and calculations. So is this still idiotic? Thanks again for your concern. Now that you understand my situation a little better, you might be less distracted today and be able to sleep better tonight.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
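As a concrete sketch of the two approaches (my own illustration in base R only, since the poster's rolling functions are not identified): either replace the non-finite ratios before rolling and discard the contaminated start, or floor/smooth the denominator first so that no division by zero occurs.

```r
set.seed(1)
denom <- c(rep(0, 5), rpois(45, lambda=10))   # zeros early in the series
num   <- rpois(50, lambda=5)
ratio <- num/denom                     # Inf or NaN wherever denom == 0
## Option 1: zero out non-finite values, roll, then drop the early window
ratio1 <- ifelse(is.finite(ratio), ratio, 0)
roll   <- stats::filter(ratio1, rep(1/10, 10), sides=1)  # width-10 rolling mean
## Option 2: floor the denominator before dividing
ratio2 <- num/pmax(denom, 0.5)
c(bad.ratio = sum(!is.finite(ratio)),
  bad.ratio1 = sum(!is.finite(ratio1)),
  bad.ratio2 = sum(!is.finite(ratio2)))
```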
Re: [R] [R-sig-ME] lme nesting/interaction advice
I've been looking back over this discussion. Another model that one can fit using lme is:

> lme(score ~ Machine, random=list(Worker=pdIdent(~0+Machine)),
+     weights=varIdent(form=~1|Machine), data=Machines)
Linear mixed-effects model fit by REML
  Data: Machines
  Log-restricted-likelihood: -108.9547
  Fixed: score ~ Machine
  (Intercept)   MachineB   MachineC
        52.36       7.97  13.916667

Random effects:
 Formula: ~0 + Machine | Worker
 Structure: Multiple of an Identity
        MachineA MachineB MachineC Residual
StdDev:  6.06209  6.06209  6.06209 1.148581

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Machine
 Parameter estimates:
        A         B         C
    1.000 0.8713263 0.5859709

Number of Observations: 54
Number of Groups: 6

This insists (I think) that conditional on the random effect for the worker, the worker/machine random effects be independent, but allows them to have different variances. I am wondering whether it is possible to fit such a model using lmer(). [In this example the large estimated correlations suggest that it is not a sensible model.]

John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473 fax : +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200.

On 14 May 2008, at 2:49 AM, Douglas Bates wrote: On Mon, May 12, 2008 at 5:39 PM, Rolf Turner <[EMAIL PROTECTED]> wrote: On 13/05/2008, at 4:09 AM, Douglas Bates wrote: I'm entering this discussion late so I may be discussing issues that have already been addressed. As I understand it, Federico, you began by describing a model for data in which two factors have a fixed set of levels and one factor has an extensible, or "random", set of levels and you wanted to fit a model that you described as y ~ effect1 * effect2 * effect3 The problem is that this specification is not complete. At *last* (as Owl said to Rabbit) we're getting somewhere!!!
I always knew that there was some basic fundamental point about this business that I (and I believe many others) were simply missing. But I could not for the life of me get anyone to explain to me what that point was. Or to put it another way, I was never able to frame a question that would illuminate just what it was that I wasn't getting. I now may be at a stage where I can start asking the right questions. An interaction of factors with fixed levels and a factor with random levels can mean, in the lmer specification, lmer(y ~ effect1 * effect2 + (1| effect3) + (1| effect1:effect2:effect3), ...) or lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...) or other variations. When you specify a random effect or an random interaction term you must, either explicitly or implicitly, specify the form of the variance-covariance matrix associated with those random effects. The "advantage" that other software may provide for you is that it chooses the model for you but that, of course, means that you only have the one choice. Now may I start asking what I hope are questions that will lift the fog a bit? Let us for specificity consider a three-way model with two fixed effects and one random effect from the good old Rothamstead style agricultural experiment context: Suppose we have a number of species/breeds of wheat (say) and a number of fertilizers. These are fixed effects. And we have a number of fields (blocks) --- a random effect. Each breed-fertilizer combination is applied a number of times in each field. We ***assume*** that that the field or block effect is homogeneous throughout. This may or may not be a ``good'' assumption, but it's not completely ridiculous and would often be made in practice. And probably *was* made at Rothamstead. The response would be something like yield in bushels per acre. 
The way that I would write the ``full'' model for this setting, in mathematical style is:

Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k
         + (alpha.C)_ik + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl

The alpha_i and beta_j are parameters corresponding to breed and fertilizer respectively; the C_k are random effects corresponding to fields or blocks. Any effect ``involving'' C is also random. The assumptions made by the Package-Which-Must-Not-Be-Named are (I think) that

C_k ~ N(0, sigma_C^2)
(alpha.C)_ik ~ N(0, sigma_aC^2)
(beta.C)_jk ~ N(0, sigma_bC^2)
(alpha.beta.C)_ijk ~ N(0, sigma_abC^2)
E_ijkl ~ N(0, sigma^2)

and these random variables are *all independent*. A ... perhaps I'm on the wa
[R] predict lmer
The following function is designed to work with a logit link. It can easily be generalized to work with any link. The SEs and CIs are evaluated accounting for all sources of random variation. The plot may not be much help unless there is just one explanatory variate.

ciplot <- function(obj=glmm2, data=data.site, xcol=2, nam="litter"){
    cilim <- function(obj, xcol){
        b <- fixef(obj)
        X <- unique(model.matrix(obj))
        hat <- X %*% b
        pval <- exp(hat)/(1+exp(hat))   # NB, designed for logit link
        U <- chol(as.matrix(summary(obj)@vcov))
        se <- sqrt(apply(X %*% t(U), 1, function(x) sum(x^2)))
        list(hat=hat, se=se, x=X[,xcol])
    }
    limfo <- cilim(obj, xcol)
    hat <- limfo$hat
    se <- limfo$se
    x <- limfo$x
    upper <- hat + 2*se
    lower <- hat - 2*se
    ord <- order(x)
    plot(x, hat, yaxt="n", type="l", xlab=nam, ylab="")
    rug(x)
    lines(x[ord], lower[ord])
    lines(x[ord], upper[ord])
    ploc <- c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.9)
    axis(2, at=log(ploc/(1-ploc)), labels=paste(ploc), las=2)
}

## Usage
glmm2 <- lmer(rcr ~ litter + (1 | Farm), family=binomial, data=data.site)
ciplot(obj=glmm2)

John Maindonald             email: [EMAIL PROTECTED]
phone: +61 2 (6125) 3473    fax: +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27),
Australian National University, Canberra ACT 0200.

On 8 May 2008, at 8:00 PM, [EMAIL PROTECTED] wrote:

> From: "May, Roel" <[EMAIL PROTECTED]>
> Date: 8 May 2008 12:23:15 AM
> To: r-help@r-project.org
> Subject: [R] predict lmer
>
> Hi,
>
> I am using lmer to analyze habitat selection in wolverines using the
> following model:
>
> (me.fit.of <-
>   lmer(USED~1+STEP+ALT+ALT2+relM+relM:ALT+(1|ID)+(1|ID:TRKPT2),
>        data=vdata, control=list(usePQL=TRUE), family=poisson,
>        method="Laplace"))
>
> Here, the habitat selection is calculated using a so-called discrete
> choice model where each used location has a certain number of
> alternatives which the animal could have chosen.
These sets of
> locations are captured using the TRKPT2 random grouping. However,
> these sets are also clustered over the different individuals (ID).
> USED is my binary dependent variable which is 1 for used locations
> and zero for unused locations. The others are my predictors.
>
> I would like to predict the model fit at different values of the
> predictors, but does anyone know whether it is possible to do this? I
> have looked around at the R-sites and in help but it seems that there
> doesn't exist a predict function for lmer???
>
> I hope someone can help me with this; point me to the right functions
> or tell me to just forget it.
>
> Thanks in advance!
>
> Cheers Roel
>
> Roel May
> Norwegian Institute for Nature Research
> Tungasletta 2, NO-7089 Trondheim, Norway

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
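The SE computation inside ciplot() may look cryptic: it obtains pointwise SEs of the fitted linear predictor as sqrt(rowSums((X %*% t(U))^2)), where U is the Cholesky factor of the coefficient variance-covariance matrix V. That is algebraically the same as sqrt(diag(X %*% V %*% t(X))), but avoids forming the full n-by-n matrix. A base-R check of the equivalence, using a toy lm() fit (the data here are simulated purely for illustration):

```r
## Verify: Cholesky-based SEs of the linear predictor equal
## sqrt(diag(X V X')).  Toy data, simulated for the check only.
set.seed(1)
d <- data.frame(x = runif(20))
d$y <- 2 + 3 * d$x + rnorm(20)
fit <- lm(y ~ x, data = d)

X <- model.matrix(fit)
V <- vcov(fit)
U <- chol(V)                  # upper triangular, with V = t(U) %*% U

## Row i of X %*% t(U) has squared norm  x_i' t(U) U x_i = x_i' V x_i
se.chol   <- sqrt(apply(X %*% t(U), 1, function(z) sum(z^2)))
se.direct <- sqrt(diag(X %*% V %*% t(X)))

all.equal(unname(se.chol), unname(se.direct))  # TRUE
```

The Cholesky route only ever forms an n-by-p matrix, which matters when the number of distinct rows of X is large.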
[R] REvolution
Does anyone know any more than is in the following press release about REvolution Computing and their commercialization of R?

http://www.intel.com/capital/news/releases/080122.htm

"Intel Capital, the global investment arm of Intel Corporation, today announced that it has invested in the Series A financing of REvolution Computing, creator of parallel computing software for computational statistics"

"R has become the de-facto statistical language and REvolution Computing's R-based solutions provide the necessary scale and support for its growing commercial usage," said Andre M. Boisvert, Revolution Computing board member. "With this investment, Revolution Computing can deliver the type of performance that has been missing in existing computational statistics offerings."

"For more information on Revolution Computing, RPro, and ParallelR, visit www.revolution-computing.com"

At present though, unless you have better success than me, all you will get from the Revolution Computing site is a revolutionary song!

John Maindonald             email: [EMAIL PROTECTED]
phone: +61 2 (6125) 3473    fax: +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27),
Australian National University, Canberra ACT 0200.
Re: [R] Symbolic substitution in parallel; use infinity symbol?
Actually this works beautifully as it stands:

breaks <- c(-Inf, -1, 1, Inf)
zz <- lapply(breaks, function(x)
    if(x==-Inf) quote(-infinity)
    else if (x==Inf) quote(infinity)
    else format(x))
lbl <- mapply(function(x,y) bquote("(" * .(x) * "," * .(y) * "]"),
              zz[-length(zz)], zz[-1], SIMPLIFY=FALSE)
y <- rnorm(40)
x0 <- cut(y, breaks=breaks)
plot(x0, y, xaxt="n")
axis(1, at=1:3, labels=as.expression(lbl))

Thanks, Peter.

John Maindonald             email: [EMAIL PROTECTED]
phone: +61 2 (6125) 3473    fax: +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27),
Australian National University, Canberra ACT 0200.

On 31 Dec 2007, at 12:35 PM, Peter Dalgaard wrote:

> John Maindonald wrote:
>> I'd like to be able to modify axlab in (C) below so that 'Inf'
>> is replaced by the infinity symbol.
>>
>> y <- rnorm(40)
>> breaks <- c(-Inf, -1, 1, Inf)
>> x <- cut(y, breaks=breaks)
>> plot(unclass(x), y, xaxt="n", xlab="")
>>
>> ## A: The following gives the axis labels "(-Inf,-1]", etc.
>> axis(1, at=1:3, labels=expression("(-Inf,-1]", "(-1,1]", "(1, Inf]"))
>>
>> ## B: The following replaces Inf by the infinity symbol
>> axis(1, at=1:3, labels=expression("(" * -infinity * ", " * -1 * "]",
>>                                   "(" * -1 * ", 1]",
>>                                   "(1, " * infinity * "]"),
>>      line=1.25, lty=0)
>>
>> ## C: The following gives the axis labels "(-Inf,-1]", etc.,
>> ## in a more automated manner.
>> axlab <- lapply(levels(x), function(x) substitute(a, list(a=x)))
>> # Can alternatively use bquote()
>> axis(1, at=1:3, labels=as.expression(axlab), line=2.25, lty=0)
>>
>> However I have been unable to modify axlab so that the infinity
>> symbol appears in place of 'Inf'. Is there some relatively
>> straightforward way to do this? The issue is of course more
>> general than this specific example.
>>
> Here's an idea, leaving some tinkering for you:
>
> breaks <- c(-Inf, -1, 1, Inf)
>
> zz <- lapply(breaks, function(x) if(x==-Inf) quote(-infinity)
>              else if (x==Inf) quote(infinity) else format(x))
>
> lbl <- mapply(function(x,y)
>               bquote("(" * .(x) * "," * .(y) * "]"),
>               zz[-4], zz[-1], SIMPLIFY=FALSE)
>
> --
>    O__   Peter Dalgaard               Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics   PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen Denmark      Ph: (+45) 35327918
> ~~~~~~~~~~ - ([EMAIL PROTECTED])                   FAX: (+45) 35327907
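Peter's idea generalizes naturally to a small helper that builds plotmath labels for any cut() breaks, replacing zz[-4] with zz[-length(zz)] so it works for any number of intervals. The helper name cutLabels is ours, not from any package:

```r
## Build plotmath interval labels "(lo, hi]" for cut() breaks,
## substituting the infinity symbol for +/- Inf.
cutLabels <- function(breaks) {
    zz <- lapply(breaks, function(x)
        if (x == -Inf) quote(-infinity)
        else if (x == Inf) quote(infinity)
        else format(x))
    mapply(function(lo, hi) bquote("(" * .(lo) * "," * .(hi) * "]"),
           zz[-length(zz)], zz[-1], SIMPLIFY = FALSE)
}

breaks <- c(-Inf, -1, 1, Inf)
lbl <- cutLabels(breaks)
length(lbl)   # 3, one label per interval
## Use with: axis(1, at = 1:3, labels = as.expression(lbl))
```

Each element of lbl is an unevaluated call, which as.expression() turns into something the plotmath machinery in axis() can render.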
[R] Symbolic substitution in parallel; use infinity symbol?
I'd like to be able to modify axlab in (C) below so that 'Inf' is replaced by the infinity symbol.

y <- rnorm(40)
breaks <- c(-Inf, -1, 1, Inf)
x <- cut(y, breaks=breaks)
plot(unclass(x), y, xaxt="n", xlab="")

## A: The following gives the axis labels "(-Inf,-1]", etc.
axis(1, at=1:3, labels=expression("(-Inf,-1]", "(-1,1]", "(1, Inf]"))

## B: The following replaces Inf by the infinity symbol
axis(1, at=1:3, labels=expression("(" * -infinity * ", " * -1 * "]",
                                  "(" * -1 * ", 1]",
                                  "(1, " * infinity * "]"),
     line=1.25, lty=0)

## C: The following gives the axis labels "(-Inf,-1]", etc.,
## in a more automated manner.
axlab <- lapply(levels(x), function(x) substitute(a, list(a=x)))
# Can alternatively use bquote()
axis(1, at=1:3, labels=as.expression(axlab), line=2.25, lty=0)

However I have been unable to modify axlab so that the infinity symbol appears in place of 'Inf'. Is there some relatively straightforward way to do this? The issue is of course more general than this specific example.

John Maindonald             email: [EMAIL PROTECTED]
phone: +61 2 (6125) 3473    fax: +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27),
Australian National University, Canberra ACT 0200.
[R] Reminiscing on 20 years using S
My first exposure to S was on an AT&T 3B2 (a 3B2/100, I think), at the Auckland (Mt Albert) Applied Mathematics Division Station of the NZ Dept of Scientific and Industrial Research. The AMD Head Office in Wellington had one also. There may have been one or more others; I cannot remember. This would have been in 1983, maybe. It was a superbly engineered machine, but the software (System V, version 3.2) had its problems. If you back-deleted too far along the command line, something unpleasant (losing the line? or worse?) happened. On typing 1+1 at the S command line, it took a second to get an answer.

John Maindonald             email: [EMAIL PROTECTED]
phone: +61 2 (6125) 3473    fax: +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27),
Australian National University, Canberra ACT 0200.

On 27 Dec 2007, at 10:00 PM, [EMAIL PROTECTED] wrote:

> From: roger koenker <[EMAIL PROTECTED]>
> Date: 27 December 2007 9:56:45 AM
> To: Greg Snow <[EMAIL PROTECTED]>
> Cc: R-help list <[EMAIL PROTECTED]>
> Subject: Re: [R] Reminiscing on 20 years using S
>
> On Dec 26, 2007, at 2:05 PM, Greg Snow wrote:
>
>> I realized earlier this year (2007) that it was in 1987 that I first
>> started using an early version of S (it was ported to VMS and was
>> called success). That means that I have been using some variant of S
>> (to various degrees) for over 20 years now (I don't feel that old).
>
> Boxing day somehow seems appropriate for this thread. R.I.P. to all
> those old boxes of yesteryore and the software that ran on them -- and
> yet there is always a residual archaeological curiosity.
>
> I discovered recently that the MIT athena network contains a circa
> 1989 version of S: http://stuff.mit.edu/afs/athena/astaff/project/Sdev/S/
> which made me wonder whether there was any likelihood that one could
> recreate "S Thu Dec 7 16:49:47 EST 1989".
> Curiosity is one thing, time to dig through the layers of ancient
> civilizations is quite another. But if anyone would like to offer a
> (preferably educated) guess about the feasibility of such a project,
> like I said, I would be curious.
>
> url: www.econ.uiuc.edu/~roger        Roger Koenker
> email: [EMAIL PROTECTED]             Department of Economics
> vox: 217-333-4558                    University of Illinois
> fax: 217-244-6678                    Champaign, IL 61820
Re: [R] Packages - a great resource, but hard to find the right one
R as a whole could benefit from systematic attention. There may be scope for several special issues of JSS:

- R overview and philosophy
- R penetration and influence, in the statistics community, in machine learning, and in a variety of application areas
- R as a vehicle for fostering communication between researchers in diverse areas. [A great thing about R-help, though nowadays this role is passing across to other lists such as R-sig-ME, is that it facilitates and even forces communication between application area specialists, and between those specialists and statistics professionals. This may be temporary; we may see the R community fragment into diverse communities that focus on their own specialist interests. Scope for a sociological study, perhaps?]
- "Who is using R?", as reflected in published scientific literature. (I'd like to see a wiki or somesuch where authors are encouraged to give details of published analyses that have used R.)
- Where is R headed? How will the shaping of its direction proceed? Will it be a matter of step-by-step change and improvement, or is it (will it be) possible to lay out in advance an outline of the directions that its future development can be expected to take?
- Traps for new (and old) users
- Books and papers on R
- Then onto packages!

I guess what may be in order is something like the expansion of a task view into an extended paper.

John Maindonald             email: [EMAIL PROTECTED]
phone: +61 2 (6125) 3473    fax: +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27),
Australian National University, Canberra ACT 0200.

On 23 Nov 2007, at 10:00 PM, [EMAIL PROTECTED] wrote:

> From: Antony Unwin <[EMAIL PROTECTED]>
> Date: 23 November 2007 7:50:52 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [R] Packages - a great resource, but hard to find the
> right one
>
> Johannes Hüsing wrote
>
>>> Above all there are lots of packages.
>>> As the software editor of the
>>> Journal of Statistical Software I suggested we should review R
>>> packages.
>>
>> You mean: prior to submission?
>
> No.
>
>>> No one has shown any enthusiasm for this suggestion, but I
>>> think it would help. Any volunteers?
>>
>> Thing is, I may like to volunteer, but not in the "here's a
>> package for you to review by week 32" way. Rather in the way that
>> I search a package which fits my problem.
>
> That's what I was hoping for.
>
>> One package lets me down
>> and I'd like to know other users and the maintainer about it.
>> The other one works black magic and I'd like to drop a raving
>> review about it. This needs an infrastructure with a low barrier
>> to entry. A wiki is not the worst idea if the initial infrastructure
>> is geared at addressing problems rather than packages.
>
> We should differentiate between rave reviews of features that just
> happened to be very useful to someone and reviews of a package as a
> whole. Both have their place and at the moment we don't have either.
>
> If you are willing to review an R package or aspects of R for JSS
> please let me know.
>
> Antony Unwin
> Professor of Computer-Oriented Statistics and Data Analysis,
> Mathematics Institute,
> University of Augsburg,
> Germany
Re: [R] [R-sig-ME] mixed model testing
Whether or not you need a mixed model, e.g. random versus fixed slopes, depends on how you intend to use results. Suppose you have lines of depression vs lawn roller weight calculated for a number of lawns. If the data will always be used to make predictions for one of those same lawns, a fixed slopes model is fine. If you want to use the data to make a prediction for another lawn from the same "population" (the population from which this lawn is a random sample, right?), you need to model the slope as a random effect. Now for a more subtle point: In the prediction for another lawn situation, it is possible that the slope random effect can be zero, and analysts do very commonly make this sort of assumption, maybe without realizing that this is what they are doing. You can test whether the slope random effect is zero but, especially if you have data from a few lawns only, failure to reject the null (zero random effect) is not a secure basis for inferences that assume that the slope is indeed zero. The "test for zero random effect, then infer" is open to Box's pithy objection that "... to make preliminary tests on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port". John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. On 8 Nov 2007, at 1:55 AM, Irene Mantzouni wrote: > Is there a formal way to prove the need of a mixed model, apart from > e.g. comparing the intervals estimated by lmList fit? > For example, should I compare (with AIC ML?) a model with seperately > (unpooled) estimated fixed slopes (i.e.using an index for each > group) with a model that treats this parameter as a random effect > (both models treat the remaining parameters as random)? > > Thank you! 
>
> ___
> [EMAIL PROTECTED] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Re: [R] postscript(), used from a pre R-2.6.0 workspace
I find that if I start R-2.6.0 in a workspace with no .RData file, load one of my workspaces from R-2.5.1 or earlier into R-2.6.0, and then before doing anything else type postscript(file="z.ps"), I get:

> ls()
character(0)
> load(url("http://www.maths.anu.edu.au/~johnm/r/test5.RData"))
> postscript(file="z.ps")
Error in postscript(file = "z.ps") : Invalid font type
In addition: Warning messages:
1: closing unused connection 3 (gzcon(http://www.maths.anu.edu.au/~johnm/r/test5.RData))
2: In postscript(file = "z.ps") : font family not found in PostScript font database
3: In postscript(file = "z.ps") : font family not found in PostScript font database

The same happens if R is started in a workspace with such a .RData file, or with a .RData file from R-2.5.1 or earlier. This behavior persists even if I take such an image file, load it, clear the workspace, and save it. An instance of such a .RData file has the url: http://www.maths.anu.edu.au/~johnm//r/test5.RData

It makes no difference whether I load the file into R-2.6.0 on a MacOS X or Windows system, or whether I use the current R-patched (2007-10-27 r43288). I do not know whether the source of the original .RData file matters, or how it may depend on what I have done in that earlier workspace.

For moving my .RData files across without messing up use of postscript() [or pdf()], a workaround is to dump the contents of the earlier workspace, then start R in a working directory whose .RData file, if any, has the proper R-2.6.0 pedigree, and source the dumped file. I do not want to have to directly run the function that creates the database each time I start a session.

Has anyone else hit this?

John Maindonald             email: [EMAIL PROTECTED]
phone: +61 2 (6125) 3473    fax: +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27),
Australian National University, Canberra ACT 0200.
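The dump-and-source workaround can be sketched as follows; the file name is a placeholder, and the two objects here stand in for whatever the old workspace contains. dump() writes the objects as R source text, which carries none of the binary workspace state across:

```r
## Sketch of the workaround: write the old workspace objects as R
## source (in the R-2.5.1 session), then recreate them by sourcing
## the file in a fresh session.  Object names are illustrative only.
old <- tempfile(fileext = ".R")
x <- 1:3
y <- letters[1:2]
dump(c("x", "y"), file = old)   # in the old session: dump(ls(), ...)

rm(x, y)                        # simulate starting afresh
source(old)                     # in the new R-2.6.0 session

stopifnot(identical(x, 1:3), identical(y, c("a", "b")))
```

In practice one would use dump(ls(), file="workspace.R") in the old session, and source("workspace.R") once in a working directory whose .RData (if any) was created by R-2.6.0.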