Re: [R] R badly lags matlab on performance?
I once wrote the benchmark mentioned in Stefan's post (based on initial work by Stephan Steinhaus), and it is still available for those who would like to update it. Note that it lacks checks of the results to make sure that the calculation is not only faster, but also correct! Now, I'll tell you why I haven't updated it, and you'll see it is connected to the current topic. First, lack of time, for sure. Second, this benchmark has always been heavily criticized by several people, including members of the R Core Team. Basically, these are just toy examples, disconnected from reality. Even with better cases, benchmarks do not take into account the time needed to write the code for your particular application (from the question to the results). I wrote this benchmark at a time when I overemphasized the raw performance of the software, when I was looking for the best software to choose as a tool for my future career. Now, what's my choice, ten years later? Not two, not three packages... but just ONE: R. I tend to do 95% of my calculations with R (the rest is ImageJ/Java). Indeed, these benchmark results (and the toy example of Ajay Shah, a <- a + 1) should only be considered very marginally, because what matters is how your software tool performs in real applications, not in simplistic toy examples. R lags behind Matlab for pure arithmetic calculation... right! But R has a better object-oriented approach, features more variable types (factor, for instance), and has a richer mechanism for metadata handling (col/row names, various other attributes, ...) that makes it better suited to instantiate complex datasets or analyses than Matlab. Of course, this has a small cost in performance. As soon as you think about your problem in a vectorized way, R is one of the best tools, I think, to go from the question to the answer in real situations. How could we quantify this?
I could only see big contests where experts in each language are presented with real problems and one measures the time needed to solve them... One should also measure the robustness, reusability, flexibility and elegance of the code produced (how to quantify these?). Such a contest between R, Matlab, Octave, Scilab, etc. is very unlikely to happen. In the end, it is really a matter of personal feeling: you can run your own little contest by yourself, trying to solve a given problem in several packages... and then decide which one you prefer. I think many people do/did this, and the still-exponential growth of R use (at least, as observed in the increasing number of CRAN packages) is probably a good sign that R is one of the top performers when it comes to efficiency from the question to the answer on real problems, not just on little toy examples! (Sorry for being so long; I think I missed some interaction with the R community this time ;-) Best, Philippe -- Prof. Philippe Grosjean, Numerical Ecology of Aquatic Systems, Mons-Hainaut University, Belgium

Stefan Grosse wrote: I don't have Octave (on the same machine) to compare these with. And I don't have Matlab at all. So I can't provide a comparison on that front, I'm afraid. Ted. Just to add some timings, I was running 1000 repetitions (adding up to a = 1001) on a notebook with a Core 2 Duo T7200:
R 2.8.1 on Fedora 10: mean 0.10967, st.dev 0.005238
R 2.8.1 on Windows Vista: mean 0.13245, st.dev 0.00943
Octave 3.0.3 on Fedora 10: mean 0.097276, st.dev 0.0041296
Matlab 2008b on Windows Vista: mean 0.0626, st.dev 0.005
But I am not sure how representative this very simple example is. To compare Matlab's speed with R, a kind of benchmark suite is necessary, like http://www.sciviews.org/benchmark/index.html -- but that one is very old. I would guess that not much has changed: sometimes R is faster, sometimes not.
This difference between the Windows and Linux timings is probably not really relevant: when I compared the timings of my usual analyses (count data and time series stuff), there was no difference between the two operating systems. Cheers Stefan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
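For readers who want to reproduce this kind of micro-timing, here is a minimal sketch along the lines of the a <- a + 1 toy example; the loop and repetition counts are my own illustrative choices, not Stefan's exact setup.

```r
# Time the toy "a <- a + 1" loop repeatedly and summarize, in the spirit
# of the figures quoted above (loop/repetition counts are illustrative).
time.once <- function(n = 1000) {
  system.time({
    a <- 1
    for (i in seq_len(n)) a <- a + 1
  })["elapsed"]
}
t <- replicate(50, time.once())
c(mean = mean(t), st.dev = sd(t))
```

As the thread stresses, such numbers say little about real-world productivity; they only measure interpreter overhead on a trivial loop.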
[R] Customized LDA (MASS) object plot
Hi R experts. I performed a Linear Discriminant Analysis (lda) and now I want to plot the first two axes (LDA1 and LDA2). Well, the MASS package has plot.lda and pairs.lda to do that. But they don't let me personalize the plots, since they don't accept a type="n" plot. So I started looking at the lda object's components, trying to find my groups' (n=4) axis coordinates; curiously, I couldn't. There are only the variables' axis coordinates and some analysis results. If I can reach this information, I'll be able to create a customized graph using the normal plot(), points() and par(). So the question is: how can I reach these axis coordinates inside my object (supposing they are there somewhere, since plot.lda and pairs.lda are able to plot them)? Thanks in advance. ___ MSc. Rodrigo Aluizio mailto:r.alui...@gmail.com Centro de Estudos do Mar/UFPR Laboratório de Micropaleontologia Avenida Beira Mar s/n - CEP 83255-000 Pontal do Paraná - PR - BRASIL Fone: (0**41) 3455-1496 ramal 217 Fax: (0**41) 3455-1105
Re: [R] extend summary.lm for hccm?
Dear John (and other readers of this mailing list), thanks for your help. It raises two further questions, one directly related to R and probably easy to answer, the other a little off-topic. John Fox wrote: ... (BTW, one would not normally call summary.lm() directly, but rather use the generic summary() function instead.) ... Is there any difference for R between using summary.lm() and using summary()? Or is it just that in the second case R recognizes that the input is an lm object and then calls summary.lm()? That said, it's hard for me to understand why it's interesting to have standard errors for the individual coefficients of a high-degree polynomial, and I'd also be concerned about the sensibleness of fitting a fifth-degree polynomial in the first place. I am trying to estimate some Engel curves - functions of the relationship between the income of a household and the demand share of certain goods. As I want to estimate them for only one good, the only restriction arising from Gorman (1981) seems to be that in a pure Engel curve model (including only income and the demand share), the income share should be a sum of multiplications of polynomials in the natural logarithm of income. I have not yet found a theoretical reason for a limit on the number of polynomial terms, and I know too little maths to say whether it's impossible to estimate the influence of x^5 if you've already included x to x^4. So I thought I might just compare models with different numbers of polynomial terms using information criteria like Amemiya's Prediction Criterion. I guess with x^1 to x^5 it will be hardly possible to estimate the influence of any single one of these five terms, as each of them could be approximated using the other four -- but where to draw the line? So if anybody could tell me where to read about how many polynomial terms to include at most, I'd be grateful. Regards, Achim Gorman (1981) is: Gorman, W. M.
(1981), Some Engel Curves, in: Essays in the Theory and Measurement of Consumer Behaviour in Honor of Sir Richard Stone
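On the collinearity worry (each power being nearly expressible through the other four), one standard R device is an orthogonal polynomial basis via poly(). A small sketch with simulated data; the variable names and coefficients are made up purely for illustration:

```r
# Fit a fifth-degree Engel-type curve with raw powers vs. an orthogonal
# polynomial basis. Both fits span the same column space (identical fitted
# values), but the orthogonal basis gives stable per-term standard errors.
set.seed(1)
loginc <- rnorm(200, mean = 10, sd = 0.5)              # log income (simulated)
share  <- 0.3 - 0.02 * loginc + rnorm(200, sd = 0.01)  # demand share (simulated)

fit.raw  <- lm(share ~ poly(loginc, 5, raw = TRUE))  # x, x^2, ..., x^5: highly collinear
fit.orth <- lm(share ~ poly(loginc, 5))              # orthogonal basis

all.equal(fitted(fit.raw), fitted(fit.orth))  # same curve, different parameterization
```

With the orthogonal basis, the t-test on the degree-5 coefficient speaks directly to whether the fifth term adds anything, which is one practical way to decide "where to draw the line".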
[R] Combining greek letters with the contents of variables
Dear r-help list, I am trying to combine a Greek letter lambda with the contents of a variable v in the title of a plot. However, it seems that having v inside the expression() function causes it not to be evaluated; on the other hand, having expression(lambda) inside something else like paste() causes it to be evaluated to a string. Here is an example of what I want to do: title(main=expression("Value of "*lambda*paste(" = ", v, sep=""))) Is there any solution for this? Cheers, Ingeborg Schmidt
[R] Problems with Rcmdr and BiodiversityR
Dear all, I run R 2.7.2 under Windows and integrated BiodiversityR successfully into the R Commander. Most functions of BiodiversityR run, but others (like analysis of species as response) produce blank windows and the message: Error in get(x, envir = RcmdrEnv(), mode = mode, inherits = FALSE) : Variable operatorFont nicht gefunden (i.e., variable operatorFont not found). What's wrong? Best regards, Frieda
Re: [R] Combining greek letters with the contents of variables
I feel this is becoming a frequently asked question, hence I tried Google and typed: R main expression greek -- and got an answer: https://stat.ethz.ch/pipermail/r-help/2006-July/109934.html Uwe Ligges
Re: [R] Problems with Rcmdr and BiodiversityR
Dear Frieda, I'm afraid that it's not possible to tell from the information that you've given what the source of the problem is. What version of Rcmdr are you using? Have you written an Rcmdr plug-in package for BiodiversityR? Where in your code is this error produced? Etc. My guess is that there's a version conflict, since the current version of the Rcmdr package on CRAN (1.4-6) doesn't use the variable operatorFont, which was previously employed to render the various operator buttons (+, *, etc.) used in the formula fields of statistical-modeling dialogs. (Now the standard font is used on the buttons.) This variable was (and is) used nowhere else in the Rcmdr package. The error was probably produced by the command getRcmdr("operatorFont"); why that should be in your code, I can't say. Although I don't know the specific source of the error, I recommend that you start by updating R, the Rcmdr package, and all other packages to their current versions. Your plug-in should call the current versions of Rcmdr utility functions, such as modelFormula(). I hope this helps, John
John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/
Re: [R] Combining greek letters with the contents of variables
The paste in that answer could be eliminated using ~ : plot(1:10, main = bquote("Results for" ~ pi == .(pi)))
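Applied to the original lambda question, the same idiom might look like this (v = 5 is just an example value):

```r
# Substitute the current value of v into a plotmath title containing lambda.
v <- 5
plot(1:10, main = bquote("Value of" ~ lambda == .(v)))

# substitute() is an equivalent route to the same unevaluated expression:
title(sub = substitute(lambda == value, list(value = v)))
```

The key point in both forms is that .(v) (or the substitute() list) injects the variable's value before the expression is handed to the plotmath renderer.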
[R] Lattice xyplot help please.
Hi - I am no R expert and I would appreciate your time if you can help me with my xyplot question. I would like to add text (a p-value) to a 4-panel xyplot. I thought panel = function(){} should work, but I am not sure where I went wrong. The error message from the following code is: argument "subscripts" is missing, with no default

xyplot(GLG ~ PD | factor(TRT), groups = GLG_ind, strip = strip.custom(style = 4), ref = TRUE, as.table = TRUE, data = splitPD, subscripts = TRUE, cex = 2,
  panel = function(x, y, pvalue, subscripts, ...) {
    panel.xyplot(x, y, ...)
    panel.abline(h = 51.95)
    grid.text(paste("p-value =", pvalue[subscripts]), .25, .15, gp = gpar(cex = .8))
  })

I really appreciate your time to help me. Best, Haoda

Appendix - Data
pvalue
[1] 0.88313298 0.02224550 0.80000000 0.12663046
splitPD
    PD TRT  GLG GLG_ind
1   -8  30 38.5       0
2  -81  30 58.6       1
4  -33  30 35.0       0
5  -18  30 41.1       0
6  -45  90 64.3       1
8  -39  90 41.9       0
9  -45  90 56.2       1
10 -98  90 53.6       1
11  27  90 46.4       0
12 -45  90 74.2       1
15 -22   5 56.4       1
16 -25   5 63.8       1
17   4   5 50.2       0
18 -52  30 64.6       1
21 -31  60 44.5       0
22 -36   5 42.1       0
23 -56   5 37.8       0
24  -5   5 31.3       0
26 -29   5 31.7       0
27  -9   5 39.0       0
28  -9   5 26.7       0
31 -41  30 52.7       1
32 -24  30 50.4       0
33 -18  30 32.4       0
35 -36  30 41.3       0
36 -22  30 41.1       0
37 -36  90 42.5       0
39 -18  90 63.9       1
40 -25  60 40.6       0
42 -43  60 86.4       1
43 -58  60 48.1       0
44 -16  60 48.5       0
45 -26  60 59.2       1

Code that works:
rm(list = ls())
library(lattice)
library(grid)
library(rpart)
xyplot(GLG ~ PD | factor(TRT), groups = GLG_ind, strip = strip.custom(style = 4), ref = TRUE, as.table = TRUE, data = splitPD, cex = 2,
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)
    panel.abline(h = 51.95)
  })
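One way around the error, sketched on a subset of the posted data, is to index the p-values by panel rather than by data row, using lattice's panel.number(); subscripts index observations, not panels, which is why pvalue[subscripts] misfires. The mapping of p-values to panels below is hypothetical, and the text position mirrors the original grid.text call.

```r
library(lattice)

# Subset of the posted splitPD data, enough for four panels.
splitPD <- data.frame(
  PD  = c(-8, -81, -45, -39, -22, -25, -31, -25),
  TRT = c(30, 30, 90, 90, 5, 5, 60, 60),
  GLG = c(38.5, 58.6, 64.3, 41.9, 56.4, 63.8, 44.5, 40.6))
# One p-value per panel; which value belongs to which TRT level is assumed.
pvalue <- c(0.88313298, 0.02224550, 0.80000000, 0.12663046)

p <- xyplot(GLG ~ PD | factor(TRT), data = splitPD, as.table = TRUE,
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)
    panel.abline(h = 51.95)
    lims <- current.panel.limits()   # place text at 25%/15% of the panel
    panel.text(lims$xlim[1] + 0.25 * diff(lims$xlim),
               lims$ylim[1] + 0.15 * diff(lims$ylim),
               paste("p-value =", pvalue[panel.number()]), cex = 0.8)
  })
print(p)
```

panel.text() stays inside lattice's own coordinate system, so no grid.text/gpar calls are needed.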
Re: [R] Problems with Rcmdr and BiodiversityR
Dear Frieda, I'm afraid that I completely misunderstood your question. I was unfamiliar with the BiodiversityR package and thought that you were writing an Rcmdr plug-in for it. Actually, I see that BiodiversityR uses the Rcmdr interface but isn't written as a standard plug-in. Instead, it apparently manipulates the Rcmdr menu file directly. This is inadvisable, and I suspect that the package is simply incompatible with the current version of the Rcmdr package. I suggest that you contact the package maintainer, who might choose to rewrite the package as a plug-in. Regards, John

On Sun, 4 Jan 2009 08:55:35 -0800 (PST) Frieda friederike.gruenin...@uni-passau.de wrote: Dear John, thanks for the quick answer - I'm new to R and sometimes a bit lost in the jungle... - a version conflict seems to be the problem. I use the recent Rcmdr version but took the plug-in package for BiodiversityR from the author's page, which seems not to have been updated (http://www.worldagroforestry.org/treesandmarkets/tree_diversity_analysis.asp). I'll find a working plug-in! Thanks again, Frieda
John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/
Re: [R] Newbie question
Hi, get yourself one of the many free manuals that you can download online. They answer the easier questions. Tom Short's R reference card may also be very helpful to have at hand when you are an R novice. Cheers, Daniel - cuncta stricte discussurus - -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of greggal...@gmail.com Sent: Saturday, January 03, 2009 10:55 PM To: r-help@r-project.org Subject: [R] Newbie question Hi: I'm loading in students' test scores with: abntest <- read.table("scores.txt") If I type: abntest I get ALL the values. I want to be able to filter them by various things such as: if (abntest > 90) print(abntest) and other logical operators. I'm sure this is simple for someone experienced. Thanks, Gregg Allen
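In R, the filtering Gregg describes is usually done with logical indexing or subset() rather than an if statement. A sketch with a stand-in data frame; V1 is the default column name read.table() assigns when the file has no header row, and the scores here are invented:

```r
# Stand-in for abntest <- read.table("scores.txt") (hypothetical file/values).
abntest <- data.frame(V1 = c(95, 72, 88, 91))

abntest[abntest$V1 > 90, , drop = FALSE]  # logical indexing: rows with score > 90
subset(abntest, V1 > 90)                  # same result via subset()
```

Unlike if (which tests a single condition), the comparison abntest$V1 > 90 is vectorized: it yields one TRUE/FALSE per row, and indexing keeps the TRUE rows.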
[R] RES: Customized LDA (MASS) object plot
Thanks for the enlightenment, Prof. Ripley. I guess I'll have to learn quite a bit more about panel functions. Once I'm pointed in the right direction I'll probably be able to do what I want. Sorry for the mistake; I totally missed the panel function's possibilities. Rodrigo.
Re: [R] Customized LDA (MASS) object plot
Hmm: what do you want to do that a customized panel function cannot do? plot.lda does not accept type="n" precisely because it does support a panel function to plot within the axes it has set up. Your claim that 'they don't let me personalize them' is simply untrue. If you want to see the source code, just type 'MASS:::plot.lda'. Furthermore, MASS is support software for a book, and the book does have examples of customized LDA plots! -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) +44 1865 272866 (PA) 1 South Parks Road, Oxford OX1 3TG, UK Fax: +44 1865 272595
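For completeness on the original question: the per-observation coordinates are not stored in the lda object itself but are produced by predict(). A sketch with the built-in iris data; the poster's own data and grouping are not shown, so this only illustrates the mechanics:

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)
scores <- predict(fit)$x   # LD1/LD2 coordinates for each observation

# Fully customized plot using the ordinary graphics functions:
plot(scores[, 1], scores[, 2], type = "n", xlab = "LDA1", ylab = "LDA2")
points(scores, pch = as.integer(iris$Species))
```

This is the "reach the coordinates" route; Prof. Ripley's panel-function route customizes the plot without ever extracting them.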
[R] R badly lags matlab on performance? -- Define performance, please.
Folks: Merely my opinions, of course ... Just to amplify a little on Philippe's remarks by paraphrasing comments made many times on this list before. In a galaxy far away, a long time ago ... John Chambers and his Bell Labs colleagues -- and subsequently R&R (Ross Ihaka and Robert Gentleman) and R's Core Team developers -- made the decision to develop a language/software for data analysis, data graphics and statistics. Recognizing that most tasks within this arena were one-off custom problems rather than repetitive production applications, they emphasized flexibility, ease of use and relatively straightforward extensibility. While I'm sure that they did not ignore performance, it was not the primary consideration (Chambers et al.'s Blue Book speaks to these issues much more eloquently; I think it should be required reading _BEFORE_ one launches into criticism). As has been frequently mentioned, they knew that there are two outs for such matters: Moore's Law and the ability to easily incorporate customized C code into R. I submit that the data bear out the overwhelming wisdom of their choice. This is not to say that R is perfect: there are certainly times when performance is inadequate, and design or implementation could have been (or be) improved. But no one bats a thousand (baseball idiom): as Philippe said, for many (maybe most?) of us R is both awesome and indispensable! For me the real challenge is: what's next? R/S is so blazingly successful that it seems to extinguish the need for continuing improvement (the demise of Luke Tierney's XLISP-STAT is an example): what's the next step in the sequence IMSL -> SAS -> S/R -> ?? But hopefully this is merely my ignorance speaking, and smart folks are already working on it.
Regards to all, Bert Gunter

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Philippe Grosjean Sent: Sunday, January 04, 2009 2:02 AM To: Stefan Grosse Cc: r-h...@stat.math.ethz.ch Subject: Re: [R] R badly lags matlab on performance?
Re: [R] Problem with package SNOW on MacOS X 10.5.5
I don't see this on my setup (OS X 10.5.6, R 2.8.0, snow 0.3-3). As snow does not use a name space, it is possible that something else you have loaded is masking a snow internal function. Another possibility might be that your worker processes are picking up different versions of R or snow. You might look at what traceback() says and/or use clusterEvalQ to query the workers for R and snow versions. luke On Wed, 31 Dec 2008, Greg Riddick wrote: Hello All, I can run the lower-level functions OK, but many of the higher-level (e.g. parSApply) functions are generating errors. When running the example (from the snow help docs) for parApply on Mac OS X 10.5.5, I get the following error: cl <- makeSOCKcluster(c("localhost", "localhost")) sum(parApply(cl, matrix(1:100, 10), 1, sum)) Error in do.call(fun, lapply(args, enquote)) : could not find function "fun" Any ideas? Do I possibly need MPI or PVM to run the Apply functions? Thanks, -- Luke Tierney Chair, Statistics and Actuarial Science Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: l...@stat.uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
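For readers trying to reproduce this, here is the thread's example with the stripped quotes restored. It assumes the snow package is installed and that local socket clusters can be started on your machine; the clusterEvalQ line is Luke's suggested check on what the workers are running.

```r
library(snow)  # not part of base R; assumed installed

cl <- makeSOCKcluster(c("localhost", "localhost"))  # two local workers
clusterEvalQ(cl, R.version.string)  # check that workers run the expected R

res <- sum(parApply(cl, matrix(1:100, 10), 1, sum))
stopCluster(cl)
res  # 5050 when the cluster is healthy
```

The same computation run serially, sum(apply(matrix(1:100, 10), 1, sum)), is a handy sanity check for the expected value.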
Re: [R] R badly lags matlab on performance?
On Sat, Jan 3, 2009 at 7:02 PM, l...@stat.uiowa.edu wrote: R's interpreter is fairly slow due in large part to the allocation of argument lists and the cost of lookups of variables, including ones like [<- that are assembled and looked up as strings on every call. Wow, I had no idea the interpreter was so awful. Just some simple tree-to-tree transformations would speed things up, I'd think, e.g. `<-`(`[`(...), ...) ==> `[<-`(..., ...). The current byte code compiler available from my web site speeds this (highly artificial) example up by about a factor of 4. The experimental byte code engine I am currently working on (and that can't yet do much more than an example like this) speeds it up by a factor of 80. Whether that level of improvement (for toy examples like this) will remain once the engine is more complete, and whether a reasonable compiler can optimize down to the assembly code I used, remain to be seen. Not sure I follow here. It sounds as though you have 4 levels of execution: 1) interpreter 2) current byte-code engine 3) future byte-code engine 4) compilation of byte codes into machine code Is that right? I'm not sure what the difference between 2 and 3 is, and what the 80x figure refers to. I'd think that one of the challenges will be the dynamic types -- where you don't know statically if an argument is a logical, an integer, a real, or a string. Will you be adding declarations, assuming the best case and interpreting all others, or ...? Does Matlab have the same type problem? Or does it make everything into a double? That still wouldn't explain the vectorized case, since the type dispatch only has to happen once. Sometimes some very simple changes in the implementation can make huge differences in overall runtime. I still remember a 10-word change I made in Maclisp in 1975 or so where I special-cased the two-argument case of (+ integer integer) => integer -- what it normally did was convert it to the general n-argument arbitrary-type case.
This sped up (+ integer integer) by 10x (which doesn't matter much), but also sped up the overall Macsyma symbolic algebra system by something like 20%. -s
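As a rough illustration of the kind of speedup byte compilation buys on interpreter-bound loops, here is a sketch using the compiler package, the descendant of the byte code compiler Luke mentions (which later shipped with R itself); the exact speedup factor is machine- and version-dependent.

```r
library(compiler)  # ships with modern R; earlier it was Luke's add-on package

# deliberately interpreter-bound toy loop, in the spirit of a <- a + 1
f <- function(n) {
  a <- 0
  for (i in seq_len(n)) a <- a + 1
  a
}
fc <- cmpfun(f)  # byte-compile the same function

system.time(f(1e6))   # interpreted
system.time(fc(1e6))  # byte-compiled: typically severalfold faster

f(1000) == fc(1000)   # same answer either way
```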
[R] Error "could not find function"
I have copied a very simple function, but when I try to call it I get an error message: "could not find function". The function file is named addthree.r and I cd to the directory where the function is saved before calling it. What am I doing wrong?
Re: [R] Newbie question
Also, subset(abntest, abntesr > 90) -Don At 8:55 PM -0700 1/3/09, greggal...@gmail.com wrote: Hi: I'm loading in students' test scores with: abntest <- read.table("scores.txt") If I type abntest I get ALL the values. I want to be able to filter it by various things, such as: if (abntesr > 90) print abntest and other logical operators. I'm sure this is simple for someone experienced. Thanks, Gregg Allen -- - Don MacQueen Lawrence Livermore National Laboratory Livermore, CA, USA 925-423-1062 m...@llnl.gov
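A minimal runnable version of Don's suggestion follows; since the original scores.txt was never posted, the column name `score` and the values below are assumptions, used only to stand in for read.table("scores.txt").

```r
# hypothetical stand-in for read.table("scores.txt")
abntest <- data.frame(student = c("A", "B", "C", "D"),
                      score   = c(85, 92, 97, 78),
                      stringsAsFactors = FALSE)

high <- subset(abntest, score > 90)  # keep rows where the condition holds
high$student  # "B" "C"
```

Note that subset() takes the condition on a column of the data frame, which is why the if(...) attempt in the question cannot work: `if` tests a single logical value, not a vector of rows.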
Re: [R] Error "could not find function"
I have copied a very simple function, but when I try to call it I get an error message: "could not find function". The function file is named addthree.r and I cd to the directory where the function is saved before calling it. What am I doing wrong? R does not look for functions to execute in random files. You have to source() the file to execute the code -- i.e. define the function in the current session. See also section 1.10 of the manual (An Introduction to R). cu Philipp -- Dr. Philipp Pagel Lehrstuhl für Genomorientierte Bioinformatik Technische Universität München Wissenschaftszentrum Weihenstephan 85350 Freising, Germany http://mips.gsf.de/staff/pagel
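Concretely, Philipp's advice looks like this; the body of addthree() is an assumption, since only the file name appears in the thread.

```r
# write a definition to addthree.r, as the poster presumably did by hand
# (the function body `x + 3` is a guess based on the name)
writeLines("addthree <- function(x) x + 3", "addthree.r")

# cd-ing to the directory is not enough: the file must be sourced into
# the current session before the name becomes visible
source("addthree.r")
addthree(4)  # 7
```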
Re: [R] R badly lags matlab on performance?
On Sun, 4 Jan 2009, Stavros Macrakis wrote: On Sat, Jan 3, 2009 at 7:02 PM, l...@stat.uiowa.edu wrote: R's interpreter is fairly slow due in large part to the allocation of argument lists and the cost of lookups of variables, including ones like [<- that are assembled and looked up as strings on every call. Wow, I had no idea the interpreter was so awful. Just some simple tree-to-tree transformations would speed things up, I'd think, e.g. `<-`(`[`(...), ...) ==> `[<-`(..., ...). 'Awful' seems a bit strong. It's also a bit more complicated in that one needs both [ and [<- in complex assignment expressions, but the point that one could rewrite assignments into something that can be more efficiently executed is certainly true. There are also a number of other opportunities to do things like this. They do have repercussions though -- in this case one would either need to modify code that needs to look at the original code to undo the operation, or add a new data structure that contains both the original code object and the rewritten one, and deal with the implications for serialization, and so on. Doable of course, and worth doing if the payoff is high enough, but I'm not convinced it is at this point. The current byte code compiler available from my web site speeds this (highly artificial) example up by about a factor of 4. The experimental byte code engine I am currently working on (and that can't yet do much more than an example like this) speeds it up by a factor of 80. Whether that level of improvement (for toy examples like this) will remain once the engine is more complete and whether a reasonable compiler can optimize down to the assembly code I used remain to be seen. Not sure I follow here. It sounds as though you have 4 levels of execution: 1) interpreter 2) current byte-code engine 3) future byte-code engine 4) compilation of byte codes into machine code Is that right? I'm not sure what the difference between 2 and 3 is, 3 is hopefully a much more efficient engine than 2.
I'm not looking at 4 for now but keeping an eye on the possibility, at least via C code generation. and what the 80x figure refers to. Relative to the current interpreter -- I got 80 sec with the interpreter and 1 sec with the new byte code engine. I'd think that one of the challenges will be the dynamic types -- where you don't know statically if an argument is a logical, an integer, a real, or a string. Will you be adding declarations, assuming the best case and interpreting all others, or ...? I am for now trying to get away without declarations and pre-testing for the best cases before passing others off to the current internal code. By taking advantage of the mechanisms we use now to avoid unnecessary copies it _looks_ like this allows me to avoid boxing up intermediate values in many cases, and that seems to help a lot. Given the overhead of the engine I'm not sure if specific type information would help that much (quick experiments suggest it doesn't, but that needs more testing) -- it would of course pay off with machine code generation. Does Matlab have the same type problem? Or does it make everything into a double? That still wouldn't explain the vectorized case, since the type dispatch only has to happen once. I suspect the main reason for the difference in the vectorized case is that our current code does not special-case the vector/scalar case. R has more general recycling rules than Matlab, and the current code in the interpreter is written for the general case only (I thought we had special-cased scalar/scalar, but unless I missed something in a quick look it appears not). Sometimes some very simple changes in the implementation can make huge differences in overall runtime. I still remember a 10-word change I made in Maclisp in 1975 or so where I special-cased the two-argument case of (+ integer integer) => integer -- what it normally did was convert it to the general n-argument arbitrary-type case.
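Luke's point about R's more general recycling rules can be seen directly at the prompt; this is standard R behaviour, so the arithmetic code must be prepared for arbitrary length combinations, not just equal-length or vector/scalar ones.

```r
# the shorter operand is recycled to the length of the longer one
1:6 + c(10, 20)       # 11 22 13 24 15 26
1:6 + c(10, 20, 30)   # 11 22 33 14 25 36

# partial recycling still works, but raises a warning
1:5 + c(10, 20)       # 11 22 13 24 15, with a length-mismatch warning
```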
This sped up (+ integer integer) by 10x (which doesn't matter much), but also sped up the overall Macsyma symbolic algebra system by something like 20%. We've had a few of those, and I suspect there are plenty more. There is always a trade-off in complicating the code and the consequences for maintainability that implies. A 1.5-factor difference here I find difficult to get excited about, but it might be worth a look. luke
[R] Bivariate Weibull Distribution
Hi everyone, Could someone provide me with definitions of the following bivariate distributions: gamma, exponential, Weibull, half-normal, Rayleigh, Erlang, chi-square? Thanks, A.S. Qureshi
Re: [R] R badly lags matlab on performance?
Stavros Macrakis wrote: On Sat, Jan 3, 2009 at 7:02 PM, l...@stat.uiowa.edu wrote: R's interpreter is fairly slow due in large part to the allocation of argument lists and the cost of lookups of variables, including ones like [<- that are assembled and looked up as strings on every call. Wow, I had no idea the interpreter was so awful. Just some simple tree-to-tree transformations would speed things up, I'd think, e.g. `<-`(`[`(...), ...) ==> `[<-`(..., ...). Doesn't really help (and it's not quite correct: a[2] <- 1 is equivalent to a <- `[<-`(a, 2, 1) with some sneakiness that assumes that the two a's are the same, so that you might destructively modify the second instance). The actual interpreter is not much of a bottleneck. There are two other major obstacles: 1) Things may not be what they seem 2) Insufficient control over object duplication 1) is the major impediment to compilability (look for talks/papers by Luke for further details and ideas about what to do about it). The basic issue is that at no point can you be sure that the log function calculates logarithms. It might be redefined as a side effect of the previous expression. This is a feature of the language as such, and it is difficult to change without destroying features that people actually use. The upshot is that every time we see an object name, we enter a search along the current search path to find its current binding. 2) is a little contentious: it is not certain how much we would gain by attacking it, only that it would be a heck of a lot of work. The issue is that we do not use reference counting like e.g. Java or Tcl does. We use a primitive counter called NAMED which can be 0, 1, or 2, and only counts upwards. When it reaches 2, destructive modification is disallowed and the object must be copied. I.e. consider x <- rnorm(1e6); y <- x -- at this point we actually have x and y referring to the same ~8MB block of memory.
However, the semantics of R is that this is a virtual copy, so y[1] <- 1 or x[1] <- 1 entails that we duplicate the object. Fair enough: if an object is bound to multiple names, we cannot modify it in place; the problem is that we lose track when the references go away, and thus y <- x; y[1] <- 1; x[1] <- 1 causes TWO duplications. The really nasty bit is that we very often get objects temporarily bound to two names (think about what happens with arguments in function calls). Unfortunately, we cannot base the memory management purely on reference counting. And of course, doing so, even partially, implies that we need to have a much more concrete approach to the unbinding of objects. Notice, for instance, that the names used in a function evaluation frame are not guaranteed to be unbind-able when the function exits. Something might have saved the evaluation environment, e.g. using e <- environment(), but there are also more subtle methods. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907
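Peter's copy-on-modify semantics can be observed with a small deterministic example; on an R built with memory profiling, tracemem(x) will additionally report the moment the duplication actually happens.

```r
x <- c(1, 2, 3)
y <- x        # a "virtual" copy: for now both names refer to the same block

x[1] <- 99    # NAMED has reached 2, so R duplicates before modifying

x  # 99 2 3
y  # still 1 2 3 -- untouched, as the copy semantics require
```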
[R] ratetable creation
Dear All, Can anyone help me in creating a ratetable object, assuming that I have mortality rates organized in a standard way by year, gender and age (0-99)? I would be very grateful for any help. Best regards, Daniel Rabczenko
Re: [R] how specify lme() with multiple within-subject factors?
Dear Ben, I'm cc'ing R-sig-mixed-models because that's a more appropriate list for questions on lme(). lme() is only able to work with nested random effects, not with crossed random effects. Therefore you would need lmer() from the lme4 package. But I don't think you need crossed random effects. Random slopes should do the trick, since wtype and present have only two levels. Try something like lme(.., .., random = ~ wtype * present | subj) HTH, Thierry -- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. +32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Meijering Sent: Saturday, January 3, 2009 19:59 To: r-help@r-project.org Subject: [R] how specify lme() with multiple within-subject factors? I have some questions about the use of lme().
Below, I constructed a minimal dataset to explain what difficulties I experience:

# two participants
subj <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))
# within-subjects factor Word Type
wtype <- factor(c("nw", "w", "nw", "w", "nw", "w", "nw", "w"))
# within-subjects factor Target Present/Absent
present <- factor(c(0, 0, 1, 1, 0, 0, 1, 1))
# dependent variable Accuracy
acc <- c(.74, .81, .84, .88, .75, .95, .88, .94)
# repeated-measures analysis of variance
acc.aov <- aov(acc ~ wtype * present + Error(subj/wtype*present))
summary(acc.aov)
# to use lme
library(nlme)
# mixed-effects model
acc.lme <- lme(acc ~ wtype * present, random = ~ 1 | subj)
anova(acc.lme)

How do I have to specify the model to have 1 degree of freedom for the denominator or error term, as in aov()? I know how to do this for the first factor: lme(.., .., random = ~ 1 | subj/wtype), or lme(.., .., random = list(~ 1 | subj, ~ 1 | wtype)), but not how to get the same degrees of freedom as in the specified aov(), i.e., 1 degree of freedom of the denominator for both factors and the interaction term. How do I specify such a model? ~ Ben
Re: [R] how specify lme() with multiple within-subject factors?
Folks: "lme() is only able to work with nested random effects, not with crossed random effects." Not quite true. Crossed models **can** be done, albeit clumsily, via pdMatrix objects: the Bates/Pinheiro book even contains an example or two (one on assay plates, I recall, but I don't have my book with me for the reference). Also, lme, not lmer, is currently the only way to implement penalized splines as random effects -- see the lmeSplines package. -- Bert Gunter
Re: [R] R badly lags matlab on performance?
On Sun, Jan 4, 2009 at 4:50 PM, l...@stat.uiowa.edu wrote: On Sun, 4 Jan 2009, Stavros Macrakis wrote: On Sat, Jan 3, 2009 at 7:02 PM, l...@stat.uiowa.edu wrote: R's interpreter is fairly slow due in large part to the allocation of argument lists and the cost of lookups of variables, I'd think another problem is call-by-need. I suppose inlining or batch analyzing groups of functions helps there. including ones like [<- that are assembled and looked up as strings on every call. Wow, I had no idea the interpreter was so awful. Just some simple tree-to-tree transformations would speed things up, I'd think, e.g. `<-`(`[`(...), ...) ==> `[<-`(..., ...). 'Awful' seems a bit strong. Well, I haven't looked at the code, but if I'm interpreting "assembled and looked up as strings on every call" correctly, this means taking names, expanding them to strings, concatenating them, re-interning them, then looking up the value. That sounds pretty awful to me, both in the sense of being inefficient and of being ugly. I'd think that one of the challenges will be the dynamic types --... I am for now trying to get away without declarations and pre-testing for the best cases before passing others off to the current internal code. Have you considered using Java bytecodes and taking advantage of dynamic compilers like HotSpot? They often do a good job in cases like this by assuming that types are fairly predictable from one run to the next of a piece of code. Or is the Java semantic model too different? ...There is always a trade-off in complicating the code and the consequences for maintainability that implies. Agreed entirely! A 1.5 factor difference here I find difficult to get excited about, but it might be worth a look. I agree. The 1.5 isn't a big deal at all. -s
Re: [R] R badly lags matlab on performance?
Thanks for the explanations of the internals. I understand about the 'redefining log' problem in the interpreter, but I wasn't aware of the NAMED counter. In both cases, beyond static analysis, dynamic Java compilers do a pretty good job, but I don't know if Java bytecodes are suitable for R, and if they are not, it would be a lot of work to duplicate the analysis for an R compiler. -s
[R] How to extract range of columns in a data frame
Dear all, I have the following data frame:

dat
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 1 CACCCA 9.0 18 12.00 18.0 15.0 12.0 6.0
2 1 ACGATACGGCGACCACCGAGATCTACACTCTTCC 18.0 8 12.00 18.0 15.0 12.0 18.0
3 1 ACTACTGCTACTCC 15.0 8 12.00 12.0 18.0 12.0 12.0
4 1 ACTTATACGGCGACCACCGAGATCTACACTCTTT 15.0 6 18.00 6.0 18.0 15.0 9.0
5 1 CC 21.0 21 21.00 21.0 21.0 21.0 21.0
6 1 CTACACTCTTTCCCTACACGCCGCTCTTCCGATC 21.0 21 21.00 21.0 21.0 21.0 21.0
7 1 TACACCGCATCTCCACACTCTC 12.0 21 12.00 21.0 21.0 21.0 21.0
8 1 TGATACGCCTACCACCGCCCTCTACACTCTCTCC 15.0 9 18.00 18.0 15.0 15.0 6.0
9 1 TGATACGGCGACCACCGAGATCTACACTCTCTCC 21.0 21 21.00 21.0 21.0 21.0 21.0
10 4 TGATACGGCGACCACCGAGATCTACACTCTTTCC 19.5 18 15.75 19.5 16.5 19.5 18.0
11 1 TGATACGGCGACCACCGAGGATCTACACTCTTTC 21.0 21 21.00 21.0 21.0 21.0 21.0
12 1 TGATACGGCGACCACCGAGGATCTCCACTCTCTC 21.0 21 21.00 21.0 21.0 21.0 21.0
13 2 TGCTCCGGCGACCACCGAGATCTACACTCTTTCC 18.0 8 12.00 18.0 13.5 18.0 13.5
14 1 TTATACGTCGACCACCGAGATCTACACTCTCTCC 18.0 18 18.00 18.0 18.0 18.0 15.0
15 1 TTCTCCGGCGACCACCGAGATCTACACTCTTTCC 18.0 7 9.00 18.0 12.0 18.0 15.0
16 1 TTCTCCGGCGACCACCGCGATCTACACTCTTTCC 18.0 7 9.00 18.0 12.0 18.0 15.0

My question is: how can I extract the columns V3 up to V9 into another new data frame? I tried this but failed: str <- paste("V", 3:9, sep="") print(dat$str) - Gundala Viswanath Jakarta - Indonesia
Re: [R] How to extract range of columns in a data frame
Dear Gundala, Try this: ex <- paste("V", 3:9, sep="") new.dat <- dat[, ex] new.dat HTH, Jorge On Sun, Jan 4, 2009 at 9:36 PM, Gundala Viswanath gunda...@gmail.com wrote: Dear all, I have the following data frame: [...] My question is: how can I extract the columns V3 up to V9 into another new data frame? I tried this but failed: str <- paste("V", 3:9, sep="") print(dat$str) - Gundala Viswanath Jakarta - Indonesia
Re: [R] How to extract range of columns in a data frame
x <- dat[, 3:9] # I think this is what you want. On Sun, Jan 4, 2009 at 9:42 PM, Jorge Ivan Velez jorgeivanve...@gmail.com wrote: Dear Gundala, Try this: ex <- paste("V", 3:9, sep="") new.dat <- dat[, ex] new.dat HTH, Jorge [...]
-- Stephen Sefick Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
Re: [R] How to extract range of columns in a data frame
Or this: new.DF <- subset(dat, select = V3:V9)

str(new.DF)
'data.frame': 16 obs. of 7 variables:
 $ V3: num 9 18 15 15 21 21 12 15 21 19.5 ...
 $ V4: int 18 8 8 6 21 21 21 9 21 18 ...
 $ V5: num 12 12 12 18 21 ...
 $ V6: num 18 18 12 6 21 21 21 18 21 19.5 ...
 $ V7: num 15 15 18 18 21 21 21 15 21 16.5 ...
 $ V8: num 12 12 12 15 21 21 21 15 21 19.5 ...
 $ V9: num 6 18 12 9 21 21 21 6 21 18 ...

See ?subset for the above and ?"[.data.frame" for additional information on subsetting data frames, which is also covered in An Introduction to R. HTH, Marc Schwartz on 01/04/2009 08:42 PM Jorge Ivan Velez wrote: Dear Gundala, Try this: ex <- paste("V", 3:9, sep="") new.dat <- dat[, ex] new.dat HTH, Jorge On Sun, Jan 4, 2009 at 9:36 PM, Gundala Viswanath gunda...@gmail.com wrote: Dear all, I have the following data frame: [...] My question is: how can I extract the columns V3 up to V9 into another new data frame?
I tried this but failed: str <- paste("V", 3:9, sep="") print(dat$str)
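The three answers agree; a self-contained sketch on a toy frame (a small stand-in for the poster's data, since the full table is unwieldy) shows why dat$str fails while the suggested forms work.

```r
dat <- data.frame(V1 = 1:4, V2 = letters[1:4], V3 = 5:8, V4 = 9:12,
                  stringsAsFactors = FALSE)

cols <- paste("V", 3:4, sep = "")  # "V3" "V4"

dat$cols  # NULL: $ looks for a column literally named "cols",
          # it does not evaluate `cols` as a variable

sub1 <- dat[, cols]                 # character indexing works with [ , ]
sub2 <- dat[, 3:4]                  # ...as does positional indexing
sub3 <- subset(dat, select = V3:V4) # ...as does subset() with select

identical(sub1, sub2) && identical(sub2, sub3)  # TRUE
```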
Re: [R] Bivarite Weibull Distribution
Dear A.S. Qureshi, On Sun, Jan 4, 2009 at 11:36 AM, saboorha...@gmail.com wrote: [...] See Johnson, Kotz, and Balakrishnan (2000) for a reference book on multivariate distributions. From there, you will see that there are _many_ bivariate distributions that have Weibull marginals (or any other marginal distribution, for that matter). In other words, there isn't a bivariate Weibull distribution... there are all kinds of them. A modern way to address this is by using copulas; see Nelson (1998, 2007). To this end, R has the packages fCopulae and copula, among others. There is a CRAN Task View for Probability Distributions: http://cran.r-project.org/web/views/Distributions.html Using copulas and (for example) the inverse CDF approach, one can generate bivariate samples that have any given marginal distribution. See Nelson for details. Best, Jay -- *** G. Jay Kerns, Ph.D. Associate Professor Department of Mathematics & Statistics Youngstown State University Youngstown, OH 44555-0002 USA Office: 1035 Cushwa Hall Phone: (330) 941-3310 Office (voice mail) -3302 Department -3170 FAX E-mail: gke...@ysu.edu http://www.cc.ysu.edu/~gjkerns/
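Jay's inverse-CDF idea can be sketched in base R without the copula packages, using a Gaussian copula: draw correlated normals, push them through pnorm() to get uniform marginals (the copula), then through qweibull() to impose Weibull marginals. The shape/scale values and rho below are arbitrary choices for illustration.

```r
set.seed(1)
n <- 10000; rho <- 0.6

# correlated standard normals (a hand-rolled bivariate normal draw)
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# pnorm maps each margin to Uniform(0,1): this pair IS the Gaussian copula
u1 <- pnorm(z1); u2 <- pnorm(z2)

# inverse CDF imposes whatever marginals you like -- here two Weibulls
x <- qweibull(u1, shape = 2,   scale = 1)
y <- qweibull(u2, shape = 1.5, scale = 2)

cor(x, y)  # positive: the dependence survives the monotone transforms
```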
Re: [R] the first and last observation for each subject
[R] the first and last observation for each subject
hadley wickham h.wickham at gmail.com
Fri Jan 2 14:52:42 CET 2009

On Fri, Jan 2, 2009 at 3:20 AM, gallon li gallon.li at gmail.com wrote:
> I have the following data:
>
>   ID  x  y time
>    1 10 20    0
>    1 10 30    1
>    1 10 40    2
>    2 12 23    0
>    2 12 25    1
>    2 12 28    2
>    2 12 38    3
>    3  5 10    0
>    3  5 15    2
>   ...
>
> x is time invariant, ID is the subject id number, and y changes over time. I want to find the difference between the first and last observed y value for each subject, and get a table like:
>
>   ID  x  y
>    1 10 20
>    2 12 15
>    3  5  5
>   ...
>
> Is there an easy way to generate this data set?

One approach is to use the plyr package, as documented at http://had.co.nz/plyr. The basic idea is that your problem is easy to solve if you have the subset for a single subject:

    one <- subset(DF, ID == 1)
    with(one, y[length(y)] - y[1])

The difficulty is splitting the original dataset into subjects, applying the solution to each piece, and then joining the results back together. This is what the plyr package does for you:

    library(plyr)
    # ddply splits up a data frame and combines the results into a data
    # frame; .(ID) says to split by the subject variable
    ddply(DF, .(ID), function(one) with(one, y[length(y)] - y[1]))

...

The above is much quicker than the versions based on aggregate, and easy to understand. Another approach is more specialized, but useful when you have lots of IDs (e.g., millions) and speed is very important.
It computes where the first and last entries for each ID fall in a vectorized way, akin to the computation that rle() uses:

    f0 <- function(DF) {
        changes <- DF$ID[-1] != DF$ID[-length(DF$ID)]
        first <- c(TRUE, changes)
        last  <- c(changes, TRUE)
        ydiff <- DF$y[last] - DF$y[first]
        DF <- DF[first, ]
        DF$y <- ydiff
        DF
    }

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
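Applied to the example data from the question, f0() reproduces the requested table (a self-contained sketch repeating Bill's function for completeness; it assumes, as in the post, that rows are already sorted by ID and time):

```r
# vectorized first/last-per-ID difference (Bill Dunlap's f0)
f0 <- function(DF) {
    changes <- DF$ID[-1] != DF$ID[-length(DF$ID)]  # TRUE where ID switches
    first <- c(TRUE, changes)   # first row of each ID block
    last  <- c(changes, TRUE)   # last row of each ID block
    ydiff <- DF$y[last] - DF$y[first]
    DF <- DF[first, ]
    DF$y <- ydiff
    DF
}

# the data from the original question
DF <- data.frame(
    ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
    x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
    y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
    time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

f0(DF)[, c("ID", "x", "y")]
##   ID  x  y
## 1  1 10 20
## 4  2 12 15
## 8  3  5  5
```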
[R] Process File Line By Line Without Slurping into Object
Dear all,

In general practice one would slurp the whole file using this method before processing the data:

    dat <- read.table(filename)

or variations of it. Is there a way to access the file line by line, without slurping/storing it all in an object? I am thinking of something like this in Perl:

    open INFILE, '<', 'filename.txt' or die $!;
    while (<INFILE>) {
        my $line = $_;
        # then process line by line
    }

The reason I want to do this is that the data I am processing are large (~5 GB); my PC may not be able to handle that.

- Gundala Viswanath
Jakarta - Indonesia
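For the record, R can mirror the Perl idiom with a file connection and readLines() fetching n = 1 line per call; readLines() returns a zero-length vector at end of file, which terminates the loop. A minimal sketch (the temp file and the counting stand in for the real 5 GB file and real per-line processing):

```r
## create a small example file (stand-in for the real large file)
path <- tempfile(fileext = ".txt")
writeLines(c("line one", "line two", "line three"), path)

## open a connection and read one line at a time, never holding
## the whole file in memory
con <- file(path, open = "r")
nlines <- 0
while (length(line <- readLines(con, n = 1)) > 0) {
    nlines <- nlines + 1      # process 'line' here instead of just counting
}
close(con)

nlines  # 3
```

For structured large files, reading in chunks (e.g. readLines(con, n = 10000), or read.table with nrows/skip) is usually much faster than one line at a time, at a modest memory cost.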
[R] getResponse(model.lme) yields incorrect number of dimensions error
Dear R experts,

I would like to get an R^2-like value for a multilevel regression using lme. I followed an archived suggestion by José Pinheiro to use the squared correlation between fitted and observed values, i.e.,

    cor(fitted(model.lme), getResponse(model.lme))^2

but getResponse returns the error message

    Error in val[, level] : incorrect number of dimensions

The same happens with residuals(model.lme) and summary(model.lme), but not with other generic functions such as predict, coef, or fitted, which (seem to) work fine. I have searched the archives but could not find a solution; I don't really understand what dimensions R is referring to. Any comments or suggestions would be greatly appreciated!

Thanks for your time, and happy new year,
Ullrich

Below is part of the data frame and the code:

    WMUCTrim
        Subj Cond       Acc S R T
    1      1    1 0.6666667 1 1 1
    2      1    2 0.8095238 1 0 1
    4      1    4 1.0000000 1 0 0
    6      1    6 0.9523810 0 0 1
    7      1    7 0.8571429 0 1 0
    8      1    8 1.0000000 0 0 0
    ...
    209    2    1 0.3809524 1 1 1
    210    2    2 0.9047619 1 0 1
    212    2    4 1.0000000 1 0 0
    214    2    6 0.8571429 0 0 1
    215    2    7 0.6666667 0 1 0
    216    2    8 1.0000000 0 0 0

    mlr2 <- summary(lme(Acc ~ R + T + S, random = ~ 1 | Subj))
    mlr2
    Linear mixed-effects model fit by REML
     Data: NULL
           AIC       BIC   logLik
     -1140.414 -1113.778 576.2068

    Random effects:
     Formula: ~1 | Subj
            (Intercept)   Residual
    StdDev:  0.07069723 0.08233792

    Fixed effects: Acc ~ R + T + S
                     Value   Std.Error  DF   t-value p-value
    (Intercept)  0.9843537 0.008937809 522 110.13367      0.
    R1          -0.1139456 0.006958824 522 -16.37426      0.
    T1          -0.1012472 0.006958824 522 -14.54946      0.
    S1          -0.0137188 0.006958824 522  -1.97143  0.0492
     Correlation:
       (Intr) R1     T1
    R1 -0.260
    T1 -0.260  0.000
    S1 -0.260  0.000 -0.333

    Standardized Within-Group Residuals:
            Min          Q1         Med          Q3         Max
    -4.21190127 -0.46318153  0.02715579  0.58591808  2.57708969

    Number of Observations: 630
    Number of Groups: 105

    class(mlr2)
    [1] "summary.lme" "lme"

    cor(fitted(mlr2), getResponse(mlr2))^2
    Error in val[, level] : incorrect number of dimensions

Dr Ullrich Ecker | Postdoctoral Research Fellow | Cognitive Science Laboratories | Room 211 Sanders Building | School of Psychology | The University of Western Australia | 35 Stirling Highway | Crawley WA 6009 | Australia | ullrich.ec...@uwa.edu.au | www.cogsciwa.com
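One thing worth ruling out: in the code shown, mlr2 holds summary(lme(...)) rather than the fitted model itself, so getResponse() (and residuals()) are being called on a summary.lme object rather than on an lme fit. Keeping the fit and the summary as separate objects may be all that is needed. A sketch with simulated stand-in data (the original WMUCTrim data are not available, so the variable layout here is invented for illustration):

```r
library(nlme)

## simulated stand-in for the WMUCTrim data
set.seed(42)
d <- data.frame(
    Subj = factor(rep(1:20, each = 6)),
    R = factor(rep(0:1, 60)),
    T = factor(rep(0:1, each = 2, length.out = 120)),
    S = factor(rep(0:1, each = 3, length.out = 120))
)
d$Acc <- 0.9 - 0.1 * (d$R == "1") +
         rep(rnorm(20, sd = 0.05), each = 6) +   # subject random effect
         rnorm(120, sd = 0.05)                   # residual noise

## keep the lme object; summarize it separately instead of overwriting it
fit  <- lme(Acc ~ R + T + S, random = ~ 1 | Subj, data = d)
mlr2 <- summary(fit)

## Pinheiro's pseudo-R^2, computed on the lme object, not the summary
cor(fitted(fit), getResponse(fit))^2
```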
[R] eval using an environment X but results in .GlobalEnv
Hello,

Suppose I have an expression, E, which accesses some variables present in an environment V. I evaluate it via eval(E, envir = V); however, all assignments end up in V. I would like the results of assignments in E to end up in .GlobalEnv, or at least in the calling environment. Is there a quick way to do this, instead of iterating over all objects and assigning them into .GlobalEnv?

Thank you
Saptarshi
--
Saptarshi Guha - saptarshi.g...@gmail.com
Re: [R] eval using an environment X but results in .GlobalEnv
Try this:

    .GlobalEnv$x <- 3

Also

    x <<- 3

will work if there is no x between V and the global environment; but if there is one, then that one will get set, rather than the one in the global environment.

On Mon, Jan 5, 2009 at 1:52 AM, Saptarshi Guha saptarshi.g...@gmail.com wrote:
> Suppose I have an expression, E, which accesses some variables present in an environment V. I evaluate it via eval(E, envir = V); however, all assignments end up in V. I would like the results of assignments in E to end up in .GlobalEnv, or at least in the calling environment. Is there a quick way to do this, instead of iterating over all objects and assigning them into .GlobalEnv?
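Gabor's first suggestion in runnable form (a sketch; the environment V, the variable a, and the expression E are made up for illustration):

```r
V <- new.env()
V$a <- 10

## the expression reads 'a' from V, but targets .GlobalEnv explicitly
## for its assignment, so the result does not land in V
E <- quote(.GlobalEnv$x <- a + 1)
eval(E, envir = V)

x                                          # 11, in the global environment
exists("x", envir = V, inherits = FALSE)   # FALSE: nothing was written to V
```

Writing the target environment explicitly, as here, avoids the caveat Gabor notes about x <<- 3, which walks up the enclosing environments and sets the first x it finds.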