[R] Python difflib R equivalent?
Hi,

Does anyone know of an R library that is equivalent in functionality to the Python standard library's difflib library? The Python docs say this about difflib:

  This module provides classes and functions for comparing sequences. It can be used, for example, for comparing files, and can produce difference information in various formats, including HTML and context and unified diffs.

http://docs.python.org/library/difflib.html

Thanks,
Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Python difflib R equivalent?
Paul:

1. I do not know if any such library exists.

2. However, if I understand correctly, one usually does this sort of thing in R with functions like ?match (or ?"%in%") and logical comparison operators like ?"==". Of course, for numeric comparisons, you need to be aware of R FAQ 7.31.

If you are interested in comparing what in R are character vectors, then various string functions (e.g. ?grep, ?regexp) and character manipulation packages (e.g. gsubfn, stringr) may be of use.

As usual, you are more likely to receive a helpful answer if you do as the posting guide requests and provide a small, reproducible example of the sort(s) of thing(s) you want to do.

Cheers,
Bert

-- 
Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions.
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
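A minimal sketch of the base R functions mentioned above, applied to two short character vectors and one numeric comparison. The vectors `old` and `new` are made up for illustration. This only covers set-style and position-wise comparison; it does not produce the aligned, diff-style output that difflib offers.

```r
# Hypothetical example data: two versions of a short word list.
old <- c("apple", "banana", "cherry", "date")
new <- c("apple", "blueberry", "cherry", "date", "elderberry")

# Position of each element of `new` within `old` (NA when absent).
pos <- match(new, old)

# Elements that only appear in one of the two vectors.
added   <- new[!(new %in% old)]
removed <- old[!(old %in% new)]

# `==` compares element by element, so restrict to the common length.
n <- min(length(old), length(new))
same_position <- old[seq_len(n)] == new[seq_len(n)]

# R FAQ 7.31: floating-point results rarely compare equal exactly,
# so prefer a tolerance-based comparison over `==` for numerics.
exact  <- (0.1 + 0.2) == 0.3
approx <- isTRUE(all.equal(0.1 + 0.2, 0.3))
```

`setdiff(new, old)` would give the same `added` vector here; the `%in%` form is shown because it is the function named in the message.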
Re: [R] Python difflib R equivalent? -- Correction
Item 1 below should be changed to:

1. I do not know if any such PACKAGE exists.

(A library in R is a file directory where R packages are stored.)

-- Bert

On Thu, Jul 28, 2011 at 8:05 AM, Bert Gunter bgun...@gene.com wrote:
> Paul:
>
> 1. I do not know if any such library exists.
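The distinction can be seen from the R prompt. The sketch below uses only base functions; the `tools` package is picked simply because it ships with R, and the variable names are illustrative.

```r
# Libraries: the directories in which installed packages live.
lib_dirs <- .libPaths()

# Packages: what is installed inside those libraries.
installed <- rownames(installed.packages(lib.loc = lib_dirs))

# library() attaches a package that it finds in one of the libraries.
library(tools)

# The directory of the attached package; its parent is a library.
pkg_dir <- find.package("tools")
```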
Re: [R] Python difflib R equivalent?
On Thu, 28 Jul 2011, Bert Gunter wrote:

> Paul:
>
> 1. I do not know if any such library exists.

Not to my knowledge, and we have contemplated providing such functions. But for files see e.g. tools::Rdiff, and generally R will not be a good way to do this sort of thing on files (since the flexibility of R's i/o via connections does have a cost).

> As usual, you are more likely to receive a helpful answer if you do as the posting guide requests and provide a small, reproducible example of the sort(s) of thing(s) you want to do.

Or at least a URI of examples you wish to emulate, which for difflib seem to be the sort of thing POSIX diff and friends are used for.

-- 
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
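A small sketch of tools::Rdiff on two temporary files. The file contents are made up, and both files deliberately have the same number of lines, on the assumption that Rdiff may otherwise hand the comparison to an external diff program. Rdiff is written for comparing R output files, so treat this as an illustration rather than a general-purpose diff.

```r
# Two hypothetical text files with the same number of lines.
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("alpha", "beta", "gamma"), f1)
writeLines(c("alpha", "BETA", "gamma"), f2)

# Rdiff() prints the differing lines and returns a status code:
# 0 when no differences are found, 1 otherwise.
status_diff <- tools::Rdiff(f1, f2)
status_same <- tools::Rdiff(f1, f1)
```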
Re: [R] Python difflib R equivalent?
There is a package rJython, which claims to provide an R interface to Python via Jython. I haven't used it, but the lead author, Gabor Grothendieck, is well known in the R community.

Spencer

-- 
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
web: www.structuremonitoring.com
Re: [R] Python and R
Different methods of performing least squares calculations in R are discussed in

@Article{Rnews:Bates:2004,
  author  = {Douglas Bates},
  title   = {Least Squares Calculations in {R}},
  journal = {R News},
  year    = 2004,
  volume  = 4,
  number  = 1,
  pages   = {17--20},
  month   = {June},
  url     = http,
  pdf     = Rnews2004-1
}

Some of the functions mentioned in that article have been modified. A more up-to-date version of the comparisons in that article is available as the Comparisons vignette in the Matrix package.

On Fri, Feb 20, 2009 at 6:06 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> Note that using solve can be numerically unstable for certain problems.
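One alternative to calling solve() directly on the normal equations is to take a Cholesky factor of crossprod(x) and solve two triangular systems. The following is a sketch in base R only, not code taken from the article or the vignette; the variable names are illustrative. Like any normal-equations approach it still works with X'X, so it is less robust than a QR-based fit when the columns of x are nearly collinear.

```r
# Plain data frame copy of the built-in time series matrix.
d <- as.data.frame(EuStockMarkets)
x <- cbind(Intercept = 1, as.matrix(d[, c("SMI", "CAC", "FTSE")]))
y <- d$DAX

xtx <- crossprod(x)      # X'X
xty <- crossprod(x, y)   # X'y

# Upper-triangular Cholesky factor R with R'R = X'X.
R <- chol(xtx)

# Solve R'z = X'y, then R b = z.
z <- backsolve(R, xty, transpose = TRUE)
b_chol <- backsolve(R, z)

# Reference fit for comparison.
b_lm <- coef(lm(DAX ~ SMI + CAC + FTSE, data = d))
```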
Re: [R] Python and R
Deciphering formulas seems to be the most time-consuming part of lm:

mylm1 <- function(formula, data) {
  # not perfect but works
  F <- model.frame(formula, data)
  y <- model.response(F)
  mt <- attr(F, "terms")
  x <- model.matrix(mt, F)
  coefs <- solve(crossprod(x), crossprod(x, y))
  coefs
}

mylm2 <- function(x, y, intercept = TRUE) {
  if (!is.matrix(x)) x <- as.matrix(x)
  if (intercept) x <- cbind(1, x)
  if (!is.matrix(y)) y <- as.matrix(y)
  solve(crossprod(x), crossprod(x, y))
}

system.time(for(i in 1:1000) mylm2(EuStockMarkets[,-1], EuStockMarkets[,"DAX"]))
   user  system elapsed
   6.43    0.00    6.53

system.time(for(i in 1:1000) mylm1(DAX~., EuStockMarkets))
   user  system elapsed
  16.19    0.00   16.23

system.time(for(i in 1:1000) lm(DAX~., EuStockMarkets))
   user  system elapsed
  21.43    0.00   21.44

So if you need to save time, I'd suggest something close to mylm2 rather than mylm1.

Kenn
Re: [R] Python and R
Note that using solve can be numerically unstable for certain problems.

On Fri, Feb 20, 2009 at 6:50 AM, Kenn Konstabel lebats...@gmail.com wrote:
> So if you need to save time, I'd suggest something close to mylm2 rather than mylm1.
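To illustrate the kind of problem meant here, the sketch below builds a design matrix with two nearly collinear columns and compares the normal-equations solution with a QR-based one. The data are simulated and the size of the perturbation (1e-5) is an arbitrary choice for the demonstration; the response is constructed without noise so that the true coefficients are known.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + 1e-5 * rnorm(n)   # nearly a copy of x1
x  <- cbind(1, x1, x2)

beta <- c(1, 2, 3)           # true coefficients
y    <- drop(x %*% beta)     # noise-free response

# Normal equations: forming X'X roughly squares the condition number.
b_ne <- solve(crossprod(x), crossprod(x, y))

# QR decomposition works on X itself.
b_qr <- qr.coef(qr(x), y)

# Largest absolute error in the recovered coefficients.
err_ne <- max(abs(b_ne - beta))
err_qr <- max(abs(b_qr - beta))
```

On a typical machine `err_ne` should come out several orders of magnitude larger than `err_qr`, though the exact values depend on the platform and the BLAS in use.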
Re: [R] Python and R
Doran, Harold wrote:
> > lm(y ~ x-1)
> > solve(crossprod(x), t(x))%*%y  # probably this can be done more efficiently
>
> You could do crossprod(x,y) instead of t(x) %*% y

that certainly looks more readable (and less error prone) to an R newbie like myself :-)
Re: [R] Python and R
Hi Kenn,

Thanks for the suggestions, I'll have to see if I can figure out how to convert the relatively simple call to lm with an equation and the data file to the functions you mention (or if that's even feasible).

Not an expert in statistics myself, I am mostly concentrating on the programming aspects of R. Problem is that I suspect my colleagues who are providing some guidance with the stats end are not quite experts themselves, and certainly new to R.

Cheers,
Esmail
Re: [R] Python and R
Gabor Grothendieck wrote:
> Yes, the speedup can be significant. e.g. here we cut the time down to 40% of the lm time by using lm.fit and we can get down to nearly 10% if we go even lower level:

Wow those numbers look impressive, that would be a nice speedup to have. I took a look at the manual and found the following at the top of the description for lm.fit:

  These are the basic computing engines called by lm used to fit linear models. These should usually not be used directly unless by experienced users.

I am certainly not an experienced user - so I wonder how different it would be to use lm.fit instead of lm. Right now I cobble together an equation and then call lm with it and the datafile. I.e.,

  LM.1 = lm(as.formula(eqn), data=datafile)
  s = summary(LM.1)

I then extract some information from the summary stats. I'm not really quite sure what to make of the parameter list in lm.fit. I will look on-line and see if I can find an example showing the use of this - thanks for pointing me in that direction.

Esmail
Re: [R] Python and R
On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian esmail...@gmail.com wrote:
> Thanks for the suggestions, I'll have to see if I can figure out how to convert the relatively simple call to lm with an equation and the data file to the functions you mention (or if that's even feasible).

X <- model.matrix(formula, data)

will calculate the X matrix for you.
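Putting the pieces together, here is a sketch of how a formula string and a data frame might be fed to lm.fit, with a few of the quantities one would otherwise read off summary(lm(...)) computed by hand. The formula string `eqn` and the use of EuStockMarkets are assumptions made for the example, and the standard-error step assumes a full-rank design (no pivoting in the QR decomposition).

```r
d   <- as.data.frame(EuStockMarkets)
eqn <- "DAX ~ SMI + CAC + FTSE"        # hypothetical formula string

# Build the response and design matrix from the formula.
mf <- model.frame(as.formula(eqn), data = d)
X  <- model.matrix(attr(mf, "terms"), mf)
y  <- model.response(mf)

# Fit without the overhead of a full lm object.
fit   <- lm.fit(X, y)
coefs <- fit$coefficients

# Residual standard error.
rss   <- sum(fit$residuals^2)
sigma <- sqrt(rss / fit$df.residual)

# Coefficient standard errors from the R factor of the QR decomposition.
p  <- fit$rank
se <- sigma * sqrt(diag(chol2inv(fit$qr$qr[seq_len(p), seq_len(p), drop = FALSE])))

# R-squared for a model that includes an intercept.
r_squared <- 1 - rss / sum((y - mean(y))^2)
```

If only the coefficients are needed, everything after `coefs` can be dropped, which is where most of the saving comes from.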
Re: [R] Python and R
On Tue, Feb 17, 2009 at 6:59 PM, Esmail Bonakdarian esmail...@gmail.com wrote:
> Well, I have a program written in R which already takes quite a while to run. I was just wondering if I were to rewrite most of the logic in Python - the main thing I use in R are its regression facilities - if it would speed things up. I suspect not since both of them are interpreted, and the bulk of the time is taken up by R's regression calls.

See ?Rprof for profiling your R code. If lm is the culprit, rewriting your lm calls using lm.fit might help.
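A minimal profiling session might look like the sketch below. The loop of lm calls stands in for the real program, the iteration count and sampling interval are arbitrary, and it assumes an R build with profiling support enabled.

```r
d <- as.data.frame(EuStockMarkets)
prof_file <- tempfile(fileext = ".out")

# Start sampling the call stack every 10 ms.
Rprof(prof_file, interval = 0.01)
for (i in 1:500) fit <- lm(DAX ~ ., data = d)
Rprof(NULL)   # stop profiling

# Summarise time by function; by.total includes time spent in callees.
prof <- summaryRprof(prof_file)
top  <- head(prof$by.total, 10)
```

Functions such as model.frame and model.matrix showing up near the top of `top` would suggest that formula handling, rather than the fit itself, dominates the run time.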
Re: [R] Python and R
2009/2/17 Esmail Bonakdarian esmail...@gmail.com:
> Well, I have a program written in R which already takes quite a while to run. I was just wondering if I were to rewrite most of the logic in Python - the main thing I use in R are its regression facilities - if it would speed things up. I suspect not since both of them are interpreted, and the bulk of the time is taken up by R's regression calls.

- and the bulk of the time in the regression calls will be taken up by C code in the underlying linear algebra libraries (lapack, blas, atlas and friends).

Your best bet for optimisation in this case would be making sure you have the best libraries for your architecture. That's a bit beyond me at the moment, others here can probably tell you about getting the best performing library for your system. This can also speed up Python (scipy or numpy) code that uses the same libraries.

Barry
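As a starting point, recent versions of R can report which linear algebra libraries they are linked against. The sketch below assumes R 3.4.0 or later for La_library() and the "BLAS" entry of extSoftVersion(); the matrix size in the timing is arbitrary.

```r
# Version of the LAPACK library in use.
lapack_version <- La_version()

# Paths to the BLAS and LAPACK libraries, where R can determine them
# (these may be empty strings on some builds).
blas_path   <- extSoftVersion()["BLAS"]
lapack_path <- La_library()

# A rough timing of a BLAS-heavy operation, useful for comparing
# the same R code across machines or BLAS builds.
set.seed(1)
m <- matrix(rnorm(500 * 500), 500, 500)
elapsed <- system.time(crossprod(m))["elapsed"]
```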
Re: [R] Python and R
Gabor Grothendieck wrote:
> See ?Rprof for profiling your R code. If lm is the culprit, rewriting your lm calls using lm.fit might help.

Yes, based on my informal benchmarking, lm is the main bottleneck, the rest of the code consists mostly of vector manipulations and control structures.

I am not familiar with lm.fit, I'll definitely look it up. I hope it's similar enough to make it easy to substitute one for the other.

Thanks for the suggestion, much appreciated. (My runs now take sometimes several hours, it would be great to cut that time down by any amount :-)

Esmail
Re: [R] Python and R
On Wed, Feb 18, 2009 at 7:27 AM, Esmail Bonakdarian esmail...@gmail.com wrote:
> Thanks for the suggestion, much appreciated. (My runs now take sometimes several hours, it would be great to cut that time down by any amount :-)

Yes, the speedup can be significant. e.g. here we cut the time down to 40% of the lm time by using lm.fit and we can get down to nearly 10% if we go even lower level:

system.time(replicate(1000, lm(DAX ~.-1, EuStockMarkets)))
   user  system elapsed
  26.85    0.07   27.35

system.time(replicate(1000, lm.fit(EuStockMarkets[,-1], EuStockMarkets[,1])))
   user  system elapsed
  10.76    0.00   10.78

system.time(replicate(1000, qr.coef(qr(EuStockMarkets[,-1]), EuStockMarkets[,1])))
   user  system elapsed
   3.33    0.00    3.34

lm(DAX ~.-1, EuStockMarkets)

Call:
lm(formula = DAX ~ . - 1, data = EuStockMarkets)

Coefficients:
     SMI       CAC      FTSE
 0.55156   0.45062  -0.09392

# They all give the same coefficients:

lm.fit(EuStockMarkets[,-1], EuStockMarkets[,1])$coef
        SMI         CAC        FTSE
 0.55156141  0.45062183 -0.09391815

qr.coef(qr(EuStockMarkets[,-1]), EuStockMarkets[,1])
        SMI         CAC        FTSE
 0.55156141  0.45062183 -0.09391815
Re: [R] Python and R
lm does lots of computations, some of which you may never need. If speed really matters, you might want to compute only those things you will really use. If you only need coefficients, then using %*%, solve and crossprod will be remarkably faster than lm # repeating someone else's example # lm(DAX~., EuStockMarkets) y - EuStockMarkets[,DAX] x - EuStockMarkets x[,1]-1 colnames(x)[1] - Intercept lm(y ~ x-1) solve(crossprod(x), t(x))%*%y# probably this can be done more efficiently # and a naive timing system.time( for(i in 1:1000) lm(y ~ x-1)) user system elapsed 14.640.33 32.69 system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) ) user system elapsed 0.360.000.36 Also lsfit() is a bit quicker than lm or lm.fit. Regards, Kenn On Wed, Feb 18, 2009 at 2:33 PM, Esmail Bonakdarian esmail...@gmail.comwrote: Barry Rowlingson wrote: - and the bulk of the time in the regression calls will be taken up by C code in the underlying linear algebra libraries (lapack, blas, atlas and friends). ah, good point. Your best bet for optimisation in this case would be making sure you have the best libraries for your architecture. That's a bit beyond me at the moment, others here can probably tell you about getting the best performing library for your system. This can also speed up Python (scipy or numpy) code that uses the same libraries. thanks for the suggestions Barry, I mostly run on intel machines, but using two flavors of Linux and also Windows XP - I grab any machine I can to help run this. R versions range from 2.6.x (Fedora) to 2.8.1 (XP) at the moment. Another post suggested I look at lm.fit in place of lm to help speed things up, so I'm going to look at that next. Appreciate all the helpful posts here. Esmail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Python and R
> lm(y ~ x - 1)
> solve(crossprod(x), t(x)) %*% y   # probably this can be done more efficiently

You could use crossprod(x, y) instead of t(x) %*% y, that is, solve(crossprod(x), crossprod(x, y)).
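For anyone following this thread from the Python side, here is a rough, standard-library-only sketch of what solve(crossprod(x), crossprod(x, y)) computes: it forms X'X and X'y and solves the resulting normal equations. The helper names (crossprod, solve, ols_coefficients) and the toy data are made up for illustration and are not taken from the messages above; in practice one would probably lean on numpy or scipy, and a QR-based solver is usually safer numerically than the normal equations.

```python
def crossprod(a, b):
    """Return t(a) %*% b for matrices stored as lists of rows (hypothetical helper)."""
    return [
        [sum(row_a[i] * row_b[j] for row_a, row_b in zip(a, b))
         for j in range(len(b[0]))]
        for i in range(len(a[0]))
    ]


def solve(a, b):
    """Solve a %*% x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy [a | b] so the inputs are left untouched.
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The left block is now diagonal; divide through to read off the solution.
    return [[aug[i][n + j] / aug[i][i] for j in range(len(b[0]))]
            for i in range(n)]


def ols_coefficients(x, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    y_column = [[value] for value in y]
    beta = solve(crossprod(x, x), crossprod(x, y_column))
    return [row[0] for row in beta]


# Toy data (invented for this sketch): y = 2 + 3 * t with no noise.
t_values = [0.0, 1.0, 2.0, 3.0]
x = [[1.0, t] for t in t_values]        # first column plays the role of the intercept
y = [2.0 + 3.0 * t for t in t_values]
coef = ols_coefficients(x, y)           # intercept first, then slope
```

As in the R version, only the coefficients come back; none of the residuals, standard errors or other extras that lm computes are produced, which is where the time saving comes from.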
[R] Python and R
Hello all,

I am just wondering if any of you are doing most of your scripting with Python instead of R's programming language and then calling the relevant R functions as needed?

And if so, what is your experience with this and what sort of software/library do you use in combination with Python to be able to access R's functionality.

Is there much of a performance hit either way? (as both are interpreted languages)

Thanks,
Esmail
Re: [R] Python and R
Esmail Bonakdarian wrote:
> I am just wondering if any of you are doing most of your scripting with Python instead of R's programming language and then calling the relevant R functions as needed?

No, but if I wanted to do such a thing, I'd look at Sage: http://sagemath.org/

It'll give you access to a lot more than just R.

> Is there much of a performance hit either way? (as both are interpreted languages)

Are you just asking, or do you have a particular execution time goal, which if exceeded would prevent doing this? I ask because I suspect it's the former, and fast enough is fast enough.
Re: [R] Python and R
2009/2/17 Esmail Bonakdarian esmail...@gmail.com:
> Hello all,
>
> I am just wondering if any of you are doing most of your scripting with Python instead of R's programming language and then calling the relevant R functions as needed?

I tend to use R in its native form for data analysis and modelling, and python for all my other programming needs (gui stuff with PyQt4, web stuff, text processing etc etc).

> And if so, what is your experience with this and what sort of software/library do you use in combination with Python to be able to access R's functionality.

When I need to use the two together, it's easiest with 'rpy'. This lets you call R functions from python, so you can do:

    from rpy import r
    r.hist(z)

to get a histogram of the values in a python list 'z'. There are some complications converting structured data types between the two but they can be overcome, and apparently are handled better with the next generation Rpy2 (which I've not got into yet). Google for rpy for info.

> Is there much of a performance hit either way? (as both are interpreted languages)

Not sure what you mean here. Do you mean: is

    R> sum(x)

faster than

    Python> sum(x)

and how much worse is:

    Python> from rpy import r
    Python> r.sum(x)

?

Knuth's remark on premature optimization applies, as ever.

Barry
Re: [R] Python and R
Hello!

On Tue, Feb 17, 2009 at 5:58 PM, Warren Young war...@etr-usa.com wrote:
> Esmail Bonakdarian wrote:
>> I am just wondering if any of you are doing most of your scripting with Python instead of R's programming language and then calling the relevant R functions as needed?
>
> No, but if I wanted to do such a thing, I'd look at Sage: http://sagemath.org/

ah .. thanks for the pointer, I had not heard of Sage, I was just starting to look at SciPy.

> It'll give you access to a lot more than just R.
>
>> Is there much of a performance hit either way? (as both are interpreted languages)
>
> Are you just asking, or do you have a particular execution time goal, which if exceeded would prevent doing this? I ask because I suspect it's the former, and fast enough is fast enough.

I put together a large'ish R program last year, but I think I would be happier if I could code it in say Python - but I would rather not do that at the expense of execution time.

Thanks again for telling me about Sage.

Esmail
Re: [R] Python and R
On Tue, Feb 17, 2009 at 6:05 PM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:
> 2009/2/17 Esmail Bonakdarian esmail...@gmail.com:
>
> When I need to use the two together, it's easiest with 'rpy'. This lets you call R functions from python, so you can do:
>
>     from rpy import r
>     r.hist(z)

wow .. that is pretty straightforward, I'll have to check out rpy for sure.

> to get a histogram of the values in a python list 'z'. There are some complications converting structured data types between the two but they can be overcome, and apparently are handled better with the next generation Rpy2 (which I've not got into yet). Google for rpy for info.

Will do!

>> Is there much of a performance hit either way? (as both are interpreted languages)
>
> Not sure what you mean here. Do you mean: is
>
>     R> sum(x)
>
> faster than
>
>     Python> sum(x)
>
> and how much worse is:
>
>     Python> from rpy import r
>     Python> r.sum(x)

Well, I have a program written in R which already takes quite a while to run. I was just wondering if I were to rewrite most of the logic in Python - the main thing I use in R is its regression facilities - if it would speed things up. I suspect not, since both of them are interpreted and the bulk of the time is taken up by R's regression calls.

Esmail
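The suspicion in the last paragraph can be put into rough numbers with Amdahl's law: if a fraction of the run time sits inside the regression calls, speeding up only the surrounding logic is capped by that fraction. The sketch below is an illustration only; the function name overall_speedup and the 90% / 10x figures are invented for the example and are not measurements from this thread.

```python
def overall_speedup(regression_fraction, glue_speedup):
    """Amdahl's-law estimate of the whole-program speedup when only the
    non-regression ("glue") code gets faster by a factor of glue_speedup."""
    if not 0.0 <= regression_fraction <= 1.0:
        raise ValueError("regression_fraction must be between 0 and 1")
    if glue_speedup <= 0:
        raise ValueError("glue_speedup must be positive")
    glue_fraction = 1.0 - regression_fraction
    # Regression time is unchanged; only the glue share shrinks.
    return 1.0 / (regression_fraction + glue_fraction / glue_speedup)


# Hypothetical scenario: 90% of the time is spent in regression calls and the
# rewritten glue code runs 10 times faster than before.
estimate = overall_speedup(regression_fraction=0.9, glue_speedup=10.0)
```

Under those assumed figures the whole program would get only about 10% faster, which is why profiling first (for example with Rprof on the R side) to find the real split seems worthwhile before rewriting anything.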