[R] Python difflib R equivalent?

2011-07-28 Thread Paul
Hi,

Does anyone know of a R library that is equivalent in functionality to
the Python standard libraries' difflib library? The python docs say
this about difflib:

This module provides classes and functions for comparing sequences.
It can be used for example, for comparing files, and can produce
difference information in various formats, including HTML and context
and unified diffs.

http://docs.python.org/library/difflib.html

Thanks,

Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python difflib R equivalent?

2011-07-28 Thread Bert Gunter
Paul:

1. I do not know if any such library exists.

2. However, if I understand correctly, one usually does this sort of
thing in R with functions like ?match (or ?%in%) and logical
comparison operations like ?== .  Of course, for numeric
comparisons, you need to be aware of R FAQ 7.31

If you are interested in comparing what in R are character vectors,
then various string operators, e.g. ?grep, ?regexp and character
operation packages (e.g. gsubfn, stringr) may be of use.

As usual, you are more likely to receive a helpful answer if you do as
the posting guide requests and provide a small, reproducible example
of the sort(s) of thing(s) you want to do.

Cheers,
Bert

On Thu, Jul 28, 2011 at 4:05 AM, Paul newzealandsp...@gmail.com wrote:
 Hi,

 Does anyone know of a R library that is equivalent in functionality to
 the Python standard libraries' difflib library? The python docs say
 this about difflib:

 This module provides classes and functions for comparing sequences.
 It can be used for example, for comparing files, and can produce
 difference information in various formats, including HTML and context
 and unified diffs.

 http://docs.python.org/library/difflib.html

 Thanks,

 Paul

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python difflib R equivalent? -- Correction

2011-07-28 Thread Bert Gunter
Item 1 below should be changed to:

 1. I do not know if any such PACKAGE exists.

(A library in R is a file directory where R packages are stored)

-- Bert

On Thu, Jul 28, 2011 at 8:05 AM, Bert Gunter bgun...@gene.com wrote:
 Paul:

 1. I do not know if any such library exists.

 2. However, if I understand correctly, one usually does this sort of
 thing in R with functions like ?match (or ?%in%) and logical
 comparison operations like ?== .  Of course, for numeric
 comparisons, you need to be aware of R FAQ 7.31

 If you are interested in comparing what in R are character vectors,
 then various string operators, e.g. ?grep, ?regexp and character
 operation packages (e.g. gsubfn, stringr) may be of use.

 As usual, you are more likely to receive a helpful answer if you do as
 the posting guide requests and provide a small, reproducible example
 of the sort(s) of thing(s) you want to do.

 Cheers,
 Bert

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python difflib R equivalent?

2011-07-28 Thread Prof Brian Ripley

On Thu, 28 Jul 2011, Bert Gunter wrote:


Paul:

1. I do not know if any such library exists.


Not to my knowledge, and we have contemplated providing such 
functions.  But for files see e.g. tools::Rdiff, and generally R will 
not be a good way to do this sort of thing on files (since the 
flexibility of R's i/o via connections does have a cost)



2. However, if I understand correctly, one usually does this sort of
thing in R with functions like ?match (or ?%in%) and logical
comparison operations like ?== .  Of course, for numeric
comparisons, you need to be aware of R FAQ 7.31

If you are interested in comparing what in R are character vectors,
then various string operators, e.g. ?grep, ?regexp and character
operation packages (e.g. gsubfn, stringr) may be of use.

As usual, you are more likely to receive a helpful answer if you do as
the posting guide requests and provide a small, reproducible example
of the sort(s) of thing(s) you want to do.


Or at least a URI of examples you wish to emulate, which for difflib 
seem to be the sort of thing POSIX diff and friends are used for.



Cheers,
Bert

On Thu, Jul 28, 2011 at 4:05 AM, Paul newzealandsp...@gmail.com wrote:

Hi,

Does anyone know of a R library that is equivalent in functionality to
the Python standard libraries' difflib library? The python docs say
this about difflib:

This module provides classes and functions for comparing sequences.
It can be used for example, for comparing files, and can produce
difference information in various formats, including HTML and context
and unified diffs.

http://docs.python.org/library/difflib.html

Thanks,

Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python difflib R equivalent?

2011-07-28 Thread Spencer Graves
There is a package rJython, which claims to provide an R interface to 
Python via Jython.  I haven't used it, but the lead author, Gabor 
Grothendieck, is well known in the R community.  Spencer



On 7/28/2011 9:15 AM, Prof Brian Ripley wrote:

On Thu, 28 Jul 2011, Bert Gunter wrote:


Paul:

1. I do not know if any such library exists.


Not to my knowledge, and we have contemplated providing such 
functions.  But for files see e.g. tools::Rdiff, and generally R will 
not be a good way to do this sort of thing on files (since the 
flexibility of R's i/o via connections does have a cost)



2. However, if I understand correctly, one usually does this sort of
thing in R with functions like ?match (or ?%in%) and logical
comparison operations like ?== .  Of course, for numeric
comparisons, you need to be aware of R FAQ 7.31

If you are interested in comparing what in R are character vectors,
then various string operators, e.g. ?grep, ?regexp and character
operation packages (e.g. gsubfn, stringr) may be of use.

As usual, you are more likely to receive a helpful answer if you do as
the posting guide requests and provide a small, reproducible example
of the sort(s) of thing(s) you want to do.


Or at least a URI of examples you wish to emulate, which for difflib 
seem to be the sort of thing POSIX diff and friends are used for.



Cheers,
Bert

On Thu, Jul 28, 2011 at 4:05 AM, Paul newzealandsp...@gmail.com wrote:

Hi,

Does anyone know of a R library that is equivalent in functionality to
the Python standard libraries' difflib library? The python docs say
this about difflib:

This module provides classes and functions for comparing sequences.
It can be used for example, for comparing files, and can produce
difference information in various formats, including HTML and context
and unified diffs.

http://docs.python.org/library/difflib.html

Thanks,

Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.





--
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.






--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-21 Thread Douglas Bates
Different methods of performing least squares calculations in R are discussed in

@Article{Rnews:Bates:2004,
  author   = {Douglas Bates},
  title= {Least Squares Calculations in {R}},
  journal  = {R News},
  year = 2004,
  volume   = 4,
  number   = 1,
  pages= {17--20},
  month= {June},
  url  = http,
  pdf  = Rnews2004-1
}

Some of the functions mentioned in that article have been modified.  A
more up-to-date version of the comparisons in that article is
available as the Comparisons vignette in the Matrix package.

On Fri, Feb 20, 2009 at 6:06 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 Note that using solve can be numerically unstable for certain problems.

 On Fri, Feb 20, 2009 at 6:50 AM, Kenn Konstabel lebats...@gmail.com wrote:
 Decyphering formulas seems to be the most time consuming part of lm:

 mylm1 - function(formula, data) {
 # not perfect but works
 F - model.frame(formula,data)
 y - model.response(F)
 mt - attr(F, terms)
 x - model.matrix(mt,F)
 coefs - solve(crossprod(x), crossprod(x,y))
 coefs
 }

 mylm2 - function(x, y, intercept=TRUE) {
 if(!is.matrix(x)) x - as.matrix(x)
 if(intercept) x - cbind(1,x)
 if(!is.matrix(y)) y - as.matrix(y)
 solve(crossprod(x), crossprod(x,y))
 }

 system.time(for(i in 1:1000) mylm2(EuStockMarkets[,-1],
 EuStockMarkets[,DAX]))
user  system elapsed
6.430.006.53
 system.time(for(i in 1:1000) mylm1(DAX~., EuStockMarkets))
user  system elapsed
   16.190.00   16.23
 system.time(for(i in 1:1000) lm(DAX~., EuStockMarkets))
user  system elapsed
   21.430.00   21.44

 So if you need to save time, I'd suggest something close to mylm2 rather
 than mylm1.

 Kenn



 On Thu, Feb 19, 2009 at 8:04 PM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:

 On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian esmail...@gmail.com
 wrote:

  Thanks for the suggestions, I'll have to see if I can figure out how to
  convert the relatively simple call to lm with an equation and the data
  file
  to the functions you mention (or if that's even feasible).

 X - model.matrix(formula, data)

 will calculate the X matrix for you.

 
  Not an expert in statistics myself, I am mostly concentrating on the
  programming aspects of R. Problem is that I suspect my colleagues who
  are providing some guidance with the stats end are not quite experts
  themselves, and certainly new to R.
 
  Cheers,
 
  Esmail
 
  Kenn Konstabel wrote:
 
  lm does lots of computations, some of which you may never need. If
  speed
  really matters, you might want to compute only those things you will
  really
  use. If you only need coefficients, then using %*%, solve and crossprod
  will
  be remarkably faster than lm
 
  # repeating someone else's example
  # lm(DAX~., EuStockMarkets)
 
   y - EuStockMarkets[,DAX]
   x - EuStockMarkets
   x[,1]-1
  colnames(x)[1] - Intercept
 
  lm(y ~ x-1)
  solve(crossprod(x), t(x))%*%y# probably this can be done more
  efficiently
 
  # and a naive timing
 
system.time( for(i in 1:1000) lm(y ~ x-1))
user  system elapsed
   14.640.33   32.69
system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
user  system elapsed
0.360.000.36
 
 
  Also lsfit() is a bit quicker than lm or lm.fit.
 
  Regards,
  Kenn
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-20 Thread Kenn Konstabel
Decyphering formulas seems to be the most time consuming part of lm:

mylm1 - function(formula, data) {
# not perfect but works
F - model.frame(formula,data)
y - model.response(F)
mt - attr(F, terms)
x - model.matrix(mt,F)
coefs - solve(crossprod(x), crossprod(x,y))
coefs
}

mylm2 - function(x, y, intercept=TRUE) {
if(!is.matrix(x)) x - as.matrix(x)
if(intercept) x - cbind(1,x)
if(!is.matrix(y)) y - as.matrix(y)
solve(crossprod(x), crossprod(x,y))
}

 system.time(for(i in 1:1000) mylm2(EuStockMarkets[,-1],
EuStockMarkets[,DAX]))
   user  system elapsed
   6.430.006.53
 system.time(for(i in 1:1000) mylm1(DAX~., EuStockMarkets))
   user  system elapsed
  16.190.00   16.23
 system.time(for(i in 1:1000) lm(DAX~., EuStockMarkets))
   user  system elapsed
  21.430.00   21.44

So if you need to save time, I'd suggest something close to mylm2 rather
than mylm1.

Kenn



On Thu, Feb 19, 2009 at 8:04 PM, Gabor Grothendieck ggrothendi...@gmail.com
 wrote:

 On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian esmail...@gmail.com
 wrote:

 Thanks for the suggestions, I'll have to see if I can figure out how to
  convert the relatively simple call to lm with an equation and the data
 file
  to the functions you mention (or if that's even feasible).

 X - model.matrix(formula, data)

 will calculate the X matrix for you.

 
  Not an expert in statistics myself, I am mostly concentrating on the
  programming aspects of R. Problem is that I suspect my colleagues who
  are providing some guidance with the stats end are not quite experts
  themselves, and certainly new to R.
 
  Cheers,
 
  Esmail
 
  Kenn Konstabel wrote:
 
  lm does lots of computations, some of which you may never need. If speed
  really matters, you might want to compute only those things you will
 really
  use. If you only need coefficients, then using %*%, solve and crossprod
 will
  be remarkably faster than lm
 
  # repeating someone else's example
  # lm(DAX~., EuStockMarkets)
 
   y - EuStockMarkets[,DAX]
   x - EuStockMarkets
   x[,1]-1
  colnames(x)[1] - Intercept
 
  lm(y ~ x-1)
  solve(crossprod(x), t(x))%*%y# probably this can be done more
  efficiently
 
  # and a naive timing
 
system.time( for(i in 1:1000) lm(y ~ x-1))
user  system elapsed
   14.640.33   32.69
system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
user  system elapsed
0.360.000.36
 
 
  Also lsfit() is a bit quicker than lm or lm.fit.
 
  Regards,
  Kenn
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-20 Thread Gabor Grothendieck
Note that using solve can be numerically unstable for certain problems.

On Fri, Feb 20, 2009 at 6:50 AM, Kenn Konstabel lebats...@gmail.com wrote:
 Decyphering formulas seems to be the most time consuming part of lm:

 mylm1 - function(formula, data) {
 # not perfect but works
 F - model.frame(formula,data)
 y - model.response(F)
 mt - attr(F, terms)
 x - model.matrix(mt,F)
 coefs - solve(crossprod(x), crossprod(x,y))
 coefs
 }

 mylm2 - function(x, y, intercept=TRUE) {
 if(!is.matrix(x)) x - as.matrix(x)
 if(intercept) x - cbind(1,x)
 if(!is.matrix(y)) y - as.matrix(y)
 solve(crossprod(x), crossprod(x,y))
 }

 system.time(for(i in 1:1000) mylm2(EuStockMarkets[,-1],
 EuStockMarkets[,DAX]))
user  system elapsed
6.430.006.53
 system.time(for(i in 1:1000) mylm1(DAX~., EuStockMarkets))
user  system elapsed
   16.190.00   16.23
 system.time(for(i in 1:1000) lm(DAX~., EuStockMarkets))
user  system elapsed
   21.430.00   21.44

 So if you need to save time, I'd suggest something close to mylm2 rather
 than mylm1.

 Kenn



 On Thu, Feb 19, 2009 at 8:04 PM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:

 On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian esmail...@gmail.com
 wrote:

  Thanks for the suggestions, I'll have to see if I can figure out how to
  convert the relatively simple call to lm with an equation and the data
  file
  to the functions you mention (or if that's even feasible).

 X - model.matrix(formula, data)

 will calculate the X matrix for you.

 
  Not an expert in statistics myself, I am mostly concentrating on the
  programming aspects of R. Problem is that I suspect my colleagues who
  are providing some guidance with the stats end are not quite experts
  themselves, and certainly new to R.
 
  Cheers,
 
  Esmail
 
  Kenn Konstabel wrote:
 
  lm does lots of computations, some of which you may never need. If
  speed
  really matters, you might want to compute only those things you will
  really
  use. If you only need coefficients, then using %*%, solve and crossprod
  will
  be remarkably faster than lm
 
  # repeating someone else's example
  # lm(DAX~., EuStockMarkets)
 
   y - EuStockMarkets[,DAX]
   x - EuStockMarkets
   x[,1]-1
  colnames(x)[1] - Intercept
 
  lm(y ~ x-1)
  solve(crossprod(x), t(x))%*%y# probably this can be done more
  efficiently
 
  # and a naive timing
 
system.time( for(i in 1:1000) lm(y ~ x-1))
user  system elapsed
   14.640.33   32.69
system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
user  system elapsed
0.360.000.36
 
 
  Also lsfit() is a bit quicker than lm or lm.fit.
 
  Regards,
  Kenn
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-19 Thread Esmail Bonakdarian

Doran, Harold wrote:

lm(y ~ x-1)
solve(crossprod(x), t(x))%*%y# probably this can be done more
efficiently



You could do

crossprod(x,y) instead of t(x))%*%y



that certainly looks more readable (and less error prone) to an R newbie
like myself :-)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-19 Thread Esmail Bonakdarian

Hi Kenn,

Thanks for the suggestions, I'll have to see if I can figure out how to
convert the relatively simple call to lm with an equation and the data file
to the functions you mention (or if that's even feasible).

Not an expert in statistics myself, I am mostly concentrating on the
programming aspects of R. Problem is that I suspect my colleagues who
are providing some guidance with the stats end are not quite experts
themselves, and certainly new to R.

Cheers,

Esmail

Kenn Konstabel wrote:
lm does lots of computations, some of which you may never need. If speed 
really matters, you might want to compute only those things you will 
really use. If you only need coefficients, then using %*%, solve and 
crossprod will be remarkably faster than lm


# repeating someone else's example
# lm(DAX~., EuStockMarkets)

 y - EuStockMarkets[,DAX]
 x - EuStockMarkets
 x[,1]-1
colnames(x)[1] - Intercept

lm(y ~ x-1)
solve(crossprod(x), t(x))%*%y# probably this can be done more 
efficiently


# and a naive timing

  system.time( for(i in 1:1000) lm(y ~ x-1))
   user  system elapsed
  14.640.33   32.69
  system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
   user  system elapsed
   0.360.000.36


Also lsfit() is a bit quicker than lm or lm.fit.

Regards,
Kenn


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-19 Thread Esmail Bonakdarian

Gabor Grothendieck wrote:

On Wed, Feb 18, 2009 at 7:27 AM, Esmail Bonakdarian esmail...@gmail.com wrote:

Gabor Grothendieck wrote:


See ?Rprof for profiling your R code.

If lm is the culprit, rewriting your lm calls using lm.fit might help.

Yes, based on my informal benchmarking, lm is the main bottleneck, the
rest
of the code consists mostly of vector manipulations and control structures.

I am not familiar with lm.fit, I'll definitely look it up. I hope it's
similar
enough to make it easy to substitute one for the other.

Thanks for the suggestion, much appreciated. (My runs now take sometimes
several hours, it would be great to cut that time down by any amount :-)



Yes, the speedup can be significant.  e.g. here we cut the time down to
40% of the lm time by using lm.fit and we can get down to nearly 10% if
we go even lower level:


Wow those numbers look impressive, that would be a nice speedup to have.

I took a look at the manual and found the following at the top of
the description for lm.fit:

  These are the basic computing engines called by lm used to fit linear
   models. These should usually not be used directly unless by experienced
   users. 

I am certainly not an experienced user - so I wonder how different it
would be to use lm.fit instead of lm.

Right now I cobble together an equation and then call lm with it and the
datafile.

I.e.,

LM.1 = lm(as.formula(eqn), data=datafile)
s=summary(LM.1)

I then extract some information from the summary stats.

I'm not really quite sure what to make of the parameter list in lm.fit

I will look on-line and see if I can find an example showing the use of
this - thanks for pointing me in that direction.

Esmail


system.time(replicate(1000, lm(DAX ~.-1, EuStockMarkets)))

   user  system elapsed
  26.850.07   27.35

system.time(replicate(1000, lm.fit(EuStockMarkets[,-1], EuStockMarkets[,1])))

   user  system elapsed
  10.760.00   10.78

system.time(replicate(1000, qr.coef(qr(EuStockMarkets[,-1]), 
EuStockMarkets[,1])))

   user  system elapsed
   3.330.003.34

lm(DAX ~.-1, EuStockMarkets)


Call:
lm(formula = DAX ~ . - 1, data = EuStockMarkets)

Coefficients:
 SMI   CAC  FTSE
 0.55156   0.45062  -0.09392


# They call give the same coefficients:



lm.fit(EuStockMarkets[,-1], EuStockMarkets[,1])$coef

SMI CACFTSE
 0.55156141  0.45062183 -0.09391815

qr.coef(qr(EuStockMarkets[,-1]), EuStockMarkets[,1])

SMI CACFTSE
 0.55156141  0.45062183 -0.09391815



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-19 Thread Gabor Grothendieck
On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian esmail...@gmail.com wrote:
 Hi Kenn,

 Thanks for the suggestions, I'll have to see if I can figure out how to
 convert the relatively simple call to lm with an equation and the data file
 to the functions you mention (or if that's even feasible).

X - model.matrix(formula, data)

will calculate the X matrix for you.


 Not an expert in statistics myself, I am mostly concentrating on the
 programming aspects of R. Problem is that I suspect my colleagues who
 are providing some guidance with the stats end are not quite experts
 themselves, and certainly new to R.

 Cheers,

 Esmail

 Kenn Konstabel wrote:

 lm does lots of computations, some of which you may never need. If speed
 really matters, you might want to compute only those things you will really
 use. If you only need coefficients, then using %*%, solve and crossprod will
 be remarkably faster than lm

 # repeating someone else's example
 # lm(DAX~., EuStockMarkets)

  y - EuStockMarkets[,DAX]
  x - EuStockMarkets
  x[,1]-1
 colnames(x)[1] - Intercept

 lm(y ~ x-1)
 solve(crossprod(x), t(x))%*%y# probably this can be done more
 efficiently

 # and a naive timing

   system.time( for(i in 1:1000) lm(y ~ x-1))
   user  system elapsed
  14.640.33   32.69
   system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
   user  system elapsed
   0.360.000.36


 Also lsfit() is a bit quicker than lm or lm.fit.

 Regards,
 Kenn

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-18 Thread Gabor Grothendieck
On Tue, Feb 17, 2009 at 6:59 PM, Esmail Bonakdarian esmail...@gmail.com wrote:
 Well, I have a program written in R which already takes quite a while
 to run. I was
 just wondering if I were to rewrite most of the logic in Python - the
 main thing I use
 in R are its regression facilities - if it would speed things up. I
 suspect not since
 both of them are interpreted, and the bulk of the time is taken up by
 R's regression
 calls.

See ?Rprof for profiling your R code.

If lm is the culprit, rewriting your lm calls using lm.fit might help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-18 Thread Barry Rowlingson
2009/2/17 Esmail Bonakdarian esmail...@gmail.com:

 Well, I have a program written in R which already takes quite a while
 to run. I was
 just wondering if I were to rewrite most of the logic in Python - the
 main thing I use
 in R are its regression facilities - if it would speed things up. I
 suspect not since
 both of them are interpreted, and the bulk of the time is taken up by
 R's regression
 calls.

 - and the bulk of the time in the regression calls will be taken up
by C code in the underlying linear algebra libraries (lapack, blas,
atlas and friends).

 Your best bet for optimisation in this case would be making sure you
have the best libraries for your architecture. That's a bit beyond me
at the moment, others here can probably tell you about getting the
best performing library for your system.

 This can also speed up Python (scipy or numpy) code that uses the
same libraries.

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-18 Thread Esmail Bonakdarian

Gabor Grothendieck wrote:



See ?Rprof for profiling your R code.

If lm is the culprit, rewriting your lm calls using lm.fit might help.


Yes, based on my informal benchmarking, lm is the main bottleneck, the rest
of the code consists mostly of vector manipulations and control structures.

I am not familiar with lm.fit, I'll definitely look it up. I hope it's similar
enough to make it easy to substitute one for the other.

Thanks for the suggestion, much appreciated. (My runs now take sometimes
several hours, it would be great to cut that time down by any amount :-)

Esmail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-18 Thread Gabor Grothendieck
On Wed, Feb 18, 2009 at 7:27 AM, Esmail Bonakdarian esmail...@gmail.com wrote:
 Gabor Grothendieck wrote:


 See ?Rprof for profiling your R code.

 If lm is the culprit, rewriting your lm calls using lm.fit might help.

 Yes, based on my informal benchmarking, lm is the main bottleneck, the
 rest
 of the code consists mostly of vector manipulations and control structures.

 I am not familiar with lm.fit, I'll definitely look it up. I hope it's
 similar
 enough to make it easy to substitute one for the other.

 Thanks for the suggestion, much appreciated. (My runs now take sometimes
 several hours, it would be great to cut that time down by any amount :-)


Yes, the speedup can be significant.  e.g. here we cut the time down to
40% of the lm time by using lm.fit and we can get down to nearly 10% if
we go even lower level:

 system.time(replicate(1000, lm(DAX ~.-1, EuStockMarkets)))
   user  system elapsed
  26.850.07   27.35

 system.time(replicate(1000, lm.fit(EuStockMarkets[,-1], EuStockMarkets[,1])))
   user  system elapsed
  10.760.00   10.78

 system.time(replicate(1000, qr.coef(qr(EuStockMarkets[,-1]), 
 EuStockMarkets[,1])))
   user  system elapsed
   3.330.003.34

 lm(DAX ~.-1, EuStockMarkets)

Call:
lm(formula = DAX ~ . - 1, data = EuStockMarkets)

Coefficients:
 SMI   CAC  FTSE
 0.55156   0.45062  -0.09392

 # They call give the same coefficients:

 lm.fit(EuStockMarkets[,-1], EuStockMarkets[,1])$coef
SMI CACFTSE
 0.55156141  0.45062183 -0.09391815

 qr.coef(qr(EuStockMarkets[,-1]), EuStockMarkets[,1])
SMI CACFTSE
 0.55156141  0.45062183 -0.09391815

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-18 Thread Kenn Konstabel
lm does lots of computations, some of which you may never need. If speed
really matters, you might want to compute only those things you will really
use. If you only need coefficients, then using %*%, solve and crossprod will
be remarkably faster than lm

# repeating someone else's example
# lm(DAX~., EuStockMarkets)

 y - EuStockMarkets[,DAX]
 x - EuStockMarkets
 x[,1]-1
colnames(x)[1] - Intercept

lm(y ~ x-1)
solve(crossprod(x), t(x))%*%y# probably this can be done more
efficiently

# and a naive timing

 system.time( for(i in 1:1000) lm(y ~ x-1))
   user  system elapsed
  14.640.33   32.69
 system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
   user  system elapsed
   0.360.000.36


Also lsfit() is a bit quicker than lm or lm.fit.

Regards,
Kenn



On Wed, Feb 18, 2009 at 2:33 PM, Esmail Bonakdarian esmail...@gmail.comwrote:

 Barry Rowlingson wrote:



  - and the bulk of the time in the regression calls will be taken up
 by C code in the underlying linear algebra libraries (lapack, blas,
 atlas and friends).


 ah, good point.

   Your best bet for optimisation in this case would be making sure you
 have the best libraries for your architecture. That's a bit beyond me
 at the moment, others here can probably tell you about getting the
 best performing library for your system.

  This can also speed up Python (scipy or numpy) code that uses the
 same libraries.


 thanks for the suggestions Barry, I mostly run on intel machines, but
 using two flavors of Linux and also Windows XP - I grab any machine I can
 to
 help run this. R versions range from 2.6.x (Fedora) to 2.8.1 (XP) at the
 moment.

 Another post suggested I look at lm.fit in place of lm to help speed things
 up, so I'm going to look at that next.

 Appreciate all the helpful posts here.

 Esmail


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-18 Thread Doran, Harold
 lm(y ~ x-1)
 solve(crossprod(x), t(x))%*%y# probably this can be done more
 efficiently


You could do

crossprod(x,y) instead of t(x))%*%y

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Python and R

2009-02-17 Thread Esmail Bonakdarian

Hello all,

I am just wondering if any of you are doing most of your scripting
with Python instead of R's programming language and then calling
the relevant R functions as needed?

And if so, what is your experience with this and what sort of
software/library  do you use in combination with Python to be able
to access R's functionality.

Is there much of a performance hit either way? (as both are interpreted
languages)

Thanks,
Esmail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-17 Thread Warren Young

Esmail Bonakdarian wrote:


I am just wondering if any of you are doing most of your scripting
with Python instead of R's programming language and then calling
the relevant R functions as needed?


No, but if I wanted to do such a thing, I'd look at Sage: 
http://sagemath.org/


It'll give you access to a lot more than just R.


Is there much of a performance hit either way? (as both are interpreted
languages)


Are you just asking, or do you have a particular execution time goal, 
which if exceeded would prevent doing this?  I ask because I suspect 
it's the former, and fast enough is fast enough.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-17 Thread Barry Rowlingson
2009/2/17 Esmail Bonakdarian esmail...@gmail.com:
 Hello all,

 I am just wondering if any of you are doing most of your scripting
 with Python instead of R's programming language and then calling
 the relevant R functions as needed?

 I tend to use R in its native form for data analysis and modelling,
and python for all my other programming needs (gui stuff with PyQt4,
web stuff, text processing etc etc).

 And if so, what is your experience with this and what sort of
 software/library  do you use in combination with Python to be able
 to access R's functionality.

 When I need to use the two together, it's easiest with 'rpy'. This
lets you call R functions from python, so you can do:

 from rpy import r
 r.hist(z)

to get a histogram of the values in a python list 'z'. There are some
complications converting structured data types between the two but
they can be overcome, and apparently are handled better with the next
generation Rpy2 (which I've not got into yet). Google for rpy for
info.

 Is there much of a performance hit either way? (as both are interpreted
 languages)

 Not sure what you mean here. Do you mean is:

 R sum(x)

faster than

Python sum(x)

and how much worse is:

Python from rpy import r
Python r.sum(x)

?

 Knuth's remark on premature optimization applies, as ever

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-17 Thread Esmail Bonakdarian
Hello!

On Tue, Feb 17, 2009 at 5:58 PM, Warren Young war...@etr-usa.com wrote:

 Esmail Bonakdarian wrote:

 I am just wondering if any of you are doing most of your scripting
 with Python instead of R's programming language and then calling
 the relevant R functions as needed?

 No, but if I wanted to do such a thing, I'd look at Sage: http://sagemath.org/

ah .. thanks for the pointer, I had not heard of Sage, I was just
starting to look at
SciPy.

 It'll give you access to a lot more than just R.

 Is there much of a performance hit either way? (as both are interpreted
 languages)

 Are you just asking, or do you have a particular execution time goal, which 
 if exceeded
 would prevent doing this?  I ask because I suspect it's the former, and fast 
 enough is fast
 enough.

I put together a large'ish R program last year, but I think I would be
happier if I could code
it in say Python - but I would rather not do that at the expense of
execution time.

Thanks again for telling me about Sage.

Esmail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Python and R

2009-02-17 Thread Esmail Bonakdarian
On Tue, Feb 17, 2009 at 6:05 PM, Barry Rowlingson
b.rowling...@lancaster.ac.uk wrote:
 2009/2/17 Esmail Bonakdarian esmail...@gmail.com:

  When I need to use the two together, it's easiest with 'rpy'. This
 lets you call R functions from python, so you can do:

  from rpy import r
  r.hist(z)

wow .. that is pretty straight forward, I'll have to check out rpy for sure.

 to get a histogram of the values in a python list 'z'. There are some
 complications converting structured data types between the two but
 they can be overcome, and apparently are handled better with the next
 generation Rpy2 (which I've not got into yet). Google for rpy for
 info.

Will do!

 Is there much of a performance hit either way? (as both are interpreted
 languages)

  Not sure what you mean here. Do you mean is:

  R sum(x)

 faster than

 Python sum(x)

 and how much worse is:

 Python from rpy import r
 Python r.sum(x)


Well, I have a program written in R which already takes quite a while
to run. I was
just wondering if I were to rewrite most of the logic in Python - the
main thing I use
in R are its regression facilities - if it would speed things up. I
suspect not since
both of them are interpreted, and the bulk of the time is taken up by
R's regression
calls.

Esmail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.