Re: [R] An "R is slow"-article

2008-01-10 Thread Tom Backer Johnsen
Gustaf Rydevik wrote:
> Hi all,
> 
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/0172.htm
> 
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
> 
> The author also has some thought-provoking opinions on R being
> no-good and that you should write everything in C instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See  http://fluff.info/blog/arch/0041.htm
>  While I don't agree (granted, I can't really write C), it was
> interesting to read something from a very different perspective than
> I'm used to.

The important aspect of R is not that it is slower for a particular
kind of operation than a dedicated program written in a compiled
language like C, Pascal, or Fortran.  That is not really surprising,
and not relevant for anything but the most extreme situations, given
the speed (and low price) of modern computers.

What is really relevant is (a) the context of any operation: R is a
well-documented language in which a very large number of operations
may be combined in an extremely large number of ways with a very low
probability of error, and (b) that all aspects of the language are
peer reviewed.

Both points are extremely important in any research context, where
everything, including the software used in computations, should be
possible to document.  These qualities are difficult to achieve in
homebrewed programs.  Therefore you should not resort to programming
anything on your own unless the operations you need are definitely not
present in the language you are using.  Apart from that, you have to
think about the cost, in time and resources, of developing your own
substitutes for something that already exists.

He also says that R encourages "fishing trips" in the data.  Well,
that may be somewhat true for R as well as for any of the major
statistical packages.  But that problem really belongs to a different
domain, that of attitudes about how to do research in the first place.

Tom


-- 
++
| Tom Backer Johnsen, Psychometrics Unit,  Faculty of Psychology |
| University of Bergen, Christies gt. 12, N-5015 Bergen,  NORWAY |
| Tel : +47-5558-9185   Fax : +47-5558-9879 |
| Email : [EMAIL PROTECTED]   URL : http://www.galton.uib.no/ |
++

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] An "R is slow"-article

2008-01-09 Thread Michael A. Miller
> "Paul" == Paul Gilbert <[EMAIL PROTECTED]> writes:

> Gustaf Rydevik wrote:

>> The author also has some thought-provoking opinions on R
>> being no-good and that you should write everything in C

> People used to say assembler, that's progress.

From the FORTRAN Preliminary Report, IBM, November 1954:

  "FORTRAN should virtually eliminate coding and debugging."



Re: [R] An "R is slow"-article

2008-01-09 Thread Philippe Grosjean
Jeffrey Horner wrote:
> I hazard to say that the author of that blog post isn't using the time 
> he saved from writing his analyses in C very efficiently. I wonder how 
> long it took him to write it in C in the first place, even to setup the 
> testing of C against R, or to write the blog post.
> 
> He didn't say.
> 
> Jeff

Yes, he did: he said he took the C code out of the fisher.test() R 
function and just did a little bit of tweaking. So it did not take too 
long to reuse existing C code that someone else wrote and debugged, for 
sure... But, as you suggest, what about writing it from scratch in C or 
R? That is the real question, of course!

Philippe


Re: [R] An "R is slow"-article

2008-01-09 Thread Alberto Monteiro
[article: http://fluff.info/blog/arch/0172.htm ]

Duncan Murdoch wrote:
> 
> If I followed Blair's advice and did everything in C, then 
> development would take much longer, the code would be much buggier 
> (even his example has bugs, and he admits it!!) and all those cases 
> where R is fast enough would just never get done.
> 
I was particularly horrified by this comment:

  The reader well-versed with Apophenia will notice that there is
  a memory leak, because apop_test_fisher_exact returns an apop_data
  struct that never gets freed. But 10,000 lost matrices didn't affect
  the speed of the program at all. The lesson from this is that the
  details of memory management that R is handling for you are not
  such a big deal on a modern PC anyway. 

So he writes C code that _doesn't_ check whether memory is available
before allocating things, then _doesn't_ free memory when it finishes,
and naively says that C is 50 times faster than R?
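
For comparison, the checks in question are not much code. Below is a minimal
sketch in C, assuming a made-up result type: test_result, test_result_new,
test_result_free and run_repeated are hypothetical names used only for
illustration and are not Apophenia's API.

```c
#include <stdlib.h>

/* Hypothetical result record; this is NOT Apophenia's apop_data struct. */
typedef struct {
    size_t n;        /* number of values stored */
    double *values;  /* heap-allocated array of length n */
} test_result;

/* Allocate a zero-filled result of length n (n must be > 0).
 * Returns NULL if n is 0 or if either allocation fails. */
test_result *test_result_new(size_t n)
{
    if (n == 0) return NULL;
    test_result *r = malloc(sizeof *r);
    if (r == NULL) return NULL;
    r->values = calloc(n, sizeof *r->values);
    if (r->values == NULL) {
        free(r);
        return NULL;
    }
    r->n = n;
    return r;
}

/* Release a result; passing NULL is allowed and does nothing. */
void test_result_free(test_result *r)
{
    if (r == NULL) return;
    free(r->values);
    free(r);
}

/* Allocate, use and free one result per iteration.
 * Returns the number of iterations completed; a value below reps
 * means an allocation failed and the loop stopped early. */
long run_repeated(long reps, size_t n)
{
    long done = 0;
    for (long i = 0; i < reps; ++i) {
        test_result *r = test_result_new(n);
        if (r == NULL) break;
        r->values[0] = (double)i;  /* stand-in for the real computation */
        test_result_free(r);
        ++done;
    }
    return done;
}
```

Calling run_repeated(10000, 4) allocates and releases 10,000 small results
and returns 10000 if every allocation succeeded, so nothing is left behind
for the operating system to clean up at exit.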

Alberto Monteiro



Re: [R] An "R is slow"-article

2008-01-09 Thread Jeffrey Horner
I hazard to say that the author of that blog post isn't using the time 
he saved from writing his analyses in C very efficiently. I wonder how 
long it took him to write it in C in the first place, or even to set up 
the test of C against R, or to write the blog post.

He didn't say.

Jeff


Re: [R] An "R is slow"-article

2008-01-09 Thread Marc Schwartz
Barry Rowlingson wrote:
> Gustaf Rydevik wrote:
>> Hi all,
>>
>> Reading the wikipedia page on R, I stumbled across the following:
>> http://fluff.info/blog/arch/0172.htm
>>
>> It does seem interesting that the C execution is that much slower from
>> R than from a native C program. Could any of the more technically
>> knowledgeable people explain why this is so?
> 
>   I don't think it is. He's comparing some C code with calling 
> fisher.test() from R, which he claims does 'nothing but call C code over 
> and over'. Wrong. It checks its arguments in R, it checks for multiple 
> arguments, it does all sorts of goodness before finally calling 
> .C("fexact"). And then it does even more things. Confidence intervals, 
> odds ratios, p-values and so on.
> 
>   He needs to re-run his tests but instead of calling fisher.test() he 
> should prepare the data and call .C("fexact",...) directly.
> 
>> The author also has some thought-provoking opinions on R being
>> no-good and that you should write everything in C instead (mainly
>> because R is slow and too good at graphics, encouraging data
>> snooping). See  http://fluff.info/blog/arch/0041.htm
> 
>   And of course C is good at buffer overflows and memory leaks and 
> spending ages compiling when you really just want to do fisher.test(foo) 
> and have done with it.
> 
>   He says: "I used to have a simulation written in R calling compiled C 
> that took overnight to process 100 agents, but now that it's all in C 
> simulations with 9,000 agents run in forty minutes. Don't risk it--learn 
> to do statistical computing in C today!". Fine, but I imagine his R code 
> was created much quicker than the C code. R is quicker to write, and 
> once you have established that your code is running too slow for you, 
> then you optimise. By that point you've hopefully debugged your 
> algorithm and spotted all the nasty traps that would have tied you up in 
> the C debugger for a week. You then rewrite in pure C for speed, and you 
> of course have a set of test cases generated from R to verify your C is 
> doing the same as your R. Win win.
> 
>   He claims to be an economist but clearly doesn't recognise the economy 
> of rapid development...
> 
> Barry

If support list activity is any surrogate measure of the success of his
arguments, then given that there are 7 subscribers and only 2 posts (both
by the same person and without a reply from the application author) on
the Apophenia e-mail lists at:

  https://sourceforge.net/mail/?group_id=130901

one would hypothesize that he has been less than persuasive...

What color is the sky in his world?

;-)

Regards,

Marc Schwartz



Re: [R] An "R is slow"-article

2008-01-09 Thread Paul Gilbert

Gustaf Rydevik wrote:
> Hi all,
> 
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/0172.htm
> 
There are certainly situations where one would want to consider faster 
solutions than interpreted languages but, having been through these 
arguments a few times over the years, here are a few things you might 
consider:

1/ How much is your time worth, how much does the computer time cost, 
and how much does a faster computer cost when you start writing your code?

2/ How much is your time worth, how much does the computer time cost, 
and how much does a faster computer cost when you finish writing your code?

3/ If you tweak the code, or use someone else's private tweaks, how much 
do you trust the results relative to more widely used and tested versions?

4/ You should do speed comparisons with something resembling your real 
problem.

5/ If you want to make R look really bad, use a loop that gobbles lots of 
memory, so your machine starts to swap. (This is my guess at part of the 
problem with the "script".)

6/ If you want your code to be really fast, don't do any error checking. 
(This also avoids the enormous amount of time you waste when you find 
errors.)

> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
> 
> The author also has some thought-provoking opinions on R being
> no-good and that you should write everything in C 

People used to say assembler, that's progress.

Paul Gilbert



Re: [R] An "R is slow"-article

2008-01-09 Thread Peter Dalgaard
Gustaf Rydevik wrote:
> Hi all,
>
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/0172.htm
>
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
>
>   
Well, if you are obsessed with speed, R can be the wrong tool. This is
an ingrained aspect of the language itself; if you are interested,
consult some of Luke Tierney's writings about the difficulties of
writing an R compiler. To some extent, it is a tradeoff for flexibility
and expressiveness.

The example is somewhat misleading. The C execution time is probably the
same, but it is drowned out by the administrative overhead of
fisher.test (a 2x2 Fisher test is really not a very complex operation
when cell counts are in the hundreds.)
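
To give a sense of how little arithmetic a 2x2 Fisher test involves, here is
a minimal C sketch of the two-sided p-value, summing hypergeometric
probabilities that are no larger than that of the observed table. It is not
the fexact code that fisher.test() calls; the names ratio_up and
fisher_2x2_two_sided and the 1e-7 tolerance for ties are assumptions made
for this illustration.

```c
/* Ratio P(x + 1) / P(x) of hypergeometric probabilities for the top-left
 * cell of a 2x2 table with row sum r1, column sum c1 and grand total n.
 * Valid for lo <= x < hi, where both denominators are at least 1. */
static double ratio_up(long x, long r1, long c1, long n)
{
    return ((double)(r1 - x) * (double)(c1 - x)) /
           ((double)(x + 1) * (double)(n - r1 - c1 + x + 1));
}

/* Two-sided Fisher exact p-value for the table
 *     a b
 *     c d
 * Returns -1.0 for negative counts or an empty table. Weights are kept
 * relative to the mode, so extreme tails may underflow to zero. */
double fisher_2x2_two_sided(long a, long b, long c, long d)
{
    if (a < 0 || b < 0 || c < 0 || d < 0) return -1.0;
    long r1 = a + b, c1 = a + c, n = a + b + c + d;
    if (n == 0) return -1.0;

    /* With the margins fixed, the top-left cell ranges over [lo, hi]. */
    long lo = (r1 + c1 - n > 0) ? r1 + c1 - n : 0;
    long hi = (r1 < c1) ? r1 : c1;

    /* Start from the mode so that weights only shrink as we move away. */
    long mode = (long)(((double)(r1 + 1) * (double)(c1 + 1)) / (double)(n + 2));
    if (mode < lo) mode = lo;
    if (mode > hi) mode = hi;

    /* Pass 1: weight of the observed table relative to the mode. */
    double w_obs = 1.0;
    for (long x = mode; x < a; ++x) w_obs *= ratio_up(x, r1, c1, n);
    for (long x = mode; x > a; --x) w_obs /= ratio_up(x - 1, r1, c1, n);

    /* Pass 2: sum all weights, and those not exceeding the observed one. */
    double limit = w_obs * (1.0 + 1e-7);  /* small slack so ties count */
    double total = 0.0, tail = 0.0, w = 1.0;
    for (long x = mode; x <= hi; ++x) {
        total += w;
        if (w <= limit) tail += w;
        if (x < hi) w *= ratio_up(x, r1, c1, n);
    }
    w = 1.0;
    for (long x = mode; x > lo; --x) {
        w /= ratio_up(x - 1, r1, c1, n);
        total += w;
        if (w <= limit) tail += w;
    }
    return tail / total;
}
```

For the table 3 1 / 1 3 the five possible tables have probabilities 1/70,
16/70, 36/70, 16/70 and 1/70, so the function returns 34/70. The work is one
multiplication and one division per possible table, which is consistent with
the point that the computation itself is small next to everything else
fisher.test() does.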

> The author also has some thought-provoking opinions on R being
> no-good and that you should write everything in C instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See  http://fluff.info/blog/arch/0041.htm
>  While I don't agree (granted, I can't really write C), it was
> interesting to read something from a very different perspective than
> I'm used to.
>   
The idea that you really shouldn't look at data before testing
statistical hypotheses is not without merit, but taken to the extreme,
it tends to become ridiculous. You end up in a situation where you
either can't do anything or you don't know what you are doing. It is
related to the discussions about randomized trials versus observational
studies. The former are in many ways stronger, but sometimes
unavailable, and they tend to be using a very big hammer to whack in a
single nail.


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] An "R is slow"-article

2008-01-09 Thread Robin Hankin
Hello Gustaf, List.

Thanks Gustaf for your post!


Well, I am working pretty intensively with fisher.test() right now, as
some of you will know.

The comparison is not fair:  R's fisher.test() does a whole
bunch of error checking, tests the size of the input matrix,
assesses the other arguments, and puts together a nice
little list of class "htest".

The C routine does none of this.


The clincher is that fisher.test() as called gives an estimate
for the odds ratio using uniroot() to numerically solve an
equation in terms of the hypergeometric probability
distribution.  This takes a long time, but
one doesn't notice it in a standard R session.


Sorry, but the time comparison is simply not worth reporting.








--
Robin Hankin
Uncertainty Analyst and Neutral Theorist,
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743



Re: [R] An "R is slow"-article

2008-01-09 Thread Duncan Murdoch
On 1/9/2008 10:25 AM, Gustaf Rydevik wrote:
> Hi all,
> 
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/0172.htm
> 
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?

That conclusion isn't supported by his test. The main source of the 
difference is interpreting the loop:

test_ct <- 10000
x   <- matrix(c(300, 860, 240, 380), nrow=2)
for (i in 1:test_ct)
 {fisher.test(x)}

If he wanted to show that R makes C go slower, he should have put 
together an example that spent most of its time in C, without returning 
to R 10000 times.  For example, make the entries in that table 1000 
times larger, and do the test just once:

 > x   <- matrix(c(300000, 860000, 240000, 380000), nrow=2)
 > fisher.test(x)

This takes about 20 seconds on my PC, and I'd guess it would take about 
the same amount of time in this author's pure C implementation.

My own experience is that R is about 100 times slower than pure C, and 
usually it doesn't matter.  In cases where it does, I'll  move the 
calculations into C.

If I followed Blair's advice and did everything in C, then development 
would take much longer, the code would be much buggier (even his example 
has bugs, and he admits it!!) and all those cases where R is fast enough 
would just never get done.

Duncan Murdoch



Re: [R] An "R is slow"-article

2008-01-09 Thread Barry Rowlingson
Gustaf Rydevik wrote:
> Hi all,
> 
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/0172.htm
> 
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?

  I don't think it is. He's comparing some C code with calling 
fisher.test() from R, which he claims does 'nothing but call C code over 
and over'. Wrong. It checks its arguments in R, it checks for multiple 
arguments, it does all sorts of goodness before finally calling 
.C("fexact"). And then it does even more things. Confidence intervals, 
odds ratios, p-values and so on.

  He needs to re-run his tests but instead of calling fisher.test() he 
should prepare the data and call .C("fexact",...) directly.

> The author also has some thought-provoking opinions on R being
> no-good and that you should write everything in C instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See  http://fluff.info/blog/arch/0041.htm

  And of course C is good at buffer overflows and memory leaks and 
spending ages compiling when you really just want to do fisher.test(foo) 
and have done with it.

  He says: "I used to have a simulation written in R calling compiled C 
that took overnight to process 100 agents, but now that it's all in C 
simulations with 9,000 agents run in forty minutes. Don't risk it--learn 
to do statistical computing in C today!". Fine, but I imagine his R code 
was created much quicker than the C code. R is quicker to write, and 
once you have established that your code is running too slow for you, 
then you optimise. By that point you've hopefully debugged your 
algorithm and spotted all the nasty traps that would have tied you up in 
the C debugger for a week. You then rewrite in pure C for speed, and you 
of course have a set of test cases generated from R to verify your C is 
doing the same as your R. Win win.
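
The last step, checking a C port against reference results, can be sketched
roughly as follows. Everything here is an assumption made for illustration:
sum_of_squares stands in for whatever routine was ported, and ref_case,
close_enough and count_mismatches are hypothetical helper names. In practice
the expected values would be exported from the R prototype.

```c
#include <stddef.h>

/* One reference case: an input vector and the value the prototype gave. */
typedef struct {
    const double *x;   /* input data */
    size_t n;          /* length of x */
    double expected;   /* reference result, e.g. exported from R */
} ref_case;

/* Hypothetical routine under test, standing in for the real C port. */
double sum_of_squares(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += x[i] * x[i];
    return s;
}

/* Returns 1 if a agrees with b to relative tolerance tol
 * (absolute tolerance when |b| is below 1), otherwise 0. */
int close_enough(double a, double b, double tol)
{
    double diff = (a > b) ? a - b : b - a;
    double scale = (b < 0.0) ? -b : b;
    if (scale < 1.0) scale = 1.0;
    return diff <= tol * scale;
}

/* Run f on every case and count how many results disagree. */
size_t count_mismatches(double (*f)(const double *, size_t),
                        const ref_case *cases, size_t n_cases, double tol)
{
    size_t bad = 0;
    for (size_t i = 0; i < n_cases; ++i) {
        if (!close_enough(f(cases[i].x, cases[i].n), cases[i].expected, tol))
            ++bad;
    }
    return bad;
}
```

A return value of 0 from count_mismatches means the C routine reproduced
every reference value within the tolerance. Comparing with a tolerance
rather than exact equality allows for small floating-point differences
between the two implementations.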

  He claims to be an economist but clearly doesn't recognise the economy 
of rapid development...

Barry



Re: [R] An "R is slow"-article

2008-01-09 Thread Armstrong, Whit
fisher.test seems to use the .C calling convention in a couple of
different places.

For example:

tmp <- .C("fisher_sim", as.integer(nr), as.integer(nc),
          as.integer(sr), as.integer(sc), as.integer(n),
          as.integer(B), integer(nr * nc), double(n + 1),
          integer(nc), results = double(B),
          PACKAGE = "stats")$results


Perhaps some R experts on the list can tell us whether there is
significant overhead to .C vs .Call.

Does .C really duplicate its arguments?  What does RObjToCPtr do?


(line 1682.. in dotcode.c)

    /* Convert the arguments for use in foreign */
    /* function calls.  Note that we copy twice */
    /* once here, on the way into the call, and */
    /* once below on the way out. */
    cargs = (void**)R_alloc(nargs, sizeof(void*));
    nargs = 0;
    for(pargs = args ; pargs != R_NilValue; pargs = CDR(pargs)) {
#ifdef THROW_REGISTRATION_TYPE_ERROR
        if(checkTypes &&
           !comparePrimitiveTypes(checkTypes[nargs], CAR(pargs), dup)) {
            /* We can loop over all the arguments and report all the
               erroneous ones, but then we would also want to avoid
               the conversions.  Also, in the future, we may just
               attempt to coerce the value to the appropriate
               type. This is why we pass the checkTypes[nargs] value
               to RObjToCPtr(). We just have to sort out the ability
               to return the correct value which is complicated by
               dup, etc. */
            errorcall(call, _("Wrong type for argument %d in call to %s"),
                      nargs+1, symName);
        }
#endif
        cargs[nargs] = RObjToCPtr(CAR(pargs), naok, dup, nargs + 1,
                                  which, symName, argConverters + nargs,
                                  checkTypes ? checkTypes[nargs] : 0,
                                  encname);
#ifdef R_MEMORY_PROFILING
        if (TRACE(CAR(pargs)) && dup)
            memtrace_report(CAR(pargs), cargs[nargs]);
#endif
        nargs++;
    }
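
The copy-in/copy-out pattern that the comment describes can be modelled in a
few lines of plain C. This is only a simplified sketch of the idea, not R's
actual implementation: call_with_copies, vec_routine and double_all are
hypothetical names, and the real code converts each argument through
RObjToCPtr() as shown in the excerpt.

```c
#include <stdlib.h>
#include <string.h>

/* Signature of a routine that modifies a double array in place. */
typedef void (*vec_routine)(double *x, int n);

/* Simplified model of copy-in/copy-out argument passing:
 *   copy 1: the caller's data is copied into a working buffer,
 *   copy 2: the buffer is copied into a fresh result after the call.
 * The caller's array x is never modified. Returns a newly allocated
 * array the caller must free(), or NULL if n <= 0 or allocation fails. */
double *call_with_copies(vec_routine f, const double *x, int n)
{
    if (n <= 0) return NULL;
    size_t bytes = (size_t)n * sizeof(double);
    double *work = malloc(bytes);
    double *result = malloc(bytes);
    if (work == NULL || result == NULL) {
        free(work);
        free(result);
        return NULL;
    }
    memcpy(work, x, bytes);       /* copy 1: on the way into the call */
    f(work, n);                   /* the routine only ever sees the copy */
    memcpy(result, work, bytes);  /* copy 2: on the way out */
    free(work);
    return result;
}

/* Example routine: doubles every element in place. */
void double_all(double *x, int n)
{
    for (int i = 0; i < n; ++i) x[i] *= 2.0;
}
```

The cost of the two memcpy calls grows with the length of the argument, so
for something as small as a 2x2 table it is presumably negligible next to
the rest of the call. It would matter more for long vectors passed
repeatedly.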

Thanks,
Whit




[R] An "R is slow"-article

2008-01-09 Thread Gustaf Rydevik
Hi all,

Reading the wikipedia page on R, I stumbled across the following:
http://fluff.info/blog/arch/0172.htm

It does seem interesting that the C execution is that much slower from
R than from a native C program. Could any of the more technically
knowledgeable people explain why this is so?

The author also has some thought-provoking opinions on R being
no-good and that you should write everything in C instead (mainly
because R is slow and too good at graphics, encouraging data
snooping). See  http://fluff.info/blog/arch/0041.htm
While I don't agree (granted, I can't really write C), it was
interesting to read something from a very different perspective than
I'm used to.

Best regards,

Gustaf

_
Department of Epidemiology,
Swedish Institute for Infectious Disease Control
work email: gustaf.rydevik at smi dot ki dot se
skype:gustaf_rydevik
