Re: [R] Reasons to Use R (no memory limitations :-))

2007-04-15 Thread charles loboz
This thread discussed R memory limitations and compared memory handling with S 
and SAS. Since I routinely use R to process multi-gigabyte data sets on 
computers with sometimes only 256 MB of memory - here are some comments on that. 

Most memory limitations vanish if R is used with any relational database. [My 
personal preference is SQLite (the RSQLite package), because of its speed and 
because it needs no administration (it runs in embedded mode)]. The comments 
below apply to any relational database, unless otherwise stated.
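
To make this concrete, here is a minimal sketch of the embedded setup 
(assuming the RSQLite package is installed and a hypothetical file 
patients.db):

  library(RSQLite)
  # embedded mode: no server process and nothing to administer
  con <- dbConnect(SQLite(), dbname = "patients.db")
  dbListTables(con)   # e.g. "patients"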

Most people appear to treat database tables as data frames - that is, to 
store and load the _whole_ table in one go - probably because the names of 
the appropriate functions suggest this approach. It is also a natural mapping. 
This is convenient if the data set fits fully in memory - but it limits the 
size of the data set in the same way as not using a database at all.

However, using the SQL language directly, one can expand the size of the data 
set R is capable of operating on - we just have to stop treating database 
tables as 'atomic'. For example, assume we have a set of several million 
patients and want to analyze some specific subset - the following SQL statement 
  SELECT * FROM patients WHERE gender='M' AND age BETWEEN 30 AND 35
will bring into R a much smaller data frame than selecting the whole table. 
[Such a subset selection may even take _less_time_ than subsetting the full 
data frame in R - assuming the table is properly indexed]. 
Direct SQL statements can also be used to pre-compute some characteristics 
inside the database and bring only the summaries to R:
 SELECT gender, AVG(age) FROM patients GROUP BY gender
will bring back a data frame of only two rows.
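
In R, both patterns are a single call through the DBI interface. A minimal 
sketch, assuming the connection con from above (dbGetQuery returns an 
ordinary data frame):

  # only the matching rows ever travel to R
  young_males <- dbGetQuery(con,
    "SELECT * FROM patients WHERE gender='M' AND age BETWEEN 30 AND 35")
  # only the per-gender summary travels to R
  mean_age <- dbGetQuery(con,
    "SELECT gender, AVG(age) AS mean_age FROM patients GROUP BY gender")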

Admittedly, if the data set is really large and we cannot operate on its 
subsets, the above does not help. Though I do not believe that this is the 
majority of situations. 

Naturally, going for a 64-bit system with enough memory will solve some problems 
without using a database - but not all of them. Relational databases can be 
very efficient at selecting subsets because they do not have to do linear scans 
[when the tables are indexed] - while R has to do a linear scan every time (I 
did not look up the R source code - please correct me if I am wrong). Two 
other areas where a database is better than R, especially for large data sets:
 - verification of data correctness for individual points [a frequent problem 
with large data sets]
 - combining data from several different types of tables into one data frame 
(see the sketch below)
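
As a minimal sketch of the last point (assuming a hypothetical visits table 
keyed by patient id): the join is done inside the database, and R receives 
one ready-made data frame:

  combined <- dbGetQuery(con,
    "SELECT p.id, p.age, v.visit_date
       FROM patients p JOIN visits v ON v.patient_id = p.id")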

In summary: using SQL from R allows one to process extremely large data sets in 
limited memory, sometimes even faster than if we had a large memory and kept 
our data set fully in it. A relational database perfectly complements R's 
capabilities.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-13 Thread Jim Lemon
(Ted Harding) wrote:
 On 12-Apr-07 10:14:21, Jim Lemon wrote:
 
Charilaos Skiadas wrote:

A new fortune candidate perhaps?

On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:



Remember, everything is better than everything else given the
right comparison.


Only if we remove the grammatical blip that turns it into an infinite 
regress, i.e.

Remember, anything is better than everything else given the right 
comparison

Jim
 
 
 Oh dear, I would be disappointed with that, Jim.
 
 I was rather enjoying the vision of a topological sort tree
 (ordered by "better" according to some comparison) in which every
 single thing had everything else hanging off it, and in turn was
 hanging off everything else!
 
Sorry, Ted, I think Benoit Mandelbrot beat you to it.

Jim

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-12 Thread Jim Lemon
Charilaos Skiadas wrote:
 A new fortune candidate perhaps?
 
 On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:
 
 
Remember, everything is better than everything else given the right
comparison.

Only if we remove the grammatical blip that turns it into an infinite 
regress, i.e.

Remember, anything is better than everything else given the right 
comparison

Jim

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-12 Thread Joel J. Adamson
Douglas Bates writes:
  One
  can do data analysis by using the computer as a blunt instrument with
  which to bludgeon the problem to death but one can't do elegant data
  analysis like that.

One nice thing about a blunt instrument like Stata is the ability to
hold an entire dataset in memory and interactively play with the model
and generate new variables all in one session.  I figure out what I
want interactively and then separate the data management and analysis in
.do-files, then run them in batch mode.

However, when I first read of the approach of using Perl, sed or awk
to manage data and then only doing the analysis in R, I immediately
thought "Wow, that is a really great idea, I never thought of it like
that before."  It would really get me to think about the modelling and
the data management clearly.  A little voice said "Dude, you're not
using a PDP-11..." (oh wait, that might be kinda cool), but the logic of
it immediately made sense.  I consider it a big part of my
re-Unix-ization.
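
A minimal sketch of that division of labour, assuming a hypothetical 
headerless comma-separated file measurements.csv with columns id, group, 
value:

  # awk does the data management, R does the analysis
  dat <- read.table(pipe("awk -F, '$3 > 0' measurements.csv"), sep = ",",
                    col.names = c("id", "group", "value"))
  summary(lm(value ~ group, data = dat))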

Joel

-- 
Joel J. Adamson
Biostatistician
Pediatric Psychopharmacology Research Unit
Massachusetts General Hospital
Boston, MA  02114
(617) 643-1432
(303) 880-3109






__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-12 Thread Lucke, Joseph F
A re-interpretation of Zorn's lemma? 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jim Lemon
Sent: Thursday, April 12, 2007 5:14 AM
To: [EMAIL PROTECTED]
Subject: Re: [R] Reasons to Use R

Charilaos Skiadas wrote:
 A new fortune candidate perhaps?
 
 On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:
 
 
Remember, everything is better than everything else given the right 
comparison.

Only if we remove the grammatical blip that turns it into an infinite
regress, i.e.

Remember, anything is better than everything else given the right
comparison

Jim


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-12 Thread Joel J. Adamson
Lucke, Joseph F writes:
  A re-interpretation of Zorn's lemma? 
  
  -----Original Message-----
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Jim Lemon
  Sent: Thursday, April 12, 2007 5:14 AM
  To: [EMAIL PROTECTED]
  Subject: Re: [R] Reasons to Use R
  
  Charilaos Skiadas wrote:
   A new fortune candidate perhaps?
   
   On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:
   
   
  Remember, everything is better than everything else given the right 
  comparison.
  
  Only if we remove the grammatical blip that turns it into an infinite
  regress, i.e.
  
  Remember, anything is better than everything else given the right
  comparison
  
  Jim

Anything is potentially better than any other thing given the right
comparison.

Joel
-- 
Joel J. Adamson
Biostatistician
Pediatric Psychopharmacology Research Unit
Massachusetts General Hospital
Boston, MA  02114
(617) 643-1432
(303) 880-3109






__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-12 Thread Ted Harding
On 12-Apr-07 10:14:21, Jim Lemon wrote:
 Charilaos Skiadas wrote:
 A new fortune candidate perhaps?
 
 On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:
 
 
Remember, everything is better than everything else given the
right comparison.

 Only if we remove the grammatical blip that turns it into an infinite 
 regress, i.e.
 
 Remember, anything is better than everything else given the right 
 comparison
 
 Jim

Oh dear, I would be disappointed with that, Jim.

I was rather enjoying the vision of a topological sort tree
(ordered by "better" according to some comparison) in which every
single thing had everything else hanging off it, and in turn was
hanging off everything else!

Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 12-Apr-07   Time: 11:45:05
-- XFMail --

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Wensui Liu
Greg,
As far as I understand, SAS is more efficient handling large data
probably than S+/R. Do you have any idea why?

On 4/10/07, Greg Snow [EMAIL PROTECTED] wrote:
  -----Original Message-----
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of
  Bi-Info (http://members.home.nl/bi-info)
  Sent: Monday, April 09, 2007 4:23 PM
  To: Gabor Grothendieck
  Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
  Subject: Re: [R] Reasons to Use R

 [snip]

  So what's the big deal about S using files instead of memory
  like R. I don't get the point. Isn't there enough swap space
  for S? (Who cares
  anyway: it works, isn't it?) Or are there any problems with S
  and large datasets? I don't get it. You use them, Greg. So
  you might discuss that issue.
 
  Wilfred
 
 

 This is my understanding of the issue (not anything official).

 If you use up all the memory while in R, then the OS will start swapping
 memory to disk, but the OS does not know what parts of memory correspond
 to which objects, so it is entirely possible that the chunk swapped to
 disk contains parts of different data objects, so when you need one of
 those objects again, everything needs to be swapped back in.  This is
 very inefficient.

 S-PLUS occasionally runs into the same problem, but since it does some
 of its own swapping to disk it can be more efficient by swapping single
 data objects (data frames, etc.).  Also, since S-PLUS is already saving
 everything to disk, it does not actually need to do a full swap, it can
 just look and see that a particular data frame has not been used for a
 while, know that it is already saved on the disk, and unload it from
 memory without having to write it to disk first.

 The g.data package for R has some of this functionality of keeping data
 on the disk until needed.

 The better approach for large data sets is to only have some of the data
 in memory at a time and to automatically read just the parts that you
 need.  So for big datasets it is recommended to have the actual data
 stored in a database and use one of the database connection packages to
 only read in the subset that you need.  The SQLiteDF package for R is
 working on automating this process for R.  There are also the bigdata
 module for S-PLUS and the biglm package for R have ways of doing some of
 the common analyses using chunks of data at a time.  This idea is not
 new.  There was a program in the late 1970s and 80s called Rummage by
 Del Scott (I guess technically it still exists, I have a copy on a 5.25
 floppy somewhere) that used the approach of specify the model you wanted
 to fit first, then specify the data file.  Rummage would then figure out
 which sufficient statistics were needed and read the data in chunks,
 compute the sufficient statistics on the fly, and not keep more than a
 couple of lines of the data in memory at once.  Unfortunately it did not
 have much of a user interface, so when memory was cheap and datasets
 only medium sized it did not compete well, I guess it was just a bit too
 ahead of its time.

 Hope this helps,



 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 [EMAIL PROTECTED]
 (801) 408-8111




-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Greg Snow
I think SAS has the database part built into it.  I have heard second-hand
of new statisticians going to work for a company and asking if they have
SAS; the reply is "Yes, we use SAS for our database - does it do
statistics also?"  I have also heard that SAS is no longer considered an
acronym; they like having it be just a name and don't want the fact that
one of the S's used to stand for "statistics" to scare away companies
that use it as a database.

Maybe someone more up on SAS can confirm or deny this.

Also, one issue to always look at is central control versus ease of
extensibility.  If you have a program that is completely under your
control and does one set of things, then extending it to a new model
(big data) is fairly straightforward.  R is at the opposite end of the
spectrum, with many contributors and many techniques.  Extending some
basic pieces to be very efficient with big data could be done easily,
but would break many other pieces.  Getting all the different packages
to conform to a single standard in a short amount of time would be near
impossible.

With R's flexibility, there are probably some problems that can be done
quicker with a proper use of biglm than with SAS, and I expect that with
some more work and maturity the SQLiteDF package may start to rival SAS
as well on certain problems.  While SAS is a useful program and great at
certain things, there are some techniques that I would not even attempt
using SAS that are fairly straightforward in R (I remember seeing some
SAS code to do a bootstrap that included a data step to read in and
extract information from a SAS output file - SHUDDER.  SAS/ODS has
improved this, but I would much rather bootstrap in R/S-PLUS than
anything else).
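
For comparison, a bootstrap in R can be a few lines.  A minimal sketch on
made-up data:

  set.seed(42)
  x <- rnorm(100)                  # hypothetical sample
  meds <- replicate(1000, median(sample(x, replace = TRUE)))
  quantile(meds, c(0.025, 0.975))  # percentile interval for the median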

Remember, everything is better than everything else given the right
comparison.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -----Original Message-----
 From: Wensui Liu [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, April 10, 2007 3:26 PM
 To: Greg Snow
 Cc: Bi-Info (http://members.home.nl/bi-info); Gabor 
 Grothendieck; Lorenzo Isella; r-help@stat.math.ethz.ch
 Subject: Re: [R] Reasons to Use R
 
 Greg,
 As far as I understand, SAS is more efficient handling large 
 data probably than S+/R. Do you have any idea why?
 
 On 4/10/07, Greg Snow [EMAIL PROTECTED] wrote:
   -----Original Message-----
   From: [EMAIL PROTECTED] 
   [mailto:[EMAIL PROTECTED] On Behalf Of Bi-Info 
   (http://members.home.nl/bi-info)
   Sent: Monday, April 09, 2007 4:23 PM
   To: Gabor Grothendieck
   Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
   Subject: Re: [R] Reasons to Use R
 
  [snip]
 
   So what's the big deal about S using files instead of 
 memory like R. 
   I don't get the point. Isn't there enough swap space for S? (Who 
   cares
   anyway: it works, isn't it?) Or are there any problems with S and 
   large datasets? I don't get it. You use them, Greg. So you might 
   discuss that issue.
  
   Wilfred
  
  
 
  This is my understanding of the issue (not anything official).
 
  If you use up all the memory while in R, then the OS will start 
  swapping memory to disk, but the OS does not know what 
 parts of memory 
  correspond to which objects, so it is entirely possible 
 that the chunk 
  swapped to disk contains parts of different data objects, 
 so when you 
  need one of those objects again, everything needs to be 
 swapped back 
  in.  This is very inefficient.
 
  S-PLUS occasionally runs into the same problem, but since 
 it does some 
  of its own swapping to disk it can be more efficient by swapping 
  single data objects (data frames, etc.).  Also, since S-PLUS is 
  already saving everything to disk, it does not actually 
 need to do a 
  full swap, it can just look and see that a particular data 
 frame has 
  not been used for a while, know that it is already saved on 
 the disk, 
  and unload it from memory without having to write it to disk first.
 
  The g.data package for R has some of this functionality of keeping 
  data on the disk until needed.
 
  The better approach for large data sets is to only have some of the 
  data in memory at a time and to automatically read just the 
 parts that 
  you need.  So for big datasets it is recommended to have the actual 
  data stored in a database and use one of the database connection 
  packages to only read in the subset that you need.  The SQLiteDF 
  package for R is working on automating this process for R.  
 There are 
  also the bigdata module for S-PLUS and the biglm package for R have 
  ways of doing some of the common analyses using chunks of data at a 
  time.  This idea is not new.  There was a program in the late 1970s 
  and 80s called Rummage by Del Scott (I guess technically it 
 still exists, I have a copy on a 5.25
  floppy somewhere) that used the approach of specify the model you 
  wanted to fit first, then specify the data file.  Rummage 
 would then 
  figure out

Re: [R] Reasons to Use R

2007-04-11 Thread Charilaos Skiadas
A new fortune candidate perhaps?

On Apr 10, 2007, at 6:27 PM, Greg Snow wrote:

 Remember, everything is better than everything else given the right
 comparison.

 -- 
 Gregory (Greg) L. Snow Ph.D.

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Douglas Bates
On 4/10/07, Wensui Liu [EMAIL PROTECTED] wrote:
 Greg,
 As far as I understand, SAS is more efficient handling large data
 probably than S+/R. Do you have any idea why?

SAS originated at a time when large data sets were stored on magnetic
tape and the only reasonable way to process them was sequentially.
Thus most statistics procedures in SAS act as filters, processing one
record at a time and accumulating summary information.  In the past
SAS performed a least squares fit by accumulating the crossproduct of
[X:y] and then using the sweep operator to reduce that matrix.  With
such an approach the number of observations does not affect the amount
of storage required.  Adding observations just requires more time.
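
A minimal sketch of that filter idea in R, on made-up chunks: accumulate
X'X and X'y one chunk at a time, then solve the normal equations.

  chunks <- list(data.frame(x = rnorm(500), y = rnorm(500)),
                 data.frame(x = rnorm(500), y = rnorm(500)))
  XtX <- matrix(0, 2, 2); Xty <- numeric(2)
  for (ch in chunks) {
    X <- cbind(1, ch$x)               # intercept + predictor
    XtX <- XtX + crossprod(X)         # accumulate X'X
    Xty <- Xty + crossprod(X, ch$y)   # accumulate X'y
  }
  solve(XtX, Xty)                     # least squares coefficients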

This works fine (although there are numerical disadvantages to this
approach - try mentioning the sweep operator to an expert in numerical
linear algebra - you get a blank stare) as long as the operations that
you wish to perform fit into this model.  Making the desired
operations fit into the model is the primary reason for the
awkwardness in many SAS analyses.

The emphasis in R is on flexibility and the use of good numerical
techniques - not on processing large data sets sequentially.  The
algorithms used in R for most least squares fits generate and analyze
the complete model matrix instead of summary quantities.  (The
algorithms in the biglm package are a compromise that work on
horizontal sections of the model matrix.)
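
For illustration, the same chunked fit via biglm - a minimal sketch,
reusing the made-up chunks from above and assuming the biglm package is
installed:

  library(biglm)
  fit <- biglm(y ~ x, data = chunks[[1]])
  fit <- update(fit, chunks[[2]])   # fold in one horizontal section at a time
  summary(fit)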

If your only criterion for comparison is the ability to work with very
large data sets performing operations that can fit into the filter
model used by SAS then SAS will be a better choice.  However you do
lock yourself into a certain set of operations and you are doing it to
save memory, which is a commodity that decreases in price very
rapidly.

As mentioned in other replies, for many years the majority of SAS use
has been for data manipulation rather than for statistical analysis, so
the filter model has been modified in later versions.





 On 4/10/07, Greg Snow [EMAIL PROTECTED] wrote:
   -----Original Message-----
   From: [EMAIL PROTECTED]
   [mailto:[EMAIL PROTECTED] On Behalf Of
   Bi-Info (http://members.home.nl/bi-info)
   Sent: Monday, April 09, 2007 4:23 PM
   To: Gabor Grothendieck
   Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
   Subject: Re: [R] Reasons to Use R
 
  [snip]
 
   So what's the big deal about S using files instead of memory
   like R. I don't get the point. Isn't there enough swap space
   for S? (Who cares
   anyway: it works, isn't it?) Or are there any problems with S
   and large datasets? I don't get it. You use them, Greg. So
   you might discuss that issue.
  
   Wilfred
  
  
 
  This is my understanding of the issue (not anything official).
 
  If you use up all the memory while in R, then the OS will start swapping
  memory to disk, but the OS does not know what parts of memory correspond
  to which objects, so it is entirely possible that the chunk swapped to
  disk contains parts of different data objects, so when you need one of
  those objects again, everything needs to be swapped back in.  This is
  very inefficient.
 
  S-PLUS occasionally runs into the same problem, but since it does some
  of its own swapping to disk it can be more efficient by swapping single
  data objects (data frames, etc.).  Also, since S-PLUS is already saving
  everything to disk, it does not actually need to do a full swap, it can
  just look and see that a particular data frame has not been used for a
  while, know that it is already saved on the disk, and unload it from
  memory without having to write it to disk first.
 
  The g.data package for R has some of this functionality of keeping data
  on the disk until needed.
 
  The better approach for large data sets is to only have some of the data
  in memory at a time and to automatically read just the parts that you
  need.  So for big datasets it is recommended to have the actual data
  stored in a database and use one of the database connection packages to
  only read in the subset that you need.  The SQLiteDF package for R is
  working on automating this process for R.  There are also the bigdata
  module for S-PLUS and the biglm package for R have ways of doing some of
  the common analyses using chunks of data at a time.  This idea is not
  new.  There was a program in the late 1970s and 80s called Rummage by
  Del Scott (I guess technically it still exists, I have a copy on a 5.25
  floppy somewhere) that used the approach of specify the model you wanted
  to fit first, then specify the data file.  Rummage would then figure out
  which sufficient statistics were needed and read the data in chunks,
  compute the sufficient statistics on the fly, and not keep more than a
  couple of lines of the data in memory at once.  Unfortunately it did not
  have much of a user interface, so when memory was cheap and datasets
  only medium sized it did not compete well, I guess it was just a bit too

Re: [R] Reasons to Use R

2007-04-11 Thread Mike Prager
Certainly true.  In particular, SAS was designed from the start to
store data items on disk, and to read into core memory the minimum
needed for a particular calculation.

The kind of data SAS handles is (for the most part) limited to
rectangular arrays, similar to R data frames. In many procedures
they can be read from disk sequentially (row by row), which
undoubtedly simplifies memory handling.  It seems logical to
suppose that in developing SAS, algorithms were chosen to
support that style of memory management. Finally, the style of
writing programs in SAS consists of discrete steps of
computation, between which nothing but the program need be held
in core memory.


Gabor Grothendieck [EMAIL PROTECTED] wrote:

 I think SAS was developed at a time when computer memory was
 much smaller than it is now and the legacy of that is its better
 usage of computer resources.
 
 On 4/10/07, Wensui Liu [EMAIL PROTECTED] wrote:
  Greg,
  As far as I understand, SAS is more efficient handling large data
  probably than S+/R. Do you have any idea why?

-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Bi-Info (http://members.home.nl/bi-info)
I certainly have that idea too. SPSS functions in much the same way, 
although it specialises in PC applications. Adding memory to a PC is 
not a very expensive thing these days. On my first AT some extra memory 
cost 300 dollars or more. These days you practically get extra memory with 
a package of marshmallows or chocolate bars if you need it.
All computations on a computer are discrete steps in a way, but I've 
heard that SAS computations are split up into strictly divided steps. That 
also makes procedures attachable, I've been told, and interchangeable. 
Different procedures can use the same code, which alternatively is 
cheaper in memory usage or disk usage (the old days...). That also makes 
SAS a complicated machine to build, because procedures are split up into 
numerous fragments, which makes for complicated bookkeeping. If 
you do it that way, I've been told, you can do a lot of computations 
with very little memory. One guy actually computed quite complicated 
models with only 32MB or less, which wasn't very much for his type of 
calculations. Which means that SAS is efficient in memory handling, I 
think. It's not very efficient in dollar handling... I estimate.

Wilfred


--




Certainly true.  In particular, SAS was designed from to store
data items on disk, and to read into core memory the minimum
needed for a particular calculation.

The kind of data SAS handles is (for the most part) limited to
rectangular arrays, similar to R data frames. In many procedures
they can be read from disk sequentially (row by row), which
undoubtedly simplifies memory handling.  It seems logical to
suppose that in developing SAS, algorithms were chosen to
support that style of memory management. Finally, the style of
writing programs in SAS consists of discrete steps of
computation, between which nothing but the program need be held
in core memory.


Gabor Grothendieck [EMAIL PROTECTED] wrote:

 I think SAS was developed at a time when computer memory was
 much smaller than it is now and the legacy of that is its better
 usage of computer resources.
 
 On 4/10/07, Wensui Liu [EMAIL PROTECTED] wrote:
  Greg,
  As far as I understand, SAS is more efficient handling large data
  probably than S+/R. Do you have any idea why?

-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Rajarshi Guha
On Wed, 2007-04-11 at 11:06 -0400, Alan Zaslavsky wrote:

 I have thought for a long time that a facility for efficient rowwise 
 calculations might be a valuable enhancement to S/R.  The storage of the 
 object would be handled by a database and there would have to be an 
 efficient interface for pulling a row (or small chunk of rows) out of the 
 database repeatedly; alternatively the operations could be conducted inside
 the database. 

You can embed R inside Postgres, though I don't know how efficient this
would be. But it does allow one to operate on a per-row basis.

http://www.omegahat.org/RSPostgres/

---
Rajarshi Guha [EMAIL PROTECTED]
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
---
Finally I am becoming stupider no more
- Paul Erdos' epitaph

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Duncan Temple Lang
Rajarshi Guha wrote:
 On Wed, 2007-04-11 at 11:06 -0400, Alan Zaslavsky wrote:
 
  I have thought for a long time that a facility for efficient rowwise 
  calculations might be a valuable enhancement to S/R.  The storage of the 
  object would be handled by a database and there would have to be an 
  efficient interface for pulling a row (or small chunk of rows) out of the 
  database repeatedly; alternatively the operations could be conducted inside
  the database. 
 
 You can embed R inside Postgres, though I don't know how efficient this
 would be. But it does allow one to operate on a per-row basis.
 
 http://www.omegahat.org/RSPostgres/

I still like this idea a lot; a more recent implementation of it was created
by Joe Conway and can be found at

   http://www.joeconway.com/plr/

 D.


 
 ---
 Rajarshi Guha [EMAIL PROTECTED]
 GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
 ---
 Finally I am becoming stupider no more
 - Paul Erdos' epitaph
 

-- 
Duncan Temple Lang[EMAIL PROTECTED]
Department of Statistics  work:  (530) 752-4782
4210 Mathematical Sciences Bldg.  fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA





__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Greg Snow
 -----Original Message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Alan Zaslavsky
 Sent: Wednesday, April 11, 2007 9:07 AM
 To: R-help@stat.math.ethz.ch
 Subject: [R] Reasons to Use R

[snip]
 
 I have thought for a long time that a facility for efficient 
 rowwise calculations might be a valuable enhancement to S/R.  
 The storage of the object would be handled by a database and 
 there would have to be an efficient interface for pulling a 
 row (or small chunk of rows) out of the database repeatedly; 
 alternatively the operations could be conducted inside the 
 database.  Basic operations of rowwise calculation and 
 cumulation (such as forming a column sum or a sum of 
 outer-products) would be written in an R-like syntax and 
 translated into an efficient set of operations that work 
 through the database.  (Would be happy to share some jejune 
 notes on this.)

The biglm and SQLiteDF packages have made a start in this direction
(unless I am misunderstanding you); adding functionality to either of
those seems the best use of effort.

 However the main answer to this problem in 
 the R world seems to have been Moore's Law.  Perhaps somebody 
 could tell us more about the S-Plus large objects library, or 
 the work that Doug Bates is doing on efficient calculations 
 with large datasets.

This link gives an overview and some detail of the S-PLUS big data
library
http://www.insightful.com/support/splus70win/eduguide.pdf


   Alan Zaslavsky
   [EMAIL PROTECTED]



-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Marc Schwartz
On Wed, 2007-04-11 at 17:56 +0200, Bi-Info
(http://members.home.nl/bi-info) wrote:
 I certainly have that idea too. SPSS functions in a way the same, 
 although it specialises in PC applications. Memory addition to a PC is 
 not a very expensive thing these days. On my first AT some extra memory 
 cost 300 dollars or more. These days you get extra memory with a package 
 of marshmellows or chocolate bars if you need it.
 All computations on a computer are discrete steps in a way, but I've 
 heard that SAS computations are split up in strictly divided steps. That 
 also makes procedures attachable I've been told, and interchangable. 
 Different procedures can use the same code which alternatively is 
 cheaper in memory usages or disk usage (the old days...). That makes SAS 
 by the way a complicated machine to build because procedures who are 
 split up into numerous fragments which make complicated bookkeeping. If 
 you do it that way, I've been told, you can do a lot of computations 
 with very little memory. One guy actually computed quite complicated 
 models with only 32MB or less, which wasn't very much for his type of 
 calculations. Which means that SAS is efficient in memory handling I 
 think. It's not very efficient in dollar handling... I estimate.
 
 Wilfred

snip

Oh...SAS is quite efficient in dollar handling, at least when it comes
to the annual commercial licenses...along the same lines as the
purported efficiency of the U.S. income tax system:

  How much money do you have?  Send it in...

There is a reason why SAS is the largest privately held software company
in the world and it is not due to the academic licensing structure,
which constitutes only about 12% of their revenue, based upon their
public figures.

Since SPSS is mentioned, it also functions using similar economic
models...

:-)

Regards,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Alan Zaslavsky
thanks, I will take a look.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R [Broadcast]

2007-04-11 Thread Liaw, Andy
From: Douglas Bates
 
 On 4/10/07, Wensui Liu [EMAIL PROTECTED] wrote:
  Greg,
  As far as I understand, SAS is more efficient handling large data 
  probably than S+/R. Do you have any idea why?
 
 SAS originated at a time when large data sets were stored on 
 magnetic tape and the only reasonable way to process them was 
 sequentially.
 Thus most statistics procedures in SAS act as filters, 
 processing one record at a time and accumulating summary 
 information.  In the past SAS performed a least squares fit 
 by accumulating the crossproduct of [X:y] and then using the 
 using the sweep operator to reduce that matrix. For such an 
 approach the number of observations does not affect the 
 amount of storage required.  Adding observations just 
 requires more time.
 
 This works fine (although there are numerical disadvantages 
 to this approach - try mentioning the sweep operator to an 
 expert in numerical linear algebra - you get a blank stare) 

For those who stared blankly at the above:  the sweep operator is 
just a fancier version of the good old Gaussian elimination...

Andy

 as long as the operations that you wish to perform fit into 
 this model.  Making the desired operations fit into the model 
 is the primary reason for the awkwardness in many SAS analyses.
 
 The emphasis in R is on flexibility and the use of good 
 numerical techniques - not on processing large data sets 
 sequentially.  The algorithms used in R for most least 
 squares fits generate and analyze the complete model matrix 
 instead of summary quantities.  (The algorithms in the biglm 
 package are a compromise that work on horizontal sections of 
 the model matrix.)
 
 If your only criterion for comparison is the ability to work 
 with very large data sets performing operations that can fit 
 into the filter model used by SAS then SAS will be a better 
 choice.  However you do lock yourself into a certain set of 
 operations and you are doing it to save memory, which is a 
 commodity that decreases in price very rapidly.
 
 As mentioned in other replies, for many years the majority of 
 SAS uses are for data manipulation rather than for 
 statistical analysis so the filter model has been modified in 
 later versions.
 
 
 
 
 
  On 4/10/07, Greg Snow [EMAIL PROTECTED] wrote:
-----Original Message-----
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Bi-Info 
(http://members.home.nl/bi-info)
Sent: Monday, April 09, 2007 4:23 PM
To: Gabor Grothendieck
Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
Subject: Re: [R] Reasons to Use R
  
   [snip]
  
So what's the big deal about S using files instead of 
 memory like 
R. I don't get the point. Isn't there enough swap space for S? 
(Who cares
anyway: it works, isn't it?) Or are there any problems 
 with S and 
large datasets? I don't get it. You use them, Greg. So 
 you might 
discuss that issue.
   
Wilfred
   
   
  
   This is my understanding of the issue (not anything official).
  
   If you use up all the memory while in R, then the OS will start 
   swapping memory to disk, but the OS does not know what parts of 
   memory correspond to which objects, so it is entirely 
 possible that 
   the chunk swapped to disk contains parts of different 
 data objects, 
   so when you need one of those objects again, everything 
 needs to be 
   swapped back in.  This is very inefficient.
  
   S-PLUS occasionally runs into the same problem, but since it does 
   some of its own swapping to disk it can be more efficient by 
   swapping single data objects (data frames, etc.).  Also, since 
   S-PLUS is already saving everything to disk, it does not actually 
   need to do a full swap, it can just look and see that a 
 particular 
   data frame has not been used for a while, know that it is already 
   saved on the disk, and unload it from memory without 
 having to write it to disk first.
  
   The g.data package for R has some of this functionality 
 of keeping 
   data on the disk until needed.
  
   The better approach for large data sets is to only have 
 some of the 
   data in memory at a time and to automatically read just the parts 
   that you need.  So for big datasets it is recommended to have the 
   actual data stored in a database and use one of the database 
   connection packages to only read in the subset that you 
 need.  The 
   SQLiteDF package for R is working on automating this 
 process for R.  
   There are also the bigdata module for S-PLUS and the 
 biglm package 
   for R have ways of doing some of the common analyses 
 using chunks of 
   data at a time.  This idea is not new.  There was a 
 program in the 
   late 1970s and 80s called Rummage by Del Scott (I guess 
 technically it still exists, I have a copy on a 5.25
   floppy somewhere) that used the approach of specify the model you 
   wanted to fit first, then specify the data file.  Rummage 
 would then 
   figure out which

Re: [R] Reasons to Use R

2007-04-11 Thread Marc Schwartz
On Wed, 2007-04-11 at 11:26 -0500, Marc Schwartz wrote:
 On Wed, 2007-04-11 at 17:56 +0200, Bi-Info
 (http://members.home.nl/bi-info) wrote:
  I certainly have that idea too. SPSS functions in a way the same, 
  although it specialises in PC applications. Memory addition to a PC is 
  not a very expensive thing these days. On my first AT some extra memory 
  cost 300 dollars or more. These days you get extra memory with a package 
  of marshmellows or chocolate bars if you need it.
  All computations on a computer are discrete steps in a way, but I've 
  heard that SAS computations are split up in strictly divided steps. That 
  also makes procedures attachable I've been told, and interchangable. 
  Different procedures can use the same code which alternatively is 
  cheaper in memory usages or disk usage (the old days...). That makes SAS 
  by the way a complicated machine to build because procedures who are 
  split up into numerous fragments which make complicated bookkeeping. If 
  you do it that way, I've been told, you can do a lot of computations 
  with very little memory. One guy actually computed quite complicated 
  models with only 32MB or less, which wasn't very much for his type of 
  calculations. Which means that SAS is efficient in memory handling I 
  think. It's not very efficient in dollar handling... I estimate.
  
  Wilfred
 
 snip
 
 OhSAS is quite efficient in dollar handling, at least when it comes
 to the annual commercial licenses...along the same lines as the
 purported efficiency of the U.S. income tax system:
 
   How much money do you have?  Send it in...
 
 There is a reason why SAS is the largest privately held software company
 in the world and it is not due to the academic licensing structure,
 which constitutes only about 12% of their revenue, based upon their
 public figures.

Hmmm...here is a classic example of the problems of reading pie
charts. 

The figure I quoted above, which is from reading the 2005 SAS Annual
Report on their web site (such as it is for a private company) comes
from a 3D exploded pie chart (ick...). 

The pie chart uses 3 shades of grey and 5 shades of blue to
differentiate 8 market segments and their percentages of total worldwide
revenue. 

I mis-read the 'shade of grey' allocated to Education as being 12%
(actually 11.7%).

A re-read of the chart, zooming in close on the pie in a PDF reader,
appears to actually show that Education is but 1.8% of their annual
worldwide revenue.

Government based installations, which are presumably the other notable
market segment in which substantially discounted licenses are provided,
is 14.6%.

The report is available here for anyone else curious:

  http://www.sas.com/corporate/report05/annualreport05.pdf

Somebody needs to send SAS a copy of Tufte or Cleveland.

I have to go and rest my eyes now...  ;-)

Regards,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Robert Duval
So I guess my question is...

Is there any hope of R being modified on its core in order to handle
more graciously large datasets? (You've mentioned SAS and SPSS, I'd
add Stata to the list).

Or should we (the users of large datasets) expect to keep on working
with the present tools for the time to come?

robert

On 4/11/07, Marc Schwartz [EMAIL PROTECTED] wrote:
 On Wed, 2007-04-11 at 11:26 -0500, Marc Schwartz wrote:
  On Wed, 2007-04-11 at 17:56 +0200, Bi-Info
  (http://members.home.nl/bi-info) wrote:
   I certainly have that idea too. SPSS functions in a way the same,
   although it specialises in PC applications. Memory addition to a PC is
   not a very expensive thing these days. On my first AT some extra memory
   cost 300 dollars or more. These days you get extra memory with a package
   of marshmellows or chocolate bars if you need it.
   All computations on a computer are discrete steps in a way, but I've
   heard that SAS computations are split up in strictly divided steps. That
   also makes procedures attachable I've been told, and interchangable.
   Different procedures can use the same code which alternatively is
   cheaper in memory usages or disk usage (the old days...). That makes SAS
   by the way a complicated machine to build because procedures who are
   split up into numerous fragments which make complicated bookkeeping. If
   you do it that way, I've been told, you can do a lot of computations
   with very little memory. One guy actually computed quite complicated
   models with only 32MB or less, which wasn't very much for his type of
   calculations. Which means that SAS is efficient in memory handling I
   think. It's not very efficient in dollar handling... I estimate.
  
   Wilfred
 
  snip
 
  OhSAS is quite efficient in dollar handling, at least when it comes
  to the annual commercial licenses...along the same lines as the
  purported efficiency of the U.S. income tax system:
 
How much money do you have?  Send it in...
 
  There is a reason why SAS is the largest privately held software company
  in the world and it is not due to the academic licensing structure,
  which constitutes only about 12% of their revenue, based upon their
  public figures.

 Hmmm..here is a classic example of the problems of reading pie
 charts.

 The figure I quoted above, which is from reading the 2005 SAS Annual
 Report on their web site (such as it is for a private company) comes
 from a 3D exploded pie chart (ick...).

 The pie chart uses 3 shades of grey and 5 shades of blue to
 differentiate 8 market segments and their percentages of total worldwide
 revenue.

 I mis-read the 'shade of grey' allocated to Education as being 12%
 (actually 11.7%).

 A re-read of the chart, zooming in close on the pie in a PDF reader,
 appears to actually show that Education is but 1.8% of their annual
 worldwide revenue.

 Government based installations, which are presumably the other notable
 market segment in which substantially discounted licenses are provided,
 is 14.6%.

 The report is available here for anyone else curious:

   http://www.sas.com/corporate/report05/annualreport05.pdf

 Somebody needs to send SAS a copy of Tufte or Cleveland.

 I have to go and rest my eyes now...  ;-)

 Regards,

 Marc



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Wensui Liu
I think the reason that Stata is fast is that it keeps only one working
table in RAM. If you keep only one data frame in R, it will run fast
too. But ...

On 4/11/07, Robert Duval [EMAIL PROTECTED] wrote:
 So I guess my question is...

 Is there any hope of R being modified on its core in order to handle
 more graciously large datasets? (You've mentioned SAS and SPSS, I'd
 add Stata to the list).

 Or should we (the users of large datasets) expect to keep on working
 with the present tools for the time to come?

 robert

 On 4/11/07, Marc Schwartz [EMAIL PROTECTED] wrote:
  On Wed, 2007-04-11 at 11:26 -0500, Marc Schwartz wrote:
   On Wed, 2007-04-11 at 17:56 +0200, Bi-Info
   (http://members.home.nl/bi-info) wrote:
I certainly have that idea too. SPSS functions in a way the same,
although it specialises in PC applications. Memory addition to a PC is
not a very expensive thing these days. On my first AT some extra memory
cost 300 dollars or more. These days you get extra memory with a package
of marshmellows or chocolate bars if you need it.
All computations on a computer are discrete steps in a way, but I've
heard that SAS computations are split up in strictly divided steps. That
also makes procedures attachable I've been told, and interchangable.
Different procedures can use the same code which alternatively is
cheaper in memory usages or disk usage (the old days...). That makes SAS
by the way a complicated machine to build because procedures who are
split up into numerous fragments which make complicated bookkeeping. If
you do it that way, I've been told, you can do a lot of computations
with very little memory. One guy actually computed quite complicated
models with only 32MB or less, which wasn't very much for his type of
calculations. Which means that SAS is efficient in memory handling I
think. It's not very efficient in dollar handling... I estimate.
   
Wilfred
  
   snip
  
   OhSAS is quite efficient in dollar handling, at least when it comes
   to the annual commercial licenses...along the same lines as the
   purported efficiency of the U.S. income tax system:
  
 How much money do you have?  Send it in...
  
   There is a reason why SAS is the largest privately held software company
   in the world and it is not due to the academic licensing structure,
   which constitutes only about 12% of their revenue, based upon their
   public figures.
 
  Hmmm..here is a classic example of the problems of reading pie
  charts.
 
  The figure I quoted above, which is from reading the 2005 SAS Annual
  Report on their web site (such as it is for a private company) comes
  from a 3D exploded pie chart (ick...).
 
  The pie chart uses 3 shades of grey and 5 shades of blue to
  differentiate 8 market segments and their percentages of total worldwide
  revenue.
 
  I mis-read the 'shade of grey' allocated to Education as being 12%
  (actually 11.7%).
 
  A re-read of the chart, zooming in close on the pie in a PDF reader,
  appears to actually show that Education is but 1.8% of their annual
  worldwide revenue.
 
  Government based installations, which are presumably the other notable
  market segment in which substantially discounted licenses are provided,
  is 14.6%.
 
  The report is available here for anyone else curious:
 
http://www.sas.com/corporate/report05/annualreport05.pdf
 
  Somebody needs to send SAS a copy of Tufte or Cleveland.
 
  I have to go and rest my eyes now...  ;-)
 
  Regards,
 
  Marc
 



-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-11 Thread Douglas Bates
On 4/11/07, Robert Duval [EMAIL PROTECTED] wrote:
 So I guess my question is...

 Is there any hope of R being modified on its core in order to handle
 more graciously large datasets? (You've mentioned SAS and SPSS, I'd
 add Stata to the list).

 Or should we (the users of large datasets) expect to keep on working
 with the present tools for the time to come?

We're certainly aware of the desire of many users to be able to handle
large data sets.  I have just spent a couple of days working with a
student from another department who wanted to work with a very large
data set that was poorly structured.  Most of my time was spent trying
to convince her about the limitations in the structure of her data and
what could realistically be expected to be computed with it.

If your purpose is to perform data manipulation and extraction on
large data sets then I think that it is not unreasonable to be
expected to learn to use SQL. I find it convenient to use R to do data
manipulation because I know the language and the support tools well
but I don't expect to do data cleaning on millions of records with it.
 I am probably too conservative in what I will ask R to handle for me
because I started using S on a Vax-11/750 that had 2 megabytes of
memory and it's hard to break old habits.

I think the trend in working with large data sets in R will be toward
a hybrid approach of using a database for data storage and retrieval
plus R for the model definition and computation.  Miguel Manese's
SQLiteDF package and some of the work in Bioconductor are steps in
this direction.

However, as was mentioned earlier in this thread, there is an
underlying assumption with R that the user is thinking about the
analysis as he/she is doing it. We sometimes see questions about I
have a data set with (some large number) of records on several hundred
or thousands of variables and I want to fit a generalized linear
model to it.

I would be hard pressed to think of a situation where I wanted
hundreds of variables in a statistical model unless they are generated
from one or more factors that have many levels.  And, in that case, I
would want to use random effects rather than fixed effects in a model.
 So just saying that the big challenge is to fit some kind of model
with lots of coefficients to a very large number of observations may
be missing the point.  Defining the model better may be the point.

Let me conclude by saying that these are general observations and not
directed to you personally, Robert.  I don't know what you want R to
do graciously to large data sets so my response is more to the general
point that there should always be a balance between thinking about the
structure of the data and the model and brute force computation.  One
can do data analysis by using the computer as a blunt instrument with
which to bludgeon the problem to death but one can't do elegant data
analysis like that.





 robert

 On 4/11/07, Marc Schwartz [EMAIL PROTECTED] wrote:
  On Wed, 2007-04-11 at 11:26 -0500, Marc Schwartz wrote:
   On Wed, 2007-04-11 at 17:56 +0200, Bi-Info
   (http://members.home.nl/bi-info) wrote:
I certainly have that idea too. SPSS functions in a way the same,
although it specialises in PC applications. Memory addition to a PC is
not a very expensive thing these days. On my first AT some extra memory
cost 300 dollars or more. These days you get extra memory with a package
of marshmellows or chocolate bars if you need it.
All computations on a computer are discrete steps in a way, but I've
heard that SAS computations are split up in strictly divided steps. That
also makes procedures attachable I've been told, and interchangable.
Different procedures can use the same code which alternatively is
cheaper in memory usages or disk usage (the old days...). That makes SAS
by the way a complicated machine to build because procedures who are
split up into numerous fragments which make complicated bookkeeping. If
you do it that way, I've been told, you can do a lot of computations
with very little memory. One guy actually computed quite complicated
models with only 32MB or less, which wasn't very much for his type of
calculations. Which means that SAS is efficient in memory handling I
think. It's not very efficient in dollar handling... I estimate.
   
Wilfred
  
   snip
  
   OhSAS is quite efficient in dollar handling, at least when it comes
   to the annual commercial licenses...along the same lines as the
   purported efficiency of the U.S. income tax system:
  
 How much money do you have?  Send it in...
  
   There is a reason why SAS is the largest privately held software company
   in the world and it is not due to the academic licensing structure,
   which constitutes only about 12% of their revenue, based upon their
   public figures.
 
  Hmmm... here is a classic example of the problems of reading pie
  charts.
 
  The figure I quoted above, which 

Re: [R] Reasons to Use R

2007-04-11 Thread Thomas Lumley
On Wed, 11 Apr 2007, Alan Zaslavsky wrote:
 I have thought for a long time that a facility for efficient rowwise
 calculations might be a valuable enhancement to S/R.  The storage of the
 object would be handled by a database and there would have to be an
 efficient interface for pulling a row (or small chunk of rows) out of the
 database repeatedly; alternatively the operations could be conducted inside
 the database.  Basic operations of rowwise calculation and cumulation
 (such as forming a column sum or a sum of outer-products) would be
 written in an R-like syntax and translated into an efficient set of
 operations that work through the database.  (Would be happy to share
 some jejune notes on this.)  However the main answer to this problem
 in the R world seems to have been Moore's Law.  Perhaps somebody could
 tell us more about the S-Plus large objects library, or the work that
 Doug Bates is doing on efficient calculations with large datasets.
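
Something like that cumulation can already be sketched in plain R, reading a
(hypothetical, headerless) numeric CSV in chunks and accumulating the column
sums and the cross-product matrix as it goes:

con <- file("big.csv", open = "r")    # hypothetical file with columns x1,x2,x3
xtx <- 0; csum <- 0                   # running sum of outer products / column sums
repeat {
  chunk <- try(read.csv(con, header = FALSE, nrows = 10000,
                        col.names = c("x1", "x2", "x3")), silent = TRUE)
  if (inherits(chunk, "try-error")) break         # connection exhausted
  m <- as.matrix(chunk)
  xtx  <- xtx + crossprod(m)          # t(m) %*% m: sum of rowwise outer products
  csum <- csum + colSums(m)
}
close(con)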



I have been surprised to find how much you can get done in SQL, only 
transferring summaries of the data into R.  There is soon going to be an 
experimental surveyNG package that works with survey data stored in a SQLite 
database without transferring the whole thing into R for most operations (and I 
could get further if SQLite had the log() and exp() functions that most other 
SQL implementations for large databases provide). I'll be submitting a paper on 
this to useR2007.
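
For example, a mean and variance can be assembled from sums computed entirely
inside the database, so only a one-row summary ever crosses into R (a sketch;
the table and column names are invented):

library(RSQLite)
con <- dbConnect(SQLite(), dbname = "survey.db")
s <- dbGetQuery(con,
  "SELECT SUM(y) AS sy, SUM(y * y) AS syy, COUNT(*) AS n FROM responses")
mean.y <- s$sy / s$n
var.y  <- (s$syy - s$n * mean.y^2) / (s$n - 1)   # usual sample variance
dbDisconnect(con)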

The approach of transferring blocks of data into R and using a database just as 
backing store will allow more general computation but will be less efficient 
than performing the computation in the database, so a mixture of both is likely 
to be helpful.  Moore's Law will settle some issues, but there are problems 
where it is working to increase the size of datasets just as fast as it 
increases computational power.


 -thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] Reasons to Use R

2007-04-10 Thread Jeffrey J. Hallman
halldor bjornsson [EMAIL PROTECTED] writes:
 ...
 Now, R does not have everything we want. One thing missing is a decent
 R-DB2 connection, for windows the excellent RODBC works fine, but ODBC
 support on Linux is  a hassle. 
 

A hassle?  I use RODBC on Linux to read data from a mainframe DB2 database.  I
had to create the file .odbc.ini in my home directory with lines like this:

[m1db2p]
Driver = DB2
Servername = NameOfOurMainframe
Database   = fdrp
UserName   = NachoBizness
TraceFile  = /home/NachoBizness/.odbc.log

and then to connect I do this:

Sys.putenv(DB2INSTANCE = "db2inst")
myConnection <- odbcConnect(dsn = "m1db2p", uid = uid, pwd = pwd,
                            case = "toupper")

with 'uid' and 'pwd' set to my mainframe uid and password.
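
After that, queries go through the usual RODBC calls, e.g. (the table name
below is made up):

d <- sqlQuery(myConnection,
              "SELECT * FROM fdrp.sometable FETCH FIRST 10 ROWS ONLY")
odbcClose(myConnection)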

Now, I am not the sysadmin for our Linux machines, but I don't think they had
to do much beyond the standard rpm installation to get this working.  

-- 
Jeff



Re: [R] Reasons to Use R

2007-04-10 Thread Greg Snow
For a previous version of SAS we had parts installed on each computer
where it was used, but there were key pieces located on a network drive
(not internet, but local network) such that if you tried to start SAS
while someone else was using it you would get an error message.

We had troubles with the network, so now we have a full version
installed on each computer, but the person in the company that is the
contact between us and SAS (my group has 1 licence, but the company as a
whole has several) checks up on us from time to time to make sure that
we stick within the 1 at a time guidelines (not hard, we mostly use
other things) or pay for additional licences.

S-PLUS has also had similar types of licences, I was teaching in a
computer lab where all the computers could run S-PLUS, but once 5 people
had started S-PLUS, no one else could until someone quit out of it
(so we used R for that class).  For S-PLUS 7, when I upgraded my computer
and installed my licenced copy on the new computer, it disabled the copy
on my old computer.  This may have changed somewhat, because I remember
there being some complaints from people who legitimately installed it on
their laptop, but it would not work when the laptop was not connected to
the internet.

There are a lot of different ways to try to enforce licence conditions
on software (and doing so is important for companies that want to make a
profit these days), unfortunately the current pendulum swing is making
things more inconvenient for the common user (at home I have some
software that we use to program my wife's sewing machines that can be
installed on any computer, but only works if a hardware key is plugged
into a usb port).

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: Charilaos Skiadas [mailto:[EMAIL PROTECTED] 
 Sent: Monday, April 09, 2007 3:24 PM
 To: Greg Snow
 Cc: Gabor Grothendieck; Lorenzo Isella; R-Help list
 Subject: Re: [R] Reasons to Use R
 
 On Apr 9, 2007, at 1:45 PM, Greg Snow wrote:
 
  The licences keep changing, some have in the past but don't 
 now, some 
  you can get an additional licence for home at a discounted 
 price. Some 
  it depends on the type of licence you have at work 
 (currently our SAS 
  licence is such that the 3 people in my group can all have it 
  installed, but at most 1 can be using it at any 1 time, how 
 does that 
  affect installing/using it at home).
 
 Hm, this intrigues me, it would seem to me that the only way 
 for SAS to check that only one of your colleagues uses it at 
 any given time would be to contact some sort of online 
 server. Does that mean that SAS can only be run when you have 
 internet access?
 
 Or is it simply a clause on the license, without any runtime checks?
 
 Haris Skiadas
 Department of Mathematics and Computer Science Hanover College
 
 
 
 




Re: [R] Reasons to Use R

2007-04-10 Thread Greg Snow
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Bi-Info (http://members.home.nl/bi-info)
 Sent: Monday, April 09, 2007 4:23 PM
 To: Gabor Grothendieck
 Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
 Subject: Re: [R] Reasons to Use R

[snip] 

 So what's the big deal about S using files instead of memory 
 like R. I don't get the point. Isn't there enough swap space 
 for S? (Who cares
 anyway: it works, isn't it?) Or are there any problems with S 
 and large datasets? I don't get it. You use them, Greg. So 
 you might discuss that issue.
 
 Wilfred
 
 

This is my understanding of the issue (not anything official).

If you use up all the memory while in R, then the OS will start swapping
memory to disk, but the OS does not know what parts of memory correspond
to which objects, so it is entirely possible that the chunk swapped to
disk contains parts of different data objects, so when you need one of
those objects again, everything needs to be swapped back in.  This is
very inefficient.

S-PLUS occasionally runs into the same problem, but since it does some
of its own swapping to disk it can be more efficient by swapping single
data objects (data frames, etc.).  Also, since S-PLUS is already saving
everything to disk, it does not actually need to do a full swap, it can
just look and see that a particular data frame has not been used for a
while, know that it is already saved on the disk, and unload it from
memory without having to write it to disk first.

The g.data package for R has some of this functionality of keeping data
on the disk until needed.

The better approach for large data sets is to only have some of the data
in memory at a time and to automatically read just the parts that you
need.  So for big datasets it is recommended to have the actual data
stored in a database and use one of the database connection packages to
only read in the subset that you need.  The SQLiteDF package for R is
working on automating this process for R.  There are also the bigdata
module for S-PLUS and the biglm package for R have ways of doing some of
the common analyses using chunks of data at a time.  This idea is not
new.  There was a program in the late 1970s and 80s called Rummage by
Del Scott (I guess technically it still exists, I have a copy on a 5.25
floppy somewhere) that used the approach of specify the model you wanted
to fit first, then specify the data file.  Rummage would then figure out
which sufficient statistics were needed and read the data in chunks,
compute the sufficient statistics on the fly, and not keep more than a
couple of lines of the data in memory at once.  Unfortunately it did not
have much of a user interface, so when memory became cheap and datasets
were only medium-sized it did not compete well; I guess it was just a bit
too far ahead of its time.
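
In the same spirit, the biglm package mentioned above accumulates the
sufficient statistics for a linear model one chunk at a time; a minimal
sketch with two simulated chunks:

library(biglm)
set.seed(1)
chunk1 <- data.frame(x = rnorm(1000)); chunk1$y <- 2 * chunk1$x + rnorm(1000)
chunk2 <- data.frame(x = rnorm(1000)); chunk2$y <- 2 * chunk2$x + rnorm(1000)
fit <- biglm(y ~ x, data = chunk1)   # sufficient statistics from chunk 1
fit <- update(fit, chunk2)           # fold in chunk 2; chunk 1 can be dropped
coef(fit)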

Hope this helps, 



-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



Re: [R] Reasons to Use R

2007-04-10 Thread Gabor Grothendieck
I think SAS was developed at a time when computer memory was
much smaller than it is now and the legacy of that is its better
usage of computer resources.

On 4/10/07, Wensui Liu [EMAIL PROTECTED] wrote:
 Greg,
 As far as I understand, SAS is probably more efficient at handling large
 data than S+/R. Do you have any idea why?

 On 4/10/07, Greg Snow [EMAIL PROTECTED] wrote:
   -Original Message-
   From: [EMAIL PROTECTED]
   [mailto:[EMAIL PROTECTED] On Behalf Of
   Bi-Info (http://members.home.nl/bi-info)
   Sent: Monday, April 09, 2007 4:23 PM
   To: Gabor Grothendieck
   Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
   Subject: Re: [R] Reasons to Use R
 
  [snip]
 
   So what's the big deal about S using files instead of memory
   like R. I don't get the point. Isn't there enough swap space
   for S? (Who cares
   anyway: it works, isn't it?) Or are there any problems with S
   and large datasets? I don't get it. You use them, Greg. So
   you might discuss that issue.
  
   Wilfred
  
  
 
  This is my understanding of the issue (not anything official).
 
  If you use up all the memory while in R, then the OS will start swapping
  memory to disk, but the OS does not know what parts of memory correspond
  to which objects, so it is entirely possible that the chunk swapped to
  disk contains parts of different data objects, so when you need one of
  those objects again, everything needs to be swapped back in.  This is
  very inefficient.
 
  S-PLUS occasionally runs into the same problem, but since it does some
  of its own swapping to disk it can be more efficient by swapping single
  data objects (data frames, etc.).  Also, since S-PLUS is already saving
  everything to disk, it does not actually need to do a full swap, it can
  just look and see that a particular data frame has not been used for a
  while, know that it is already saved on the disk, and unload it from
  memory without having to write it to disk first.
 
  The g.data package for R has some of this functionality of keeping data
  on the disk until needed.
 
  The better approach for large data sets is to only have some of the data
  in memory at a time and to automatically read just the parts that you
  need.  So for big datasets it is recommended to have the actual data
  stored in a database and use one of the database connection packages to
  only read in the subset that you need.  The SQLiteDF package for R is
  working on automating this process for R.  The bigdata
  module for S-PLUS and the biglm package for R also have ways of doing some of
  the common analyses using chunks of data at a time.  This idea is not
  new.  There was a program in the late 1970s and 80s called Rummage by
  Del Scott (I guess technically it still exists, I have a copy on a 5.25
  floppy somewhere) that used the approach of specifying the model you wanted
  to fit first, then specifying the data file.  Rummage would then figure out
  which sufficient statistics were needed and read the data in chunks,
  compute the sufficient statistics on the fly, and not keep more than a
  couple of lines of the data in memory at once.  Unfortunately it did not
  have much of a user interface, so when memory became cheap and datasets
  were only medium-sized it did not compete well; I guess it was just a bit
  too far ahead of its time.
 
  Hope this helps,
 
 
 
  --
  Gregory (Greg) L. Snow Ph.D.
  Statistical Data Center
  Intermountain Healthcare
  [EMAIL PROTECTED]
  (801) 408-8111
 
 


 --
 WenSui Liu
 A lousy statistician who happens to know a little programming
 (http://spaces.msn.com/statcompute/blog)




Re: [R] Reasons to Use R

2007-04-10 Thread Taylor, Z Todd
On Monday, April 09, 2007 3:23 PM, someone named Wilfred wrote:

 So what's the big deal about S using files instead of memory
 like R. I don't get the point. Isn't there enough swap space
 for S? (Who cares anyway: it works, isn't it?) Or are there
 any problems with S and large datasets? I don't get it. You
 use them, Greg. So you might discuss that issue.

S's one-to-one correspondence between S objects and filesystem
objects is the single remaining reason I haven't completely
converted over to R.  With S I can manage my objects via
makefiles.  Corrections to raw data or changes to analysis
scripts get applied to all objects in the project (and there
are often thousands of them) by simply typing 'make'.  That
includes everything right down to the graphics that will go
in the report.

How do people live without that?

--Todd
-- 
Why is 'abbreviation' such a long word?



Re: [R] Reasons to Use R

2007-04-10 Thread Andrew Robinson
Hi Todd,

I guess I don't see the difference between that strategy and using
make to look after scripts, raw data, Sweave files, and (if necessary)
images.  I find that I can get pretty fine-grained control over what
parts of a project need to be rerun by breaking the analysis into
chapters.  I suppose it depends on whether one takes a script-centric
or an object-centric view of a data analysis project.  A script-centric
view is nicer for version control.  I think that make is
centric-neutral :).

Cheers,

Andrew

On Tue, Apr 10, 2007 at 04:23:54PM -0700, Taylor, Z Todd wrote:
 On Monday, April 09, 2007 3:23 PM, someone named Wilfred wrote:
 
  So what's the big deal about S using files instead of memory
  like R. I don't get the point. Isn't there enough swap space
  for S? (Who cares anyway: it works, isn't it?) Or are there
  any problems with S and large datasets? I don't get it. You
  use them, Greg. So you might discuss that issue.
 
 S's one-to-one correspondence between S objects and filesystem
 objects is the single remaining reason I haven't completely
 converted over to R.  With S I can manage my objects via
 makefiles.  Corrections to raw data or changes to analysis
 scripts get applied to all objects in the project (and there
 are often thousands of them) by simply typing 'make'.  That
 includes everything right down to the graphics that will go
 in the report.
 
 How do people live without that?
 
 --Todd
 -- 
 Why is 'abbreviation' such a long word?
 

-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/



Re: [R] Reasons to Use R

2007-04-10 Thread Frank E Harrell Jr
Taylor, Z Todd wrote:
 On Monday, April 09, 2007 3:23 PM, someone named Wilfred wrote:
 
 So what's the big deal about S using files instead of memory
 like R. I don't get the point. Isn't there enough swap space
 for S? (Who cares anyway: it works, isn't it?) Or are there
 any problems with S and large datasets? I don't get it. You
 use them, Greg. So you might discuss that issue.
 
 S's one-to-one correspondence between S objects and filesystem
 objects is the single remaining reason I haven't completely
 converted over to R.  With S I can manage my objects via
 makefiles.  Corrections to raw data or changes to analysis
 scripts get applied to all objects in the project (and there
 are often thousands of them) by simply typing 'make'.  That
 includes everything right down to the graphics that will go
 in the report.
 
 How do people live without that?

Personally I'd rather have R's save( ) and load( ).
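
That is (a tiny sketch):

x <- data.frame(a = 1:3)
save(x, file = "x.rda")   # one object, one file on disk
rm(x)
load("x.rda")             # restores x, by name, into the workspace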

Frank

 
 --Todd


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University



Re: [R] Reasons to Use R

2007-04-09 Thread Gabor Grothendieck
Have you tried 64 bit machines with larger memory or do you mean
that you can't use R on your current machines?

Also have you tried S-Plus?  Will that work for you? The transition from
that to R would be less than from SAS to R.

On 4/9/07, Jorge Cornejo-Donoso [EMAIL PROTECTED] wrote:
 The size of the DB is an issue with R. We are still using SAS because R
 can't handle our DB, and of course we don't want to sacrifice resolution,
 because the data collection is expensive (at least in fisheries and
 oceanography), so... I think that R needs to improve its handling of big
 DBs. Now I can only use R for graph preparation and some data analysis,
 but we can't do the main work in R, and that is really sad.




Re: [R] Reasons to Use R

2007-04-09 Thread Jorge Cornejo-Donoso
I have a Dell with 2 Intel XEON 3.0 processors and 2GB of RAM.
The problem is the DB size.

-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 09, 2007 11:28
To: Jorge Cornejo-Donoso
CC: r-help@stat.math.ethz.ch
Subject: Re: [R] Reasons to Use R

Have you tried 64 bit machines with larger memory or do you mean that you
can't use R on your current machines?

Also have you tried S-Plus?  Will that work for you? The transition from
that to R would be less than from SAS to R.

On 4/9/07, Jorge Cornejo-Donoso [EMAIL PROTECTED] wrote:
 The size of the DB is an issue with R. We are still using SAS because R
 can't handle our DB, and of course we don't want to sacrifice resolution,
 because the data collection is expensive (at least in fisheries and
 oceanography), so... I think that R needs to improve its handling of big
 DBs. Now I can only use R for graph preparation and some data analysis,
 but we can't do the main work in R, and that is really sad.




Re: [R] Reasons to Use R

2007-04-09 Thread Gabor Grothendieck
What about the S-Plus question?  S-Plus stores objects in files
whereas R stores them in memory.

On 4/9/07, Jorge Cornejo-Donoso [EMAIL PROTECTED] wrote:
 I have a Dell with 2 Intel XEON 3.0 processors and 2GB of RAM.
 The problem is the DB size.

 -Original Message-
 From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
 Sent: Monday, April 09, 2007 11:28
 To: Jorge Cornejo-Donoso
 CC: r-help@stat.math.ethz.ch
 Subject: Re: [R] Reasons to Use R

 Have you tried 64 bit machines with larger memory or do you mean that you
 can't use R on your current machines?

 Also have you tried S-Plus?  Will that work for you? The transition from
 that to R would be less than from SAS to R.

 On 4/9/07, Jorge Cornejo-Donoso [EMAIL PROTECTED] wrote:
  The size of the DB is an issue with R. We are still using SAS because R
  can't handle our DB, and of course we don't want to sacrifice resolution,
  because the data collection is expensive (at least in fisheries and
  oceanography), so... I think that R needs to improve its handling of big
  DBs. Now I can only use R for graph preparation and some data analysis,
  but we can't do the main work in R, and that is really sad.
 





Re: [R] Reasons to Use R

2007-04-09 Thread Greg Snow
Here are a couple more thoughts to add to what you have already received:

You mentioned that price is not at issue, but there are other costs than
money that you may want to look at.  On my work machine I have R,
S-PLUS, SAS, SPSS, and a couple of other stats programs; on my laptop
and home computers I have R installed.  So, if a deadline is looming and
I am working on a project mainly in R, it is easy to work on it on the
bus or at home (or in a boring meeting), the same does not work for a
SAS or SPSS project (Hmm, thinking about this now, maybe I need to do
less in R :-).

R and S-PLUS are very flexible/customizable, if you have a certain plot
that you make often you can write your own function/script to do it
automatically, most other programs will give you their standard, then
you have to modify it to meet your specifications.  With sweave (and the
odf and html extensions) you can automate whole reports, very useful for
things that you do month after month.
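
For example, a house-style plot takes a few lines to wrap once and then
reuse everywhere (a trivial sketch):

myscatter <- function(x, y, ...) {
  plot(x, y, pch = 16, bty = "n", ...)   # our standard look
  grid(col = "grey80")
  abline(lm(y ~ x), lwd = 2)             # least-squares line on every plot
}
myscatter(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "MPG")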

And what I think is the biggest advantage of R and S-PLUS is that they
strongly encourage you to think about your data.  Other programs (at
least that I am familiar with) tend to have 1 specific way of treating
your data, and expect you to modify your data to fit that programs
model.  These models can be overrestrictive (force you to restructure
your data to fit their model) or underrestrictive (allow things that
should really be separate data objects to be combined into a single
dataset) and sometimes both.  S on the other hand allows many
different ways to store and work with your data, and as you analyze the
data, different branches of new analysis open up depending on early
results rather than just getting stock output for a procedure.  If all
you want is a black box where data goes in one end and a specific answer
comes out the other, then most programs will work; but if you want to
really understand what your data has to tell you, then R/S-PLUS makes
this easy and natural.

Hope this helps,


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo Isella
 Sent: Thursday, April 05, 2007 9:02 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Reasons to Use R
 
 Dear All,
 The institute I work for is organizing an internal workshop 
 for High Performance Computing (HPC).
 I am planning to attend it and talk a bit about fluid 
 dynamics, but there is also quite a lot of interest devoted 
 to data post-processing and management of huge data sets.
 A lot of people are interested in image processing/pattern 
 recognition and statistics applied to geography/ecology, but I 
 would like not to post this on too many lists.
 The final aim of the workshop is  understanding hardware 
 requirements and drafting a list of the equipment we would 
 like to buy. I think this could be the venue to talk about R as well.
 Therefore, even if it is not exactly a typical mailing list 
 question, I would like to have suggestions about where to 
 collect info about:
 (1)Institutions (not only academia) using R (2)Hardware 
 requirements, possibly benchmarks (3)R & clusters, R & 
 multiple CPU machines, R performance on different hardware.
 (4)finally, a list of the advantages for using R over 
 commercial statistical packages. The money-saving in itself 
 is not a reason good enough and some people are scared by the 
 lack of professional support, though this mailing list is 
 simply wonderful.
 
 Kind Regards
 
 Lorenzo Isella
 




Re: [R] Reasons to Use R

2007-04-09 Thread Gabor Grothendieck
I might be wrong about this but I thought that the licenses for at least
some of the commercial packages do let you make a copy of the one
you have at work for home use.

On 4/9/07, Greg Snow [EMAIL PROTECTED] wrote:
 Here are a couple more thoughts to add to what you have already received:

 You mentioned that price is not at issue, but there are other costs than
 money that you may want to look at.  On my work machine I have R,
 S-PLUS, SAS, SPSS, and a couple of other stats programs; on my laptop
 and home computers I have R installed.  So, if a deadline is looming and
 I am working on a project mainly in R, it is easy to work on it on the
 bus or at home (or in a boring meeting), the same does not work for a
 SAS or SPSS project (Hmm, thinking about this now, maybe I need to do
 less in R :-).

 R and S-PLUS are very flexible/customizable, if you have a certain plot
 that you make often you can write your own function/script to do it
 automatically, most other programs will give you their standard, then
 you have to modify it to meet your specifications.  With sweave (and the
 odf and html extensions) you can automate whole reports, very useful for
 things that you do month after month.

 And what I think is the biggest advantage of R and S-PLUS is that they
 strongly encourage you to think about your data.  Other programs (at
 least that I am familiar with) tend to have 1 specific way of treating
 your data, and expect you to modify your data to fit that program's
 model.  These models can be overrestrictive (force you to restructure
 your data to fit their model) or underrestrictive (allow things that
 should really be separate data objects to be combined into a single
 dataset) and sometimes both.  S on the other hand allows many
 different ways to store and work with your data, and as you analyze the
 data, different branches of new analysis open up depending on early
 results rather than just getting stock output for a procedure.  If all
 you want is a black box where data goes in one end and a specific answer
 comes out the other, then most programs will work; but if you want to
 really understand what your data has to tell you, then R/S-PLUS makes
 this easy and natural.

 Hope this helps,


 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 [EMAIL PROTECTED]
 (801) 408-8111



  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo Isella
  Sent: Thursday, April 05, 2007 9:02 AM
  To: r-help@stat.math.ethz.ch
  Subject: [R] Reasons to Use R
 
  Dear All,
  The institute I work for is organizing an internal workshop
  for High Performance Computing (HPC).
  I am planning to attend it and talk a bit about fluid
  dynamics, but there is also quite a lot of interest devoted
  to data post-processing and management of huge data sets.
  A lot of people are interested in image processing/pattern
  recognition and statistics applied to geography/ecology, but I
  would like not to post this on too many lists.
  The final aim of the workshop is  understanding hardware
  requirements and drafting a list of the equipment we would
  like to buy. I think this could be the venue to talk about R as well.
  Therefore, even if it is not exactly a typical mailing list
  question, I would like to have suggestions about where to
  collect info about:
  (1)Institutions (not only academia) using R (2)Hardware
  requirements, possibly benchmarks (3)R  clusters, R 
  multiple CPU machines, R performance on different hardware.
  (4)finally, a list of the advantages for using R over
  commercial statistical packages. The money-saving in itself
  is not a reason good enough and some people are scared by the
  lack of professional support, though this mailing list is
  simply wonderful.
 
  Kind Regards
 
  Lorenzo Isella
 
 





Re: [R] Reasons to Use R

2007-04-09 Thread Greg Snow
The licences keep changing, some have in the past but don't now, some
you can get an additional licence for home at a discounted price. Some
it depends on the type of licence you have at work (currently our SAS
licence is such that the 3 people in my group can all have it installed,
but at most 1 can be using it at any 1 time, how does that affect
installing/using it at home).  I may be able to install some of the
software at home also, but for most of them I have given up trying to
figure out the legality of it and so I have not installed them at home
to be on the safe side.

Some of the doctors I work with who are also affiliated with the local
university have mentioned that they can get a discounted academic
version of SAS and could use that, but my interpretation of the academic
licence that one showed me (probably not the most recent) said (in my
interpretation, I am not a lawyer) that if they published the results
without paying a licence upgrade fee, they would be violating the
licence (the academic version was intended for teaching only).

The R licence on the other hand is pretty clear that I can install it
and use it pretty much anywhere I want.

You are right in correcting me, R is not the only package that can be
used on multiple computers.  I do think it is the most straight forward
of the good ones.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
 Sent: Monday, April 09, 2007 10:44 AM
 To: Greg Snow
 Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
 Subject: Re: [R] Reasons to Use R
 
 I might be wrong about this but I thought that the licenses 
 for at least some of the commercial packages do let you make 
 a copy of the one you have at work for home use.
 
 On 4/9/07, Greg Snow [EMAIL PROTECTED] wrote:
  Here are a couple more thoughts to add to what you have 
 already received:
 
  You mentioned that price is not at issue, but there are other costs 
  than money that you may want to look at.  On my work 
 machine I have R, 
  S-PLUS, SAS, SPSS, and a couple of other stats programs; on 
 my laptop 
  and home computers I have R installed.  So, if a deadline 
 is looming 
  and I am working on a project mainly in R, it is easy to 
 work on it on 
  the bus or at home (or in a boring meeting), the same does not work 
  for a SAS or SPSS project (Hmm, thinking about this now, 
 maybe I need 
  to do less in R :-).
 
  R and S-PLUS are very flexible/customizable, if you have a certain 
  plot that you make often you can write your own 
 function/script to do 
  it automatically, most other programs will give you their standard, 
  then you have to modify it to meet your specifications.  
 With sweave 
  (and the odf and html extensions) you can automate whole 
 reports, very 
  useful for things that you do month after month.
 
  And what I think is the biggest advantage of R and S-PLUS 
 is that they 
  strongly encourage you to think about your data.  Other 
 programs (at 
  least that I am familiar with) tend to have 1 specific way 
 of treating 
  your data, and expect you to modify your data to fit that program's 
  model.  These models can be overrestrictive (force you to 
 restructure 
  your data to fit their model) or underrestrictive (allow 
 things that 
  should really be separate data objects to be combined into a single
  dataset) and sometimes both.  S on the other hand allows many 
  different ways to store and work with your data, and as you analyze 
  the data, different branches of new analysis open up depending on 
  early results rather than just getting stock output for a 
 procedure.  
  If all you want is a black box where data goes in one end and a 
  specific answer comes out the other, then most programs 
 will work; but 
  if you want to really understand what your data has to tell 
 you, then 
  R/S-PLUS makes this easy and natural.
 
  Hope this helps,
 
 
  --
  Gregory (Greg) L. Snow Ph.D.
  Statistical Data Center
  Intermountain Healthcare
  [EMAIL PROTECTED]
  (801) 408-8111
 
 
 
   -Original Message-
   From: [EMAIL PROTECTED] 
   [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo 
   Isella
   Sent: Thursday, April 05, 2007 9:02 AM
   To: r-help@stat.math.ethz.ch
   Subject: [R] Reasons to Use R
  
   Dear All,
   The institute I work for is organizing an internal 
 workshop for High 
   Performance Computing (HPC).
   I am planning to attend it and talk a bit about fluid 
 dynamics, but 
   there is also quite a lot of interest devoted to data 
   post-processing and management of huge data sets.
   A lot of people are interested in image processing/pattern 
   recognition and statistics applied to geography/ecology, 
 but I would 
   like not to post this on too many lists.
   The final aim of the workshop is  understanding hardware 
   requirements and drafting a list of the equipment we 
 would like

Re: [R] Reasons to Use R

2007-04-09 Thread halldor bjornsson
Dear Lorenzo,

Thanks for starting a great thread here. Like others, I would like to
hear a summary
if you make one.

My institute uses R for internal data processing and analyzing. Below
are some of our reasons, and yes cost (or lack thereof) is not the
only one.

First, prior to the rise of R we already had a number of people using
Splus, and our
main compute server had licenses for Splus. As the institution moved
from Sun Unix
servers to Linux workstations and servers, the licensing issue became
important. Having
to service many licenses (one per workstation, and several on the
servers) is time consuming for overworked IT staff. Furthermore, our
Splus programs that ran routinely on the servers
could all easily be made to run on R. Hence, this was really a no-brainer.

Second, R runs on both windows and linux (and solaris and macs,
although the last one is not really an issue for us). We have made
some user programs that are tailor-made for the work we do, these we
bundle into R packages, that then can be used on both windows and
linux. This was a very important consideration for us.
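
The mechanics of that are pleasantly light; package.skeleton() in base R
lays out the source package (the package name here is made up):

ourfun <- function(x) mean(x, na.rm = TRUE)   # some in-house function
# writes an 'ourtools/' source-package skeleton in the working directory;
# R CMD build and R CMD INSTALL then make it loadable on windows and linux alike
package.skeleton(name = "ourtools", list = "ourfun")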

Third, user community. Even with commercial solutions (such as Matlab)
the quality of the
user community is very important; if we had felt that R did not have
an active and responsive community we probably would have been more
hesitant. Needless to say
R has an incredibly active community which makes it an attractive environment.
Furthermore, other institutions in our field are also adopting R, at
least in the research departments.

Fourth, R is a good choice for many of the things that we do (data
analysis of varying complexity, good graphics, maptools [working with
shapefiles] etc). It was therefore an obvious candidate for us from the
start.

Now, R does not have everything we want. One thing missing is a decent
R-DB2 connection, for windows the excellent RODBC works fine, but ODBC
support on Linux is a hassle. The big file issue is there, but many
of our files are GRIB, which is a format that is generally not
supported by anyone. Furthermore, object graphics, a la Python's
matplotlib (and of course Matlab), is not there, but would be very
handy. However, that being said, it is easy to make publication (print
and web) quality graphics with R. And of course as always with Open
Source if you miss something bad enough why not do it (or have it
done) yourself and add it to the package.

We have not used R much for large NetCDF datasets, there are other
tools (such as
the CDO package, which also supports GRIB) that are better oriented for this.

We have used R on solaris, Linux (several different flavours) and
Windows (since W98).  We currently use it on our primary production
servers (RedHat Enterprise Edition), but we have not used it in a
parallel setting. We have not used R for making on-the-fly
calculations and graphics for the web, although this is clearly
possible.

I hope this helps, I have found  this thread to be a good one.

Sincerely,
Halldór

On 4/5/07, Lorenzo Isella [EMAIL PROTECTED] wrote:
 Dear All,
 The institute I work for is organizing an internal workshop for High
 Performance Computing (HPC).
 I am planning to attend it and talk a bit about fluid dynamics, but
 there is also quite a lot of interest devoted to data post-processing
 and management of huge data sets.
 A lot of people are interested in image processing/pattern recognition
 and statistics applied to geography/ecology, but I would like not to
 post this on too many lists.
 The final aim of the workshop is  understanding hardware requirements
 and drafting a list of the equipment we would like to buy. I think
 this could be the venue to talk about R as well.
 Therefore, even if it is not exactly a typical mailing list question,
 I would like to have suggestions about where to collect info about:
 (1)Institutions (not only academia) using R
 (2)Hardware requirements, possibly benchmarks
 (3)R & clusters, R & multiple CPU machines, R performance on different 
 hardware.
 (4)finally, a list of the advantages for using R over commercial
 statistical packages. The money-saving in itself is not a reason good
 enough and some people are scared by the lack of professional support,
 though this mailing list is simply wonderful.

 Kind Regards

 Lorenzo Isella




-- 
Halldór Björnsson
Deildarstj. Ranns. & Þróun
Veðursvið Veðurstofu Íslands

Halldór Bjornsson
Weatherservice R & D
Icelandic Met. Office



Re: [R] Reasons to Use R

2007-04-09 Thread Charilaos Skiadas
On Apr 9, 2007, at 1:45 PM, Greg Snow wrote:

 The licences keep changing, some have in the past but don't now, some
 you can get an additional licence for home at a discounted price. Some
 it depends on the type of licence you have at work (currently our SAS
 licence is such that the 3 people in my group can all have it  
 installed,
 but at most 1 can be using it at any 1 time, how does that affect
 installing/using it at home).

Hm, this intrigues me, it would seem to me that the only way for SAS  
to check that only one of your colleagues uses it at any given time  
would be to contact some sort of online server. Does that mean that  
SAS can only be run when you have internet access?

Or is it simply a clause on the license, without any runtime checks?

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College



Re: [R] Reasons to Use R

2007-04-09 Thread Bi-Info (http://members.home.nl/bi-info)
Licensing is a big issue in software. The way I prefer it is an easy 
license, a license which makes it possible that I can work on another 
PC, without paying a lot of money. R produces quite good results and is 
widely used. That makes it a statistical package that I want.
The other thing is that working with large datasets requires some 
effort by software makers to get it working. I doubt if R has the 
capability of working consistently with large datasets. That is an issue 
I think. I have done some comparisons between SPSS and R, and R seems to 
be performing all right, so I can do computations with it. Nonetheless: 
the data handling is not quite as good I think in comparison with SAS.

When I started doing statistics there were about three packages: SPSS, 
SAS and BMDP (at least: these were available). On a PC you were required 
to use SPSS.
Nowadays there are hundreds, some with excellent database facilities, or 
you can compute the newest statistical tests, or an exotic one. I 
haven't got a clue how to work with new database facilities. dBase was 
my only database education and everything has changed. So I cannot 
answer if R is capable of working with large datasets in relation to 
databases. I really don't know. The only thing I know that if I compute 
a ChiSq, it works on a relatively large dataset (not Fisher tests by the 
way). The same with a likelihood procedure, or tabulations including 
non-parametrics or factor analysis.   But databases are an issue I've 
been told by a guy who works with R. SAS was a better option he told me.

So what's the big deal about S using files instead of memory like R. I 
don't get the point. Isn't there enough swap space for S? (Who cares 
anyway: it works, isn't it?) Or are there any problems with S and large 
datasets? I don't get it. You use them, Greg. So you might discuss that 
issue.

Wilfred

The licences keep changing, some have in the past but don't now, some
you can get an additional licence for home at a discounted price. Some
it depends on the type of licence you have at work (currently our SAS
licence is such that the 3 people in my group can all have it installed,
but at most 1 can be using it at any 1 time, how does that affect
installing/using it at home).  I may be able to install some of the
software at home also, but for most of them I have given up trying to
figure out the legality of it and so I have not installed them at home
to be on the safe side.

Some of the doctors I work with who are also affiliated with the local
university have mentioned that they can get a discounted academic
version of SAS and could use that, but my interpretation of the academic
licence that one showed me (probably not the most recent) said (in my
interpretation, I am not a lawyer) that if they published the results
without paying a licence upgrade fee, they would be violating the
licence (the academic version was intended for teaching only).

The R licence on the other hand is pretty clear that I can install it
and use it pretty much anywhere I want.

You are right in correcting me, R is not the only package that can be
used on multiple computers.  I do think it is the most straightforward
of the good ones.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



 -Original Message-
 From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
 Sent: Monday, April 09, 2007 10:44 AM
 To: Greg Snow
 Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
 Subject: Re: [R] Reasons to Use R
 
 I might be wrong about this but I thought that the licenses 
 for at least some of the commercial packages do let you make 
 a copy of the one you have at work for home use.
 
 On 4/9/07, Greg Snow [EMAIL PROTECTED] wrote:
  Here are a couple more thoughts to add to what you have 
 already received:
 
  You mentioned that price is not at issue, but there are other costs 
  than money that you may want to look at.  On my work 
 machine I have R, 
  S-PLUS, SAS, SPSS, and a couple of other stats programs; on 
 my laptop 
  and home computers I have R installed.  So, if a deadline 
 is looming 
  and I am working on a project mainly in R, it is easy to 
 work on it on 
  the bus or at home (or in a boring meeting), the same does not work 
  for a SAS or SPSS project (Hmm, thinking about this now, 
 maybe I need 
  to do less in R :-).
 
  R and S-PLUS are very flexible/customizable, if you have a certain 
  plot that you make often you can write your own 
 function/script to do 
  it automatically, most other programs will give you their standard, 
  then you have to modify it to meet your specifications.  
 With sweave 
  (and the odf and html extensions) you can automate whole 
 reports, very 
  useful for things that you do month after month.
 
  And what I think is the biggest advantage of R and S-PLUS 
 is that they 
  strongly encourage you to think about your data.  Other 
 programs (at 
  least that I am

Re: [R] Reasons to Use R [Broadcast]

2007-04-09 Thread Liaw, Andy
I've probably been away from SAS for too long... we've recently tried to
get SAS on our 64-bit Linux boxes (because SAS on PC is not sufficient
for some of my colleagues who need it).  I was shocked by the quote for
our 28-core Scyld cluster--- the annual fee was a few times the total
cost of our hardware.  We ended up buying a new quad 3GHz Opterons box
with 32GB ram just so that the fee for SAS on such a box would be more
tolerable.  It just boggles my mind that the right to use SAS for a year
is about the price of a nice four-bedroom house (near SAS Institute!).
I don't understand people who rather pay that kind of price for the
software, instead of spending the money on state-of-the-art hardware and
save more than a bundle.

Just my $0.02...
Andy

From: Jorge Cornejo-Donoso
 
 I have a Dell with 2 Intel XEON 3.0 processors and 2GB of RAM.
 The problem is the DB size.
 
 -Original Message-
 From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
 Sent: Monday, April 09, 2007 11:28
 To: Jorge Cornejo-Donoso
 CC: r-help@stat.math.ethz.ch
 Subject: Re: [R] Reasons to Use R
 
 Have you tried 64 bit machines with larger memory or do you 
 mean that you can't use R on your current machines?
 
 Also have you tried S-Plus?  Will that work for you? The 
 transition from that to R would be less than from SAS to R.
 
 On 4/9/07, Jorge Cornejo-Donoso [EMAIL PROTECTED] wrote:
  The size of the DB is an issue with R. We are still using SAS because R
  can't handle our DB, and of course we don't want to sacrifice resolution,
  because the data collection is expensive (at least in fisheries and
  oceanography), so... I think that R needs to improve its handling of big
  DBs. Now I can only use R for graph preparation and some data analysis,
  but we can't do the main work in R, and that is really sad.
 
 
 
 
 





Re: [R] Reasons to Use R [Broadcast]

2007-04-09 Thread Wensui Liu
Andy,
I totally agree with you. Money should be spent on the people working
hard instead of on the fancy software. But in real life, it is the
opposite. ^_^.

On 4/9/07, Liaw, Andy [EMAIL PROTECTED] wrote:
 I've probably been away from SAS for too long... we've recently tried to
 get SAS on our 64-bit Linux boxes (because SAS on PC is not sufficient
 for some of my colleagues who need it).  I was shocked by the quote for
  our 28-core Scyld cluster - the annual fee was a few times the total
  cost of our hardware.  We ended up buying a new quad 3GHz Opteron box
  with 32GB RAM just so that the fee for SAS on such a box would be more
 tolerable.  It just boggles my mind that the right to use SAS for a year
 is about the price of a nice four-bedroom house (near SAS Institute!).
 I don't understand people who would rather pay that kind of price for the
 software, instead of spending the money on state-of-the-art hardware and
 save more than a bundle.

 Just my $0.02...
 Andy

 From: Jorge Cornejo-Donoso
 
  I have a Dell with 2 Intel XEON 3.0 processors and 2GB of RAM.
  The problem is the DB size.
 
  -Original Message-
  From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
  Sent: Monday, April 09, 2007 11:28
  To: Jorge Cornejo-Donoso
  CC: r-help@stat.math.ethz.ch
  Subject: Re: [R] Reasons to Use R
 
  Have you tried 64 bit machines with larger memory or do you
  mean that you can't use R on your current machines?
 
  Also have you tried S-Plus?  Will that work for you? The
  transition from that to R would be less than from SAS to R.
 
  On 4/9/07, Jorge Cornejo-Donoso [EMAIL PROTECTED] wrote:
   The size of the DB is an issue with R. We are still using SAS because R
   can't handle our DB, and of course we don't want to sacrifice resolution,
   because the data collection is expensive (at least in fisheries and
   oceanography), so... I think that R needs to improve its handling of big
   DBs. Now I can only use R for graph preparation and some data analysis,
   but we can't do the main work in R, and that is really sad.
  
 
 
 
 






-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)



Re: [R] Reasons to Use R

2007-04-08 Thread Johann Hibschman
On 4/6/07, Wilfred Zegwaard [EMAIL PROTECTED] wrote:

 I'm not a programmer, but I have the experience that R is good for
 processing large datasets, especially in combination with specialised
 statistics.

This I find a little surprising, but maybe it's just a sign that I'm
not experienced enough with R yet.

I can't use R for big datasets.  At all.  Big datasets take forever to
load with read.table, R frequently runs out of memory,  and nlm or
gnlm never seem to actually converge to answers.  By comparison, I can
point SAS and NLIN at this data without problem.  (Of course, SAS is
running on a pretty powerful dedicated machine with a big ram disk, so
that may be part of the problem.)
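
(In fairness, read.table goes a good deal faster when it is told the column
types and size up front instead of guessing them; a sketch with a
hypothetical file:)

d <- read.table("big.dat", header = TRUE, comment.char = "",
                colClasses = c("integer", "numeric", "factor"),
                nrows = 1e6)   # upper-bound row count as an allocation hint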

R's pass-by-value semantics also make it harder than it should be to
deal with cases where it's crucial that you not make a copy of the data
frame, for fear of running out of memory.  Pass-by-reference would
make implementing data transformations so much easier that I don't
really understand how pass-by-value became the standard.  (If there's
a trick to doing in-place transformations, I've not found it.)

Right now, I'm considering starting on a project involving some big
Monte Carlo integrations over the complicated posterior parameter
distributions of a nonlinear regression model, and I have the strong
feeling that R will just choke.

R's great for small projects, but as soon as you have even a few hundred
megs of data, it seems to break down.

If I'm doing things wrong, please tell me.  :-)  SAS is a beast to work with.



Re: [R] Reasons to Use R

2007-04-08 Thread Gabor Grothendieck
On 4/8/07, Johann Hibschman [EMAIL PROTECTED] wrote:
 R's pass-by-value semantics also make it harder than it should be to
 deal with cases where it's crucial that you not make a copy of the data
 frame, for fear of running out of memory.  Pass-by-reference would
 make implementing data transformations so much easier that I don't
 really understand how pass-by-value became the standard.  (If there's
 a trick to doing in-place transformations, I've not found it.)

Because R processes objects in memory I also would not rate it as
strong as some other packages on very large data sets, but you can
use databases, which may make it less important in some cases, and you
can get a certain amount of mileage out of R environments; and as
64-bit computers become commonplace and memory sizes grow, larger
and larger data sets will become easy to handle.
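
A small sketch of the environment trick (base R only; the names are made up):

e <- new.env()
e$x <- rnorm(1e6)
rescale <- function(env) {
  # environments are passed by reference, so this replaces x where it
  # lives instead of returning a modified copy of a whole data structure
  env$x <- env$x / max(abs(env$x))
}
rescale(e)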

Regarding environments, also available are proto objects from the
proto package which are environments with slightly different semantics.
Even if you don't intend to use the proto package, it's got quite a bit
of documentation and supporting information that might be
helpful:

- home page:
  http://code.google.com/p/r-proto/
- overview (click on Wiki tab at home page) which includes article links
  that discuss OO and environments
- tutorial, reference card, reference manual, vignette (see Links box)



Re: [R] Reasons to Use R

2007-04-08 Thread Wilfred Zegwaard
Dear Johann and Gabor,

It depends on what amounts to a large dataset. There are hundreds of
datasets R can't handle, probably thousands or more. I noticed on my
computer (which is nothing more than an average PC) that R breaks down
after 250 MB of memory. I also note that SPSS, Matlab, etc. break down
too.

I'm not a SAS user, but I have worked with SAS in the past. It's very
good as I remember, but that was ten years ago. And it's a dollar
machine, I've been told: you add dollars to SAS as you add dollars to
a Porsche. I haven't got that, and for most statistical applications
it isn't necessary, I've been told. R is sufficient for that. The
datasets I use are often not that big (the way I like it).
About three years ago I spoke to somebody who had worked with SAS and
said its database system is excellent and statistically profound.
Someone with a PhD, so probably he is right.

Monte Carlo simulations are computationally time-consuming, but
probably these can be done in R. I haven't seen any libraries for it
(they might be there). It has been done with S (the commercial
counterpart of R), so probably it can be done with R too. If you tie
Monte Carlo simulation to large datasets, you will probably run into
problems with a conventional R system. What I've been told in those
instances is: buy a new computer, add memory, buy a new processor...
and don't smoke hashish.

That wasn't good advice, because the guy who told me that smoked
hashish like hell and drank Pastis (blue liquor) like water. I kicked
him out. But that's another story.

Cheers,

Wilfred

(I drink wine and tailor-made beer, and only on occasion. That's why.
His simulations were good, I've been told.)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-06 Thread Lorenzo Isella
John Kane wrote:
 --- Lorenzo Isella [EMAIL PROTECTED] wrote:

   
 (4)finally, a list of the advantages for using R
 over commercial
 statistical packages. The money-saving in itself is
 not a reason good
 enough and some people are scared by the lack of
 professional support,
 though this mailing list is simply wonderful.

 
 Given that I can do as much if not more with R (in
 most cases) than with commercial software, as an
 independent consultant,  'cost' is a very significant
 factor. 

 A very major advantage of R is the money-saving.  Have
 a look at
 http://www.spss.com/stores/1/Software_Full_Version_C2.cfm

  and convince me that cost (for an independent
 contractor) is not a good reason. 

Hello,
No doubt that for an independent contractor money is a significant 
issue, but we are talking about the case of a large organization for 
which spending a few thousand euros on software is routine.
To avoid misunderstandings: I am myself an R user and I have no 
intention of paying a cent for statistical software, but in order to
speak up for R vs any commercial software for data analysis and 
postprocessing, I need technical details (benchmarks, etc...) rather 
than the fact that it saves money.
Kind Regards

Lorenzo

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-06 Thread Wilfred Zegwaard
To my knowledge the core of R is considered adequate and good by
statisticians. That's sufficient, isn't it?
Last year I read some documentation about R: most routines were
considered good, but some very bad. That is a benchmark of sorts.

There must be benchmarks of the kind you want. R is widely used, and
there must be people around who can provide you with the adequate
stuff. CRAN is one way to find them, as is the project page.

The core is free, by the way, and you can participate in the
development. People there can provide you with the information you
want. R is quite well documented (not everybody thinks it's well
doc'ed, but... you know... opinions do vary).

There is one simple reason to use R: it's free, for one. If you have
the money, commercial software is sufficient. That doesn't mean that
R is the poor man's software. It works quite well actually (but you...
know... opinions vary, especially about statistical software). I think
that's the usual reason to use it: it works quite well, and its
documentation is widely available. A LOT of statistical procedures are
available. R crashed about 2 times last year on my computer, which is
better than SPSS, and there are a lot of user interfaces available
which make working with R easier.
Personally I don't like SPSS, but I do know that the R core is used in
commercial applications. So at least one person has done some benchmarks.

Wilfred

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-06 Thread Stephen Tucker
Hi Lorenzo,

I don't think I'm qualified to provide solid information on the first
three questions, but I'd like to drop a few thoughts on (4). While
there is no shortage of language advocates out there, I'd like to
join in for this once. My background is in chemical engineering and
atmospheric science; I've done simulation on a smaller scale but spend
much of my time analyzing large sets of experimental data. I am
comfortable programming in Matlab, R, Python, C, Fortran, Igor Pro,
and I also know a little IDL but have not programmed in it
extensively.

As you are probably aware, I would count Matlab, R, Python, and IDL
among these as good candidates for processing large data sets, as
they are high-level languages and can communicate with netCDF files
(which I imagine will be used to transfer data).

Each language boasts an impressive array of libraries, but what I
think gives R the advantage for analyzing data is the level of
abstraction in the language. I am extremely impressed with the objects
available to represent data sets, and the functions support them very
well - I have to carry around fewer objects to hold information about
my data (and I don't have to unpack them to feed them into functions).
The language is also very expressive, in that it lets you write a
procedure in many different ways, some shorter, some more readable,
depending on what your situation requires. System commands and text
processing are integrated into the language, and the input/output
facilities are excellent, in terms of both data and graphics. Once I
have my data object I am only a few keystrokes away from splitting,
sorting, and visualizing multivariate data; even after several years I
keep discovering new functions for basic things like manipulation of
data objects, descriptive statistics, and plotting - truly, an
analyst's needs have been well anticipated.
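
(A minimal sketch of the kind of one-liners meant here, using R's
built-in airquality data set:)

  with(airquality, tapply(Ozone, Month, mean, na.rm = TRUE))  # group means
  boxplot(Ozone ~ Month, data = airquality)   # one-line visualization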

And this is a recent obsession of mine, which I was introduced to
through Python: the functional programming support in R is
amazing. By using higher-order functions like lapply(), I rarely
rely on for-loops, which have often caused me trouble in the past
because I had forgotten to re-initialize a variable, or incremented
the wrong variable, etc. Though I'm definitely not militant about
functional programming, in general I try to write functions and then
apply them to the data (if the functions don't exist in R already),
often through higher-order functions such as lapply(). This approach
keeps most variables out of the global namespace, so I am less
likely to reassign a value to a variable that I had intended to
keep. It also makes my code more modular, so that I can re-use bits of
my code as my analysis inevitably grows much larger than I had
originally intended.
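
(For instance - an illustrative sketch, with a made-up set of input
files:)

  files <- list.files(pattern = "\\.csv$")  # hypothetical input files
  # loop version: accumulator and index state to get wrong
  results <- vector("list", length(files))
  for (i in seq_along(files)) results[[i]] <- summary(read.csv(files[i]))
  # functional version: no loop state at all
  results <- lapply(files, function(f) summary(read.csv(f)))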

Furthermore, my code in R ends up being much, much shorter than code I
imagine writing in other languages to accomplish the same task; I
believe this leads to fewer places for errors to occur, and the nature
of the code is immediately comprehensible (though a series of nested
functions can get pretty hard to read at times), not to mention it
takes less effort to write. This also makes it easier to interact with
the data, I think, because after making a plot I can set up for the
next plot with only a few function calls instead of setting out to
write a block of code with loops, etc.

I have actually recommended R to colleagues who needed to analyze the
information from large-scale air quality/ global climate simulations,
and they are extremely pleased. I think the capability for statistics
and graphics is well-established enough that I don't need to do a
hard-sell on that so much, but R's language is something I get very
excited about. I do appreciate all the contributors who have made this
available.

Best regards,
ST


--- Lorenzo Isella [EMAIL PROTECTED] wrote:

 Dear All,
 The institute I work for is organizing an internal workshop for High
 Performance Computing (HPC).
 I am planning to attend it and talk a bit about fluid dynamics, but
 there is also quite a lot of interest devoted to data post-processing
 and management of huge data sets.
 A lot of people are interested in image processing/pattern recognition
 and statistic applied to geography/ecology, but I would like not to
 post this on too many lists.
 The final aim of the workshop is understanding hardware requirements
 and drafting a list of the equipment we would like to buy. I think
 this could be the venue to talk about R as well.
 Therefore, even if it is not exactly a typical mailing list question,
 I would like to have suggestions about where to collect info about:
 (1)Institutions (not only academia) using R
 (2)Hardware requirements, possibly benchmarks
 (3)R & clusters, R & multiple CPU machines, R performance on different
 hardware.
 (4)finally, a list of the advantages for using R over commercial
 statistical packages. The money-saving in itself is not a reason good
 enough and some people are scared by the lack of professional support,
 though this mailing list is simply wonderful.

Re: [R] Reasons to Use R

2007-04-06 Thread bogdan romocea
 (1)Institutions (not only academia) using R

http://www.r-project.org/useR-2006/participants.html

 (2)Hardware requirements, possibly benchmarks

Since you mention huge data sets, GNU/Linux running on 64-bit machines
with as much RAM as your budget allows.

 (3)R & clusters, R & multiple CPU machines,
 R performance on different hardware.

OpenMosix or Quantian for clusters; search the list archive for
multiple CPUs (this has been asked quite a few times). It may be best
to measure R performance on different hardware yourself, using your
own data and code.
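
(A minimal sketch of such a home-grown benchmark:)

  x <- matrix(rnorm(2000 * 2000), nrow = 2000)  # a representative workload
  system.time(svd(x))   # compare the elapsed times across machines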

 (4)finally, a list of the advantages for using R over
 commercial statistical packages.

I'd say it's not R vs. commercial packages, but S vs. the rest of the
world. Check http://www.insightful.com/ , much of what they say is
applicable to R. Make the case that S is vastly superior directly, not
just through a list of reasons: take a few data sets and show how they
can be analyzed with S compared to other choices. Both R and S-Plus
are likely to significantly outperform most other software, depending
on the kind of work that needs to be done.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo Isella
 Sent: Thursday, April 05, 2007 11:02 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Reasons to Use R

 Dear All,
 The institute I work for is organizing an internal workshop for High
 Performance Computing (HPC).
 I am planning to attend it and talk a bit about fluid dynamics, but
 there is also quite a lot of interest devoted to data post-processing
 and management of huge data sets.
 A lot of people are interested in image processing/pattern recognition
 and statistic applied to geography/ecology, but I would like not to
 post this on too many lists.
 The final aim of the workshop is understanding hardware requirements
 and drafting a list of the equipment we would like to buy. I think
 this could be the venue to talk about R as well.
 Therefore, even if it is not exactly a typical mailing list question,
 I would like to have suggestions about where to collect info about:
 (1)Institutions (not only academia) using R
 (2)Hardware requirements, possibly benchmarks
 (3)R & clusters, R & multiple CPU machines, R performance on
 different hardware.
 (4)finally, a list of the advantages for using R over commercial
 statistical packages. The money-saving in itself is not a reason good
 enough and some people are scared by the lack of professional support,
 though this mailing list is simply wonderful.

 Kind Regards

 Lorenzo Isella

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-06 Thread Roland Rau
Hi Lorenzo,

On 4/5/07, Lorenzo Isella [EMAIL PROTECTED] wrote:

 I would like to have suggestions about where to collect info about:
 (1)Institutions (not only academia) using R


A starting point might be to look at the R-project homepage, at the
members and donors list. This is, of course, not a comprehensive list;
but at least it gives an overview of the diverse backgrounds in which
people are using R --- even if it is only the tip of the iceberg.

(2)Hardware requirements, possibly benchmarks


Maybe you should also mention that you can run R just from a USB stick
if you want (see R for Windows FAQ 2.6).


(3)R & clusters, R & multiple CPU machines, R performance on different
 hardware.


Have a look at the 'R Installation and Administration' manual; it
gives a nice overview of how many platforms R is running on.

Best,
Roland

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-06 Thread Ramon Diaz-Uriarte
Dear Lorenzo,

I'll try not to repeat what other have answered before.

On 4/5/07, Lorenzo Isella [EMAIL PROTECTED] wrote:
 The institute I work for is organizing an internal workshop for High
 Performance Computing (HPC).
(...)

 (1)Institutions (not only academia) using R

You can count my institution too. Several groups. (I can provide more
details off-list if you want).

 (2)Hardware requirements, possibly benchmarks
 (3)R  clusters, R  multiple CPU machines, R performance on different 
 hardware.

We do use R in commodity off-the-shelf clusters; our two clusters are
running Debian GNU/Linux, on both 32-bit machines ---Xeons--- and
64-bit machines ---dual-core AMD Opterons. We use parallelization
quite a bit, with MPI (via the Rmpi and papply packages mainly). One
convenient feature is that (once the lam universe is up and running)
whether we are using the 4 cores in a single box, or the maximum
available 120, is completely transparent. Using R and MPI is, really,
a piece of cake. That said, there are things that I miss; in
particular, oftentimes I wish R were Erlang or Oz, because of their
straightforward fault-tolerant distributed computing and built-in
abstractions for distribution and concurrency. The issue of
multithreading has come up several times on this list and is something
that some people miss.
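
(The basic pattern - an illustrative sketch; the function names are
from the Rmpi package, but treat the details as approximate and check
its documentation:)

  library(Rmpi)
  mpi.spawn.Rslaves(nslaves = 4)  # or 120; the line below does not change
  res <- mpi.parLapply(1:100, function(i) mean(rnorm(1e5)))  # scatter work
  mpi.close.Rslaves()
  mpi.quit()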

I am not sure how much R is used in the usual HPC realms. It is my
understanding that traditional HPC is still dominated by things
such as HPF, and C with MPI, OpenMP, or UPC or Cilk. The usual answer
to "but R is too slow" is "but you can write Fortran or C code for the
bottlenecks and call it from R". I guess you could use, say, UPC in
the C that is linked to R, but I have no experience there. And I think
this code can become a pain to write and maintain (especially if you
want to play around with what you try to parallelize, etc.). My
feeling (based on no information or documentation whatsoever) is that
how far R can be stretched or extended into HPC is still an open
question.
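
(For reference, the bottleneck-in-C pattern looks roughly like this -
an illustrative sketch with a made-up C function:)

  # in scale.c (hypothetical):
  #   void vec_scale(double *x, int *n, double *a) {
  #     for (int i = 0; i < *n; i++) x[i] *= *a;
  #   }
  # compiled with: R CMD SHLIB scale.c
  dyn.load("scale.so")
  out <- .C("vec_scale", x = as.double(1:10),
            n = as.integer(10), a = as.double(2))
  out$x   # the vector, scaled by the C code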


 (4)finally, a list of the advantages for using R over commercial
 statistical packages. The money-saving in itself is not a reason good
 enough and some people are scared by the lack of professional support,
 though this mailing list is simply wonderful.


(In addition to all the already mentioned answers)
Complete source code availability. Being able to look at the C source
code for a few things has been invaluable for me.
And, of course, an extremely active, responsive, and vibrant
community that, among other things, has contributed packages and code
for an incredible range of problems.


Best,

R.

P.S. I'd be interested in hearing about the responses you get to your
presentation.


 Kind Regards

 Lorenzo Isella

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-06 Thread Wilfred Zegwaard
Dear Lorenzo and Steven,

I'm not a programmer, but my experience is that R is good for
processing large datasets, especially in combination with specialised
statistics. There are some limits to that, but R handles large
datasets and complicated computations a lot better than SPSS, for
example. I cannot speak for Fortran, but I do have experience with
Pascal. I prefer R, because in Pascal you easily get lost in an
endless programming effort which has nothing to do with the problem. I
do like Pascal - it's the only programming language I actually learned
- but it isn't an adequate replacement for R.
My experience is that the SPSS language and menu-driven package are
far easier to handle than R, but when it comes to specific
computations, SPSS loses, by far. Non-parametrics is good in R, for
example. Dataset handling is adequate (my SPSS ports can be read), and
I noticed that R has good numerical routines like optimisation (even
mixed integer programming) and good procedures for regression (GLM,
which is not an SPSS standard). Try to compute a Kendall's W statistic
in SPSS. It's relatively easy in R.
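
(Or compute it directly - a minimal illustrative sketch, for k judges
ranking n items, assuming no ties:)

  kendall_w <- function(ratings) {  # rows = items, columns = judges
    R <- apply(ratings, 2, rank)    # rank the items within each judge
    S <- sum((rowSums(R) - mean(rowSums(R)))^2)
    k <- ncol(ratings); n <- nrow(ratings)
    12 * S / (k^2 * (n^3 - n))      # W, between 0 and 1
  }
  kendall_w(cbind(judge1 = c(1, 2, 3, 4), judge2 = c(2, 1, 4, 3)))  # 0.8
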
The only thing that I DON'T like about R is dataset computations and
its syntax. When I have a dataset with only non-parametric content
which is also dirty (incomplete or with wrong values), I almost have
to call in a technician to deal with it. To be honest: I use a
spreadsheet for these dataset computations and then export to R. But
I noted that in R there are several solutions for that. With SciViews
I could get a basic feeling for it.
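
(A minimal sketch, with made-up data, of the kind of cleaning that can
stay inside R:)

  df <- data.frame(score = c(3, -1, 5, NA), group = c("a", "a", "b", NA))
  df$score[which(df$score < 0)] <- NA  # recode impossible values as missing
  clean <- df[complete.cases(df), ]    # keep only the complete rows
  summary(clean)
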
Pascal is basically the only programming language that I understood
syntactically. It has a kind of logical mathematical structure to it.
The logic of Fortran (and to some extent R) I completely miss.

Statistically, R is my choice, and luckily most procedures in R are
easily accessible. And my experience with computations in R is... good.

I have done simulations in the past, especially with time series, but
I cannot recommend R for it (arima.sim is not sufficient for these
types of simulations). I would still prefer Pascal for that. There is
also an excellent open-source Pascal compiler: Free Pascal, but I
hardly use it. I do have some good experiences with computations in C,
but little experience overall. Instead of C I would prefer R, I believe.

Cheers,

Wilfred

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-06 Thread Stephen Tucker
Regarding (2),

I wonder if this information is too outdated or not relevant when scaled up
to larger problems...

http://www.sciviews.org/benchmark/index.html




--- Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote:

 (...)


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-05 Thread Schmitt, Corinna
Dear Mr. Isella,

I just started my PhD thesis, and I need to work with R. A good source
is Bioconductor (www.bioconductor.org), a repository of R packages for
bioinformatics. Another institute which has good experience with R is
the HKI in Jena, Germany. Perhaps you can contact Mrs. Radke to get
more information or speakers for your workshop. Both are mainly about
bioinformatics methods but perhaps can help you.

A good reason to use R is that computations are much quicker and you
can import/export files from many other programs and languages.
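
(For example - an illustrative sketch using the foreign package, with
a hypothetical SPSS file:)

  library(foreign)   # ships with R
  d <- read.spss("study.sav", to.data.frame = TRUE)  # import an SPSS file
  write.csv(d, "study.csv")   # export for other programs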

Happy Easter,
C.Schmitt

**
Corinna Schmitt, Dipl.Inf.(Bioinformatik)
Fraunhofer Institut für Grenzflächen- und Bioverfahrenstechnik
Nobelstrasse 12, B 3.24
70569 Stuttgart
Germany

phone: +49 711 9704044 
fax: +49 711 9704200
e-mail: [EMAIL PROTECTED]
http://www.igb.fraunhofer.de

 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo Isella
Sent: Thursday, April 5, 2007 5:02 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Reasons to Use R

Dear All,
The institute I work for is organizing an internal workshop for High
Performance Computing (HPC).
I am planning to attend it and talk a bit about fluid dynamics, but
there is also quite a lot of interest devoted to data post-processing
and management of huge data sets.
A lot of people are interested in image processing/pattern recognition
and statistic applied to geography/ecology, but I would like not to
post this on too many lists.
The final aim of the workshop is understanding hardware requirements
and drafting a list of the equipment we would like to buy. I think
this could be the venue to talk about R as well.
Therefore, even if it is not exactly a typical mailing list question,
I would like to have suggestions about where to collect info about:
(1)Institutions (not only academia) using R
(2)Hardware requirements, possibly benchmarks
(3)R & clusters, R & multiple CPU machines, R performance on different hardware.
(4)finally, a list of the advantages for using R over commercial
statistical packages. The money-saving in itself is not a reason good
enough and some people are scared by the lack of professional support,
though this mailing list is simply wonderful.

Kind Regards

Lorenzo Isella

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reasons to Use R

2007-04-05 Thread John Kane

--- Lorenzo Isella [EMAIL PROTECTED] wrote:


 (4)finally, a list of the advantages for using R
 over commercial
 statistical packages. The money-saving in itself is
 not a reason good
 enough and some people are scared by the lack of
 professional support,
 though this mailing list is simply wonderful.

Given that I can do as much if not more with R (in
most cases) than with commercial software, as an
independent consultant,  'cost' is a very significant
factor. 

A very major advantage of R is the money-saving.  Have
a look at
http://www.spss.com/stores/1/Software_Full_Version_C2.cfm

 and convince me that cost (for an independent
contractor) is not a good reason.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.