[R] To improve my understanding of workspaces

2006-03-10 Thread Kevin E. Thorpe
Hello.

I have grown accustomed to the .Data directory in S-Plus and so when
I came to R I continued that behaviour by saving my workspaces at
the end of each R session.  So, I have saved workspaces in various
directories where I have used R just as I would have had various
.Data directories where I had used S-Plus.

I have seen comments on the list, most recently from Prof. Ripley
that they don't routinely save their workspaces in this way.
So my questions are:

   1. What do people do instead to manage projects?
   2. Is there an official recommendation?

From my reading I have learned that you can save data frames
(and other objects?) to disk and then attach them.  Does this
save memory?  If I have read correctly, I understand that
everything in the workspace is in memory, but haven't been able
to determine if objects in the search path are as well.

Kind Regards,

Kevin

-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: [EMAIL PROTECTED]  Tel: 416.946.8081  Fax: 416.946.3297

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] To improve my understanding of workspaces

2006-03-10 Thread Adaikalavan Ramasamy
I use emacs and ESS to develop the scripts. The newer releases of R have
the script function built in already.

Typically I keep all the data and scripts related to a project in its
own folder, so I have minimal worry about paths.

To save large and associated objects, I use
   save(x, y, z, file="lala.rda", compress=TRUE)
and then to load x, y, z in another session or workspace I use
   load("lala.rda")

To save small dataframes and matrices, I use
   write.table(mat, file="lala.txt", sep="\t")
and to read it back I use
   mat <- read.delim(file="lala.txt", row.names=1)
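
A minimal sketch of the two routes described above, run in a temporary directory so it can be tried safely. The objects x, y, z and mat are hypothetical stand-ins, and loading into a separate environment is only an illustration of how to inspect what a file contains, not something the post prescribes.

```r
# Work in a temporary directory so nothing in the real project is touched.
proj <- tempfile("project_")
dir.create(proj)

# Hypothetical objects standing in for x, y and z.
x <- 1:10
y <- letters[1:3]
z <- list(a = 1, b = "two")

# Binary route: several related objects in one compressed file.
rda_file <- file.path(proj, "lala.rda")
save(x, y, z, file = rda_file, compress = TRUE)

# Load into a fresh environment to see exactly what the file contains.
restored <- new.env()
loaded_names <- load(rda_file, envir = restored)

# Text route: a small matrix with row names, tab separated.
mat <- matrix(1:6, nrow = 2,
              dimnames = list(c("r1", "r2"), c("a", "b", "c")))
txt_file <- file.path(proj, "lala.txt")
write.table(mat, file = txt_file, sep = "\t")

# read.delim() returns a data frame; convert back to a matrix for comparison.
mat2 <- as.matrix(read.delim(file = txt_file, row.names = 1))
```

The text file can be opened in any editor or compared with diff, while the .rda file preserves arbitrary R objects exactly.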


The problem with .RData (via quit or save.image), is that it keeps all
intermediate objects which can be unnecessarily bloated and confusing.
Further you will have difficulty distinguishing one .RData from the
other by looking at the filename alone.

Regards, Adai





Re: [R] To improve my understanding of workspaces

2006-03-10 Thread Duncan Murdoch
Other than Emacs, I use the same work habits as Adai.  An advantage of 
this workflow is that almost everything is stored in text format, so it 
is easy to compare different versions to see what has changed, and it 
works very well with version control (I use Subversion).

The only thing I'd add to his recommendation is that you be sure to save 
the scripts that produced the objects in the binary images (his 
lala.rda), so that they can be reconstructed if necessary.  As long as 
the reconstruction isn't too difficult, this means I don't need to 
bother to save them in Subversion.
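
One way to picture this, as a sketch only: keep the code that builds the binary file in a function (or script) under version control, and regenerate the file when it is missing. The file names, the build_objects() helper and the toy data are all hypothetical.

```r
# Hypothetical file locations; a temporary directory keeps the sketch self-contained.
proj <- tempfile("project_")
dir.create(proj)
raw_file <- file.path(proj, "raw.txt")
rda_file <- file.path(proj, "lala.rda")

# Stand-in for raw data that would normally already exist as a text file.
write.table(data.frame(id = 1:4, value = c(2.5, 3.5, 4.0, 6.0)),
            file = raw_file, sep = "\t", row.names = FALSE)

# The "script that produced the objects": everything in lala.rda comes from here.
build_objects <- function(raw_file, rda_file) {
  raw <- read.delim(raw_file)
  fit <- lm(value ~ id, data = raw)
  save(raw, fit, file = rda_file, compress = TRUE)
  invisible(rda_file)
}

# Rebuild the binary image only when it is missing.
if (!file.exists(rda_file)) build_objects(raw_file, rda_file)

# Inspect the rebuilt objects in their own environment.
objs <- new.env()
load(rda_file, envir = objs)
```

With this layout only the text files (raw data and scripts) need to be tracked; the .rda file is a cache that can be deleted and rebuilt.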

Duncan Murdoch




 


Re: [R] To improve my understanding of workspaces

2006-03-10 Thread Kevin E. Thorpe
Thanks Adai.  A couple questions/comments about this.

Adaikalavan Ramasamy wrote:
> I use emacs and ESS to develop the scripts. The new releases of R has
> the script function already in built.

I use emacs and ESS too (in Linux).  I do not know about the script
function you mention.  It's not in my version (2.1.1) and I couldn't
find it in an RSiteSearch either.

> Typically I keep all the data and scripts related to a project in its
> own folder, so I have minimal worry about paths.

I do the same.

> To save large and associated objects, I use
>    save(x, y, z, file="lala.rda", compress=TRUE)
> and then to load x, y, z in another session or workspace I use
>    load("lala.rda")
>
> To save small dataframes and matrices, I use
>    write.table(mat, file="lala.txt", sep="\t")
> and to read it back I use
>    mat <- read.delim(file="lala.txt", row.names=1)

Am I correct that load() or read.whatever() or even data() will
bring the objects into the current workspace while attach() can
attach a save() data frame to the search path?  Is one approach
better than the other in general?

 
> The problem with .RData (via quit or save.image), is that it keeps all
> intermediate objects which can be unnecessarily bloated and confusing.
> Further you will have difficulty distinguishing one .RData from the
> other by looking at the filename alone.

If you don't save the workspace on q(), do you also lose the history for
that session (although when working in emacs, this is rarely a problem)?

> Regards, Adai

Thanks again,

Kevin

 
 




Re: [R] To improve my understanding of workspaces

2006-03-10 Thread Sean Davis


On 3/10/06 8:33 AM, Duncan Murdoch [EMAIL PROTECTED] wrote:

> Other than Emacs, I use the same work habits as Adai.  An advantage of
> this workflow is that almost everything is stored in text format, so it
> is easy to compare different versions to see what has changed, and it
> works very well with version control (I use Subversion).
>
> The only thing I'd add to his recommendation is that you be sure to save
> the scripts that produced the objects in the binary images (his
> lala.rda), so that they can be reconstructed if necessary.  As long as
> the reconstruction isn't too difficult, this means I don't need to
> bother to save them in Subversion.

I would add a bit of detail about what I do.  ESS/xemacs allows one to create
a transcript file that you can then step through, executing each command as
it was originally executed.  I make one of these transcript files for each
project and save it with the data and any scripts that I have for the
project.  So, in the end, I have a set of Rda files, one or more transcript
files, and a Src directory that contains any function code (and ESS supports
saving scripts to this directory automatically).

Sean



Re: [R] To improve my understanding of workspaces

2006-03-10 Thread Adaikalavan Ramasamy
A lot of programming style is personal choice and as such varies from
individual to individual. See my comments below.

On Fri, 2006-03-10 at 09:01 -0500, Kevin E. Thorpe wrote:
> Thanks Adai.  A couple questions/comments about this.
>
> Adaikalavan Ramasamy wrote:
>> I use emacs and ESS to develop the scripts. The new releases of R has
>> the script function already in built.
>
> I use emacs and ESS too (in Linux).  I do not know about the script
> function you mention.  It's not in my version (2.1.1) and I couldn't
> find it in an RSiteSearch either.

I meant to say that only the newer releases of R _for Windows_ have the
script function. Look under File->New scripts (untested). However, it does
not appear to have the syntax highlighting or auto-indenting that emacs has.


>> Typically I keep all the data and scripts related to a project in its
>> own folder, so I have minimal worry about paths.
>
> I do the same.
>
>> To save large and associated objects, I use
>>    save(x, y, z, file="lala.rda", compress=TRUE)
>> and then to load x, y, z in another session or workspace I use
>>    load("lala.rda")
>>
>> To save small dataframes and matrices, I use
>>    write.table(mat, file="lala.txt", sep="\t")
>> and to read it back I use
>>    mat <- read.delim(file="lala.txt", row.names=1)
>
> Am I correct that load() or read.whatever() or even data() will
> bring the objects into the current workspace while attach() can
> attach a save() data frame to the search path?  Is one approach
> better than the other in general?

I think you are correct.

The attach function appears to have two functions now :
 a) attach("lala.rda") loads objects from lala.rda into the search path
 b) attach(obj) makes the named columns of a dataframe or list available
in the search path. Therefore you only need to type 'aaa' instead of
obj$aaa or obj[ , "aaa"]

The second is the more popular form of usage. 

Personally I would rather not use attach() and prefer to type obj$aaa or
use in the context of lm( aaa ~ ., data=obj ).
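
The data frame form and its alternatives can be compared in a small sketch. The data frame obj and its columns are hypothetical, and with() is an extra base R option not mentioned above. The last part illustrates one reason some people avoid attach(): the attached columns are a copy taken at the moment of attaching.

```r
# Hypothetical data frame standing in for 'obj', with a column named 'aaa'.
obj <- data.frame(aaa = c(1, 2, 3, 4), bbb = c(2.1, 3.9, 6.2, 7.8))

# Form (b): attach the data frame, use the bare column name, then detach.
attach(obj)
mean_attached <- mean(aaa)
detach(obj)

# Equivalent ways to reach the column without touching the search path.
mean_dollar  <- mean(obj$aaa)
mean_bracket <- mean(obj[ , "aaa"])
mean_with    <- with(obj, mean(aaa))

# Modelling functions take the data frame directly via 'data='.
fit <- lm(aaa ~ ., data = obj)

# A pitfall of attach(): it exposes a copy, so later edits to 'obj' are not seen.
attach(obj)
obj$aaa <- obj$aaa * 10
stale <- aaa[1]      # still the value from the moment of attaching
detach(obj)
```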



>> The problem with .RData (via quit or save.image), is that it keeps all
>> intermediate objects which can be unnecessarily bloated and confusing.
>> Further you will have difficulty distinguishing one .RData from the
>> other by looking at the filename alone.
>
> If you don't save the workspace on q(), do you also lose the history for
> that session (although when working in emacs, this is rarely a problem)?

I would argue that a script file is better than a history file
because I can clean up any test or wrong code I might have in the
script file.

However, if you prefer to save the history, you can use
savehistory(file="history.txt") at any point.
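
A guarded sketch of that call follows. It assumes that savehistory() is only useful in an interactive session whose front end keeps a command history, so the call is skipped elsewhere; the file location is hypothetical.

```r
# savehistory() needs an interactive session with history support,
# so the call is guarded; under Rscript or R CMD BATCH it is skipped.
history_file <- file.path(tempdir(), "history.txt")   # hypothetical location

saved <- FALSE
if (interactive()) {
  saved <- tryCatch({
    savehistory(file = history_file)
    TRUE
  }, error = function(e) FALSE)   # some front ends keep no history to save
}
```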

Regards, Adai

SNIP



Re: [R] To improve my understanding of workspaces

2006-03-10 Thread Kevin E. Thorpe
Sean Davis wrote:
 
> On 3/10/06 8:33 AM, Duncan Murdoch [EMAIL PROTECTED] wrote:
>
>
>> Other than Emacs, I use the same work habits as Adai.  An advantage of
>> this workflow is that almost everything is stored in text format, so it
>> is easy to compare different versions to see what has changed, and it
>> works very well with version control (I use Subversion).
>>
>> The only thing I'd add to his recommendation is that you be sure to save
>> the scripts that produced the objects in the binary images (his
>> lala.rda), so that they can be reconstructed if necessary.  As long as
>> the reconstruction isn't too difficult, this means I don't need to
>> bother to save them in Subversion.

Version control sounds like a good idea Duncan, but I've always been a
bit intimidated by it.  How cumbersome is Subversion and what are the
advantages of version control?

 
> I would add a bit of detail here that I do.  ESS/xemacs allows one to create
> a transcript file that you can then step through, executing each command as
> it was originally executed.  I make one of these transcript files for each
> project and save it with the data and any scripts that I have for the
> project.  So, in the end, I have a set of Rda files, one or more transcript
> files, and a Src directory that contains any function code (and ESS supports
> saving scripts to this directory automatically).

Do you save your functions in Rda files to be loaded/attached or are
they sourced every time?  How do you tell ESS/emacs to save in ./src or
is that only possible with xemacs (I can use emacs to do what I need to
but don't know lisp so the config files and terminology are a bit
cryptic to me)?

Kevin




Re: [R] To improve my understanding of workspaces

2006-03-10 Thread Sean Davis



On 3/10/06 1:53 PM, Kevin E. Thorpe [EMAIL PROTECTED] wrote:

> Version control sounds like a good idea Duncan, but I've always been a
> bit intimidated by it.  How cumbersome is Subversion and what are the
> advantages of version control?
 
 
> Do you save your functions in Rda files to be loaded/attached or are
> they sourced every time?  How do you tell ESS/emacs to save in ./src or
> is that only possible with xemacs (I can use emacs to do what I need to
> but don't know lisp so the config files and terminology are a bit
> cryptic to me)?

I tend to save as source for easier reading and sharing among projects.  I
should begin to use SVN for my smaller projects, but I haven't yet; so far only
packages meant for release or future release make it into SVN.  SVN
is quite easy to use, and there is at least one emacs package that allows SVN
version control from within emacs (although I still do it from the
command line).
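
Keeping functions as source might look like the following sketch. The Src directory layout, the cv.R file and the cv() function are hypothetical, and sourcing into a separate environment is just one way to keep the functions out of the workspace.

```r
# Hypothetical project layout: <project>/Src holds the function files.
proj <- tempfile("project_")
src_dir <- file.path(proj, "Src")
dir.create(src_dir, recursive = TRUE)

# Stand-in for a function file that would normally be written in the editor.
writeLines(c(
  "# Coefficient of variation of a numeric vector",
  "cv <- function(v) sd(v) / mean(v)"
), file.path(src_dir, "cv.R"))

# At the start of a session, source every function file found in Src/.
fun_env <- new.env()
for (f in list.files(src_dir, pattern = "\\.R$", full.names = TRUE)) {
  source(f, local = fun_env)
}

result <- fun_env$cv(c(2, 4, 6))
```

Because the functions live in plain text files, they can be diffed, shared between projects and put under version control.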

As for your second question:

(setq ess-source-directory
      (lambda ()
        (concat ess-directory "Src/")))

is what I use.



Re: [R] To improve my understanding of workspaces

2006-03-10 Thread Duncan Murdoch
On 3/10/2006 1:53 PM, Kevin E. Thorpe wrote:
> Version control sounds like a good idea Duncan, but I've always been a
> bit intimidated by it.  How cumbersome is Subversion and what are the
> advantages of version control?

It needn't be very cumbersome after you've set it up, but the setup 
would be a bit daunting if you haven't used it before.  If you can find 
someone who has used it before to do the setup for you, you'll find it a 
lot less intimidating.  I'd be happy to do this for you if you come to 
London for the SSC meeting in May.  (This offer doesn't just apply to 
Kevin, but he's more likely to come to that meeting than most of the 
readers of this list.  If anyone else is interested, drop me a line 
privately.  And remember that's London, Canada, not the other one.)

If you're working in Windows, use the TortoiseSVN front-end as well as 
the command line tools.  I started with the command line tools but use 
TSVN most of the time now.

I also recommend reading the O'Reilly book, Version Control with 
Subversion.  It's available online at http://svnbook.red-bean.com/.

Duncan Murdoch
 
 




Re: [R] To improve my understanding of workspaces

2006-03-10 Thread Thomas Lumley
On Fri, 10 Mar 2006, Adaikalavan Ramasamy wrote:

> The attach function appears to have two functions now :

Since R 1.1.0, in fact.

> a) attach("lala.rda") loads objects from lala.rda into the search path
> b) attach(obj) makes the named columns of a dataframe or list available
> in the search path. Therefore you only need to type 'aaa' instead of
> obj$aaa or obj[ , "aaa"]
>
> The second is the more popular form of usage.
>
> Personally I would rather not use attach() and prefer to type obj$aaa or
> use in the context of lm( aaa ~ ., data=obj ).

This distinction is relevant only to the second syntax for attach. 
Attaching an .rda file is more like loading a package -- it makes the 
whole object available, and is very similar to attach() in S-PLUS.
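
A small sketch of the file form of attach(), using hypothetical objects and a temporary file. It shows that the saved objects end up in their own entry on the search path rather than in the workspace. Since attach() reads the file into that entry, the objects still occupy memory, so this form is more about keeping the workspace uncluttered than about saving memory.

```r
# Save two hypothetical objects to a temporary .rda file, then remove them.
rda_file <- file.path(tempdir(), "lala.rda")
big <- rnorm(1000)
small <- "note"
save(big, small, file = rda_file)
rm(big, small)

# attach() on a file name loads the saved objects into a new environment
# placed on the search path (position 2 by default), not into the workspace.
attach(rda_file)
on_path   <- exists("big")                                   # found via the search path
in_global <- exists("big", envir = globalenv(), inherits = FALSE)
attached_names <- ls(pos = 2)
n_big <- length(big)

# Detach by position when the objects are no longer needed.
detach(pos = 2)
gone <- !exists("big")
```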

-thomas
