[R] To improve my understanding of workspaces
Hello.

I have grown accustomed to the .Data directory in S-Plus, and so when I came to R I continued that behaviour by saving my workspaces at the end of each R session. So, I have saved workspaces in various directories where I have used R, just as I would have had various .Data directories where I had used S-Plus.

I have seen comments on the list, most recently from Prof. Ripley, that they don't routinely save their workspaces in this way. So my questions are:

1. What do people do instead to manage projects?
2. Is there an official recommendation?

From my reading I have learned that you can save data frames (and other objects?) to disk and then attach them. Does this save memory? If I have read correctly, I understand that everything in the workspace is in memory, but I haven't been able to determine if objects in the search path are as well.

Kind Regards,
Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: [EMAIL PROTECTED]  Tel: 416.946.8081  Fax: 416.946.3297

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] To improve my understanding of workspaces
I use emacs and ESS to develop the scripts. The new releases of R have the script function already built in.

Typically I keep all the data and scripts related to a project in its own folder, so I have minimal worry about paths.

To save large and associated objects, I use

    save(x, y, z, file="lala.rda", compress=TRUE)

and then to load x, y, z in another session or workspace I use

    load("lala.rda")

To save small dataframes and matrices, I use

    write.table(mat, file="lala.txt", sep="\t")

and to read it back I use

    mat <- read.delim(file="lala.txt", row.names=1)

The problem with .RData (via quit or save.image) is that it keeps all intermediate objects, which can leave it unnecessarily bloated and confusing. Further, you will have difficulty distinguishing one .RData from another by looking at the filename alone.

Regards,
Adai

On Fri, 2006-03-10 at 06:58 -0500, Kevin E. Thorpe wrote:
> So my questions are: 1. What do people do instead to manage projects? 2. Is there an official recommendation?
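
A minimal sketch of the two round trips described in this message, for readers who want to try them. The objects x, y, z and mat are hypothetical stand-ins, and the files are written to a temporary directory rather than a project folder; both choices are assumptions made for the example.

```r
# Hypothetical objects standing in for x, y, z and mat in the message above.
x <- 1:10
y <- letters[1:3]
z <- list(a = 1, b = "two")
mat <- matrix(1:6, nrow = 2,
              dimnames = list(c("r1", "r2"), c("A", "B", "C")))

# Work in a temporary directory so nothing in a real project is touched.
rda_file <- file.path(tempdir(), "lala.rda")
txt_file <- file.path(tempdir(), "lala.txt")

# Binary route: save() several objects by name, then load() them back.
# Loading into a separate environment keeps the demonstration tidy;
# plain load(rda_file) would restore them into the current workspace.
save(x, y, z, file = rda_file, compress = TRUE)
restored <- new.env()
loaded_names <- load(rda_file, envir = restored)  # names of restored objects

# Text route: write.table() with a tab separator, read.delim() to read back.
write.table(mat, file = txt_file, sep = "\t")
mat2 <- as.matrix(read.delim(file = txt_file, row.names = 1))
```

The text route produces a file that can be compared between versions with ordinary tools, while the binary route keeps several related objects together in one file.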
Re: [R] To improve my understanding of workspaces
Other than Emacs, I use the same work habits as Adai. An advantage of this workflow is that almost everything is stored in text format, so it is easy to compare different versions to see what has changed, and it works very well with version control (I use Subversion).

The only thing I'd add to his recommendation is that you be sure to save the scripts that produced the objects in the binary images (his lala.rda), so that they can be reconstructed if necessary. As long as the reconstruction isn't too difficult, this means I don't need to bother to save them in Subversion.

Duncan Murdoch

On 3/10/2006 8:25 AM, Adaikalavan Ramasamy wrote:
> To save large and associated objects, I use save(x, y, z, file="lala.rda", compress=TRUE) and then to load x, y, z in another session or workspace I use load("lala.rda")
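
One way to act on the advice about keeping the script that produced a binary image is to have the script rebuild the file whenever it is missing. The sketch below assumes a hypothetical helper named cached() and uses the cars data set that ships with R; neither comes from the thread.

```r
# Hypothetical helper: return the object stored in `file`, rebuilding it with
# `build` (a function holding the code that created it) if the file is missing.
cached <- function(file, build) {
  if (file.exists(file)) {
    env <- new.env()
    load(file, envir = env)   # restores the object saved under the name `result`
    return(env$result)
  }
  result <- build()
  save(result, file = file, compress = TRUE)
  result
}

rda_file <- file.path(tempdir(), "fit.rda")
unlink(rda_file)  # start from a clean state for the demonstration

# The code that produces the object lives in the script, under version control.
build_fit <- function() lm(dist ~ speed, data = cars)

fit1 <- cached(rda_file, build_fit)  # runs the code and writes the file
fit2 <- cached(rda_file, build_fit)  # reads the saved copy instead
```

With this arrangement only the script needs to go into version control; the .rda file can be deleted and regenerated at any time.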
Re: [R] To improve my understanding of workspaces
Thanks Adai. A couple of questions/comments about this.

Adaikalavan Ramasamy wrote:
> I use emacs and ESS to develop the scripts. The new releases of R have the script function already built in.

I use emacs and ESS too (in Linux). I do not know about the script function you mention. It's not in my version (2.1.1) and I couldn't find it with RSiteSearch either.

> Typically I keep all the data and scripts related to a project in its own folder, so I have minimal worry about paths.

I do the same.

> To save large and associated objects, I use save(x, y, z, file="lala.rda", compress=TRUE) and then to load x, y, z in another session or workspace I use load("lala.rda"). To save small dataframes and matrices, I use write.table(mat, file="lala.txt", sep="\t") and to read it back I use mat <- read.delim(file="lala.txt", row.names=1)

Am I correct that load() or read.whatever() or even data() will bring the objects into the current workspace, while attach() can attach a save()d data frame to the search path? Is one approach better than the other in general?

> The problem with .RData (via quit or save.image) is that it keeps all intermediate objects, which can leave it unnecessarily bloated and confusing. Further, you will have difficulty distinguishing one .RData from another by looking at the filename alone.

If you don't save the workspace on q(), do you also lose the history for that session (although when working in emacs, this is rarely a problem)?

Thanks again,
Kevin
Re: [R] To improve my understanding of workspaces
On 3/10/06 8:33 AM, Duncan Murdoch wrote:
> The only thing I'd add to his recommendation is that you be sure to save the scripts that produced the objects in the binary images (his lala.rda), so that they can be reconstructed if necessary. As long as the reconstruction isn't too difficult, this means I don't need to bother to save them in Subversion.

I would add a bit of detail here about what I do. ESS/xemacs allows one to create a transcript file that you can then step through, executing each command as it was originally executed. I make one of these transcript files for each project and save it with the data and any scripts that I have for the project. So, in the end, I have a set of Rda files, one or more transcript files, and a Src directory that contains any function code (and ESS supports saving scripts to this directory automatically).

Sean
Re: [R] To improve my understanding of workspaces
Much of programming style is personal choice, and so it varies from individual to individual. See my comments below.

On Fri, 2006-03-10 at 09:01 -0500, Kevin E. Thorpe wrote:
> I use emacs and ESS too (in Linux). I do not know about the script function you mention. It's not in my version (2.1.1) and I couldn't find it with RSiteSearch either.

I meant to say that newer releases of R _for Windows only_ have a script function. Look under File -> New script (untested). However, it does not appear to have the syntax highlighting or auto-indenting that emacs has.

> Am I correct that load() or read.whatever() or even data() will bring the objects into the current workspace, while attach() can attach a save()d data frame to the search path? Is one approach better than the other in general?

I think you are correct. The attach function appears to have two uses now:

a) attach("lala.rda") loads the objects from lala.rda into the search path.
b) attach(obj) makes the named columns of a dataframe or list available in the search path, so you only need to type aaa instead of obj$aaa or obj[ , "aaa"].

The second is the more popular form of usage. Personally I would rather not use attach() and prefer to type obj$aaa or to pass the data frame explicitly, as in lm( aaa ~ ., data=obj ).

> If you don't save the workspace on q(), do you also lose the history for that session (although when working in emacs, this is rarely a problem)?

I would argue that a script file is better than a history file, because I can clean up any test or wrong code I might have in the script file. However, if you prefer to save the history, you can use

    savehistory(file="history.txt")

at any point.

Regards,
Adai
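
The alternatives to attach(obj) mentioned in this message can be compared side by side. This is only a sketch: the data frame obj and its columns aaa and bbb are invented for the example, and with() is an additional base R option that the message itself does not mention.

```r
# Hypothetical data frame standing in for `obj` in the message above.
obj <- data.frame(aaa = c(2.1, 3.9, 6.2, 7.8, 10.1),
                  bbb = c(1, 2, 3, 4, 5))

# Explicit access: the column is always looked up inside obj.
mean_explicit <- mean(obj$aaa)

# with() makes the columns visible by name, but only for this one expression.
mean_with <- with(obj, mean(aaa))

# Modelling functions accept the data frame through their data argument.
fit <- lm(aaa ~ ., data = obj)

# attach() puts a copy of the columns on the search path until detach() runs.
attach(obj)
mean_attached <- mean(aaa)
detach(obj)
```

All three ways of computing the mean read the same column; the difference is how long the column names stay visible and how easy it is to tell where a name like aaa comes from.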
Re: [R] To improve my understanding of workspaces
Sean Davis wrote:
> On 3/10/06 8:33 AM, Duncan Murdoch wrote:
>> An advantage of this workflow is that almost everything is stored in text format, so it is easy to compare different versions to see what has changed, and it works very well with version control (I use Subversion).

Version control sounds like a good idea, Duncan, but I've always been a bit intimidated by it. How cumbersome is Subversion, and what are the advantages of version control?

> So, in the end, I have a set of Rda files, one or more transcript files, and a Src directory that contains any function code (and ESS supports saving scripts to this directory automatically).

Do you save your functions in Rda files to be loaded/attached, or are they sourced every time? How do you tell ESS/emacs to save in ./src, or is that only possible with xemacs? (I can use emacs to do what I need to, but I don't know lisp, so the config files and terminology are a bit cryptic to me.)

Kevin
Re: [R] To improve my understanding of workspaces
On 3/10/06 1:53 PM, Kevin E. Thorpe wrote:
> Do you save your functions in Rda files to be loaded/attached, or are they sourced every time? How do you tell ESS/emacs to save in ./src, or is that only possible with xemacs?

I tend to save as source for easier reading and sharing among projects. I should begin to use SVN for my smaller projects, but I haven't yet; only packages meant for release or future release make it into SVN with me. SVN is quite easy to use, and there is at least one emacs package that allows SVN version control from within emacs (although I still do it from the command line).

As for your second question,

    (setq ess-source-directory (lambda () (concat ess-directory "Src/")))

is what I use.
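
On the R side, function code kept as source in a Src directory has to be read in at the start of each session. A minimal sketch of one way to do that follows; the temporary Src directory, the file square.R and the environment name funs are all made up for the example and are not part of Sean's setup.

```r
# Hypothetical project layout: function definitions live in a Src/ directory.
src_dir <- file.path(tempdir(), "Src")
dir.create(src_dir, showWarnings = FALSE)

# Write one small function file so the example is self-contained.
writeLines("square <- function(v) v^2", file.path(src_dir, "square.R"))

# Source every .R file in the directory into a dedicated environment,
# which keeps the functions apart from data objects in the workspace.
funs <- new.env()
for (f in list.files(src_dir, pattern = "\\.R$", full.names = TRUE)) {
  sys.source(f, envir = funs)
}

squares <- funs$square(1:3)
```

Calling source(f) instead of sys.source(f, envir = funs) would define the functions directly in the workspace, which is the simpler choice when name clashes are not a concern.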
Re: [R] To improve my understanding of workspaces
On 3/10/2006 1:53 PM, Kevin E. Thorpe wrote:
> Version control sounds like a good idea, Duncan, but I've always been a bit intimidated by it. How cumbersome is Subversion, and what are the advantages of version control?

It needn't be very cumbersome after you've set it up, but the setup would be a bit daunting if you haven't used it before. If you can find someone who has used it before to do the setup for you, you'll find it a lot less intimidating. I'd be happy to do this for you if you come to London for the SSC meeting in May. (This offer doesn't just apply to Kevin, but he's more likely to come to that meeting than most of the readers of this list. If anyone else is interested, drop me a line privately. And remember that's London, Canada, not the other one.)

If you're working in Windows, use the TortoiseSVN front-end as well as the command line tools. I started with the command line tools but use TSVN most of the time now.

I also recommend reading the O'Reilly book, Version Control with Subversion. It's available online at http://svnbook.red-bean.com/.

Duncan Murdoch
Re: [R] To improve my understanding of workspaces
On Fri, 10 Mar 2006, Adaikalavan Ramasamy wrote:
> The attach function appears to have two uses now:

Since R 1.1.0, in fact.

> a) attach("lala.rda") loads the objects from lala.rda into the search path.
> b) attach(obj) makes the named columns of a dataframe or list available in the search path, so you only need to type aaa instead of obj$aaa or obj[ , "aaa"].
>
> The second is the more popular form of usage. Personally I would rather not use attach() and prefer to type obj$aaa or to pass the data frame explicitly, as in lm( aaa ~ ., data=obj ).

This distinction is relevant only to the second syntax for attach. Attaching an .rda file is more like loading a package -- it makes the whole object available, and is very similar to attach() in S-PLUS.

-thomas
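
The first form of attach() can be tried out as follows. This is a sketch under stated assumptions: the objects x and y, the temporary file and the search-path name "lala" are hypothetical. It also bears on the original memory question: the attached objects are read into memory, they just live outside the workspace.

```r
# Create a small .rda file to attach; x and y are hypothetical objects.
rda_file <- file.path(tempdir(), "lala.rda")
x <- 1:5
y <- c(a = 1, b = 2)
save(x, y, file = rda_file)
rm(x, y)  # remove them from the workspace so only the file copy remains

# attach() on a file path adds a new entry to the search path that holds
# the saved objects. They are in memory, but not in the workspace.
attach(rda_file, name = "lala")

in_workspace   <- exists("x", envir = globalenv(), inherits = FALSE)
on_search_path <- exists("x")            # found by walking the search path
entry_present  <- "lala" %in% search()
x_from_file    <- get("x", pos = "lala")  # fetch it from the attached entry

# detach() removes the entry again, much like unloading a package.
detach("lala", character.only = TRUE)
```

Because the attached copy sits behind the workspace on the search path, an object with the same name created later in the workspace would mask it.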