Re: [R] simultaneous computing
Dear Markus, You might want to check Rmpi, papply, snow, rpvm, and nws. Best, R.

On 6/11/07, Markus Schmidberger [EMAIL PROTECTED] wrote: Hello, which possibilities are available in R for simultaneous or parallel computing? I could only find biopara (http://cran.r-project.org/src/contrib/Descriptions/biopara.html). Are there other possibilities? Are there special groups working on simultaneous computing with R? Thanks, Markus -- Dipl.-Tech. Math. Markus Schmidberger, Ludwig-Maximilians-Universität München, IBE - Institut für medizinische Informationsverarbeitung, Biometrie und Epidemiologie

-- Ramon Diaz-Uriarte, Statistical Computing Team, Structural Biology and Biocomputing Programme, Spanish National Cancer Centre (CNIO), http://ligarto.org/rdiaz

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
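For readers landing on this thread, a minimal sketch of what a snow session looks like (a sketch only, assuming the 'snow' CRAN package; the cluster size here is arbitrary, and the socket transport is used so no MPI or PVM installation is needed):

```r
## Minimal snow sketch. makeCluster() with type = "SOCK" uses the
## socket transport; snow can also sit on top of Rmpi (MPI) or
## rpvm (PVM) by changing the type.
library(snow)

cl <- makeCluster(4, type = "SOCK")           # 4 local worker processes
res <- clusterApply(cl, 1:4, function(x) x^2) # one element per worker
stopCluster(cl)                               # shut the workers down

unlist(res)  # 1 4 9 16
```

clusterApplyLB is the load-balanced variant of clusterApply, handing the next element to whichever worker finishes first.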
Re: [R] trouble with snow and Rmpi
Dear Erin, What operating system are you trying this on? Windows? On Linux you definitely don't need MPICH2 but, rather, LAM/MPI. Best, R.

On 5/25/07, Erin Hodgess [EMAIL PROTECTED] wrote: Dear R People: I am having some trouble with the snow package. It requires MPICH2 and Rmpi. Rmpi is fine. However, I downloaded the MPICH2 package and installed it. There is no mpicc, mpirun, etc. Does anyone have any suggestions, please? Thanks in advance! Sincerely, Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: [EMAIL PROTECTED]
Re: [R] Rserve and R to R communication
On 4/11/07, AJ Rossini [EMAIL PROTECTED] wrote:

On Tuesday 10 April 2007 23:17, Ramon Diaz-Uriarte wrote: Of course, you are right there. I think that might still be the case. At the time we made our decision, and decided to go for MPI, MPI 2 was already out, and MPI seemed more like the current/future standard than PVM.

That's always been the case. In fact, MPI is a standard, whereas PVM was always an implementation defining a so-called standard.

Oops, you are right. But in addition to whether or not it is a standard, it seemed (and still seems) that MPI is the current/future stuff, whereas PVM seemed more like a useful but aging approach. (I am aging too, so maybe that ain't that good an argument :-).

So using papply with Rmpi requires sharper programmers than using snow? Hey, it is good to know I am that much smarter. I'll wear that as a badge :-).

You are! I've never been patient enough to use plain Rmpi or rpvm except a few times, but for me, the advantage of snow is that you get all the backends, not just MPI. In fact, I've heard mention that some folks are sticking together an NWS backend as well.

Oh, but except for a few very simple things, such as broadcasting data or functions to all the slaves, or cleaning up, I never use Rmpi directly. I always use papply, which is, really, a piece of cake. I am just scratching the surface of this parallelism stuff, and I am sticking to the simple, embarrassingly parallelizable problems (cross-validation, bootstrap, identical analyses on many samples, etc.). So going any deeper into MPI (individual sends, receives, etc.) was more trouble than it seemed worth. papply or, alternatively, clusterApplyLB, are all I've (almost) ever needed/used.

Anyway, papply (with Rmpi) is not, in my experience, any harder than snow (with either rpvm or Rmpi). In fact, I find papply a lot simpler than snow (clusterApply and clusterApplyLB). For one thing, debugging is very simple, since papply becomes lapply if no LAM universe is booted.
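The papply pattern described here can be sketched as below (a sketch only, assuming the 'papply' CRAN package; its key property is the fallback this message mentions — with no LAM/MPI universe booted and no Rmpi slaves spawned, the same call degrades gracefully to a sequential lapply, which is what makes debugging easy):

```r
## Embarrassingly parallel work with papply (assumes the 'papply'
## package; with Rmpi slaves spawned it load-balances across them,
## otherwise it runs sequentially via lapply()).
library(papply)

squares <- papply(as.list(1:8), function(x) x^2)

## With no MPI universe running, the call above is equivalent to:
squares.seq <- lapply(as.list(1:8), function(x) x^2)

identical(unlist(squares), unlist(squares.seq))  # TRUE
```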
In fact it might be easier, since we never put together decent aggregation routines. (Smarter doesn't mean works harder, just more intelligently :-). I'll take that as a compliment :-).

I see, though, that I might want to check PVM just for the sake of the fault tolerance in snowFT.

Fault tolerance is one of those very ill-defined words. Specifically: #1 - mapping pRNG streams to work units, not just CPUs or dispatch order (both of which can differ), for reproducibility; #2 - handling failure to complete on worker nodes gracefully. However, you'd need checkpointing, or probably a miracle, to handle failure on the master...

Aha, I hadn't thought of #1, seeing as I am much more concerned about #2. (For #1, and to check results, I tend to run things under controlled conditions, where if a worker shuts down, I'll bring it back to life and start again ---not elegant, but this happens rarely enough that I don't worry too much.) Right now, I am dealing with #2 via additional external scripts that check that LAM universes are up, examine log files for signs of failures, modify LAM host definition files if needed, restart LAM universes, etc., and with checkpointing in the R code. But I think it is an ugly kludge (and a pain). I envy the Erlang guys. As for failure in the master... I'll take that as an act of god, so no point in trying to defeat it via miracles :-). Actually, the scripts above could be distributed (the checkpointing is done from the master), so this is doable via a meta script that runs distributed. I've just added that to the to-do list. Best, R.

best, -tony [EMAIL PROTECTED] Muttenz, Switzerland. Commit early, commit often, and commit in a repository from which we can easily roll-back your mistakes (AJR, 4Jan05).
Re: [R] Rserve and R to R communication
On 4/10/07, AJ Rossini [EMAIL PROTECTED] wrote:

On Monday 09 April 2007 23:02, Ramon Diaz-Uriarte wrote: (Yes, maybe I should check snowFT, but it uses PVM, and I recall a while back there was a reason why we decided to go with MPI instead of PVM).

There is no reason that you can't run both MPI and PVM on the same cluster.

Yes, sure. We actually did that for a while. But we eventually settled on MPI.

There is a particular reason that the first implementation we (Na Li, who did most of the work, and myself) made used PVM -- at the time (pre MPI 2) it was far more advanced than MPI as far as interactive parallel computing, i.e. dispatching parallel functions interactively from the command line, creating and manipulating virtual machines on the fly.

Of course, you are right there. I think that might still be the case. At the time we made our decision, and decided to go for MPI, MPI 2 was already out, and MPI seemed more like the current/future standard than PVM. A feeling that was reinforced by seeing some key people of PVM (e.g., Dongarra) also involved in MPI, as well as very active development of MPI (e.g., LAM, MPICH, and later OpenMPI). And MPI seemed more like the usual message passing (which for us was, at that time at least, a good thing). And we were also using MPI in C++ code. So we decided to bet on MPI.

Of course, most MPI implementations will save you loads of deci-seconds on transfer of medium-size messages over the wire, but we weren't interested in that particular aspect, more in saving days over the course of a one-off program (i.e. development time, which can be more painful than run-time).

Oh, but those deci-seconds were never the reason we decided to choose MPI. We are using R after all, not HPF :-). Right. And of course we never thought MPI would cost us significantly more development time than PVM (or that the increased development time would be compensated by the above-mentioned deci-seconds).
Moreover, most of these are not one-off programs, but web applications (some of which have been running for over two years) where easy debugging is crucial for us if we have to revisit the code 6 months later (and for that we found papply much more useful than snow ---more below).

Now, PVM had the necessary tools for fault tolerance -- though I thought that the recent MPI and newer message passing frameworks might have had some of that implemented.

Some MPIs have been developed that incorporate it. But I do not think that is easy with LAM/MPI nor via Rmpi. The problem is that once a node goes down, the whole LAM universe gets screwed up.

And remember, the point of snow was to provide platform-independent parallel code (for which it was the first, for nearly any language/implementation), not to run it like a bat-out-of-hell... (we assumed it would be cheaper to buy more machines than to spend a few months finding a budget along with sharp programmers).

So using papply with Rmpi requires sharper programmers than using snow? Hey, it is good to know I am that much smarter. I'll wear that as a badge :-). Anyway, papply (with Rmpi) is not, in my experience, any harder than snow (with either rpvm or Rmpi). In fact, I find papply a lot simpler than snow (clusterApply and clusterApplyLB). For one thing, debugging is very simple, since papply becomes lapply if no LAM universe is booted. I see, though, that I might want to check PVM just for the sake of the fault tolerance in snowFT. Best, R.

best, -tony
Re: [R] Rserve and R to R communication
On 4/9/07, Simon Urbanek [EMAIL PROTECTED] wrote:

On Apr 7, 2007, at 10:56 AM, Ramon Diaz-Uriarte wrote: Dear All, The clients.txt file of the latest Rserve package, by Simon Urbanek, says, regarding its R client: (...) a simple R client, i.e. it allows you to connect to Rserve from R itself. It is very simple and limited, because Rserve was not primarily meant for R-to-R communication (there are better ways to do that), but it is useful for quick interactive connection to an Rserve farm. Which are those better ways to do it? I am thinking about using Rserve to have an R process send jobs to a bunch of Rserves on different machines. It is like what we could do with Rmpi (or pvm), but without the MPI layer. Therefore, presumably it'd be easier to deal with network problems, machine failures, using checkpoints, etc. (i.e., to try to get better fault tolerance). It seems that Rserve would provide the basic infrastructure for doing that and save me from reinventing the wheel of using sockets, etc., directly from R. However, Simon's comment about better ways of R-to-R communication made me wonder if this idea really makes sense. What is the catch? Have other people tried similar approaches?

I was commenting on direct R-to-R communication using sockets + 'serialize' in R, or the 'snow' package for parallel processing. The latter could be useful for what you have in mind, because it includes a socket-based implementation which allows you to spawn multiple children (across multiple machines) and collect their results. It uses regular rsh or ssh to start the jobs, so if you can use that, it should work for you. 'snow' also has PVM and MPI implementations; the PVM one is really easy to set up (on unix) and that was what I was using for parallel computing in R on a cluster.

I think I now understand your comments. I've used snow and Rmpi quite a bit.
But the problem with Rmpi (or, rather, MPI) is the lack of fault tolerance: if a node goes down, the whole MPI universe breaks, and with it the complete set of slaves. Setting up some kind of fault-tolerant scheme with Rserve seemed possible/simpler (as it does not depend on the MPI layer). (Yes, maybe I should check snowFT, but it uses PVM, and I recall a while back there was a reason why we decided to go with MPI instead of PVM).

Rserve is sort of comparable, but in addition it provides the spawning infrastructure due to its client/server concept. What it doesn't have is the convenience functions that snow provides, like clusterApply etc. Thinking of it, it would actually be possible to add them, although I admit that the original goal of Rserve was not parallel computing :). The idea was to have one Rserve server and multiple clients, whereas in 'snow' you sort of have one client and multiple servers. You could spawn multiple Rserves on multiple machines, but Rserve itself doesn't provide any load-balancing out of the box, so you'd have to do that yourself. I don't know if that helps... :) Cheers, Simon

Aha. I should have seen that. I think I understand the differences better now. Yes, sure. I think the load-balancing should be doable, though, if I decide to try to go down this route. It does help! Thanks a lot. Best, R.
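The Rserve-as-worker-farm idea being weighed in this exchange can be sketched roughly as follows. This is a sketch only: it assumes the simple R client shipped with the Rserve package (the RSconnect/RSeval/RSclose names are from that client's clients.txt description and should be checked against the installed version), the host names are made up, and there is no load-balancing or fault handling — which is precisely the part one would have to write oneself:

```r
## Rough sketch: farming expressions out to Rserve instances on
## several machines, one job per host, sequentially. A real version
## would add load-balancing, retries on dead hosts, etc.
library(Rserve)  # the simple R client ships with the Rserve sources

hosts <- c("node01", "node02")         # hypothetical machines running Rserve

results <- lapply(hosts, function(h) {
  con <- RSconnect(host = h)           # open a connection to one Rserve
  on.exit(RSclose(con))                # always close, even on error
  RSeval(con, quote(sum(rnorm(1e6))))  # evaluate an expression remotely
})
```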
Re: [R] Rserve and R to R communication
On 4/9/07, Paul Gilbert [EMAIL PROTECTED] wrote:

Matthew Keller wrote: Hi Ramon, I've been interested in responses to your question. I have what I think is a similar issue - I have a very large simulation script and would like to be able to modularize it by having a main script that calls lots of subscripts - but I haven't done that yet, because the only way I could think to do it was to call a subscript, have it run, save the objects from the subscript, and then call those objects back into the main script, which seems like a very slow and onerous way to do it. Would Rserve do what I'm looking for?

For simulations you need to worry about the random number generator sequence. I think snow has a scheme for handling this. If you devise your own system then be sure to look after this (non-trivial) detail. Paul Gilbert

Aaaargh, you are right, of course. Rmpi has it too. I'll recheck the rlecuyer and rsprng packages (both can be used from Rmpi and, IIRC, from snow). Thanks for pointing this out! Best, R.
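The RNG detail Paul raises is handled in snow roughly as below (a sketch, assuming the 'snow' and 'rlecuyer' packages; the exact clusterSetupRNG arguments — type and seed — should be checked against the installed snow version). The point is that each worker gets its own L'Ecuyer stream, so parallel simulation results are reproducible and streams do not overlap regardless of how tasks are dispatched:

```r
## Sketch: independent, reproducible RNG streams per worker
## (assumes the 'snow' and 'rlecuyer' packages are installed).
library(snow)

cl <- makeCluster(4, type = "SOCK")
clusterSetupRNG(cl, type = "RNGstream", seed = rep(12345, 6))
## each worker now draws from its own L'Ecuyer stream, so the
## draws below are reproducible across runs with the same seed
draws <- clusterApply(cl, rep(3, 4), function(n) rnorm(n))
stopCluster(cl)
```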
Re: [R] Rserve and R to R communication
Dear Matthew,

On 4/9/07, Matthew Keller [EMAIL PROTECTED] wrote: Hi Ramon, I've been interested in responses to your question. I have what I think is a similar issue [...] Would Rserve do what I'm looking for?

Maybe. That is in fact what I am wondering. However, an easier route might be to try Rmpi with papply. Or snow (with either Rmpi or rpvm). Or nws (a Linda implementation for R). Using Rmpi with papply, in particular, is a piece of cake with embarrassingly parallel problems. papply is like lapply, but parallelized, with built-in load-balancing, although it will run sequentially when no MPI universe is available; the latter is very handy for debugging. snow also has parallelized, load-balanced versions of apply (though I do not think it automatically switches to running sequentially). All of these (Rmpi, papply, snow, rpvm, nws) are R packages available from CRAN. You will need some additional stuff (LAM/MPI for Rmpi ---or MPICH if you run Windows---, PVM for rpvm, and Python and Twisted for nws).

(I asked about Rserve because the lack of fault tolerance of MPI is a pain to deal with in my applications. Also, with LAM/MPI there are limits on the number of slaves that can be handled by a LAM daemon, and that is a problem for some of our web-based applications. Thus, I am looking at alternative approaches that might eliminate some of the extra layers that MPI ---or PVM--- add.) HTH, R.

-- Matthew C Keller Postdoctoral Fellow Virginia Institute for Psychiatric and Behavioral Genetics
Re: [R] Rserve and R to R communication
On 4/9/07, Gregory Warnes [EMAIL PROTECTED] wrote: You may find it easier to use NetWorkSpaces for R (see http://nws-r.sourceforge.net/), which provides a simple mechanism for sending tasks to worker R processes and collecting the results back when done. -G

Thanks, Greg. Yes, I am actually playing around with nws too. Best, R.
[R] Rserve and R to R communication
Dear All, The clients.txt file of the latest Rserve package, by Simon Urbanek, says, regarding its R client: (...) a simple R client, i.e. it allows you to connect to Rserve from R itself. It is very simple and limited, because Rserve was not primarily meant for R-to-R communication (there are better ways to do that), but it is useful for quick interactive connection to an Rserve farm.

Which are those better ways to do it? I am thinking about using Rserve to have an R process send jobs to a bunch of Rserves on different machines. It is like what we could do with Rmpi (or pvm), but without the MPI layer. Therefore, presumably it'd be easier to deal with network problems, machine failures, using checkpoints, etc. (i.e., to try to get better fault tolerance). It seems that Rserve would provide the basic infrastructure for doing that and save me from reinventing the wheel of using sockets, etc., directly from R. However, Simon's comment about better ways of R-to-R communication made me wonder if this idea really makes sense. What is the catch? Have other people tried similar approaches? Thanks, R.
Re: [R] Reasons to Use R
Dear Lorenzo, I'll try not to repeat what others have answered before.

On 4/5/07, Lorenzo Isella [EMAIL PROTECTED] wrote: The institute I work for is organizing an internal workshop for High Performance Computing (HPC). (...) (1) Institutions (not only academia) using R

You can count my institution too. Several groups. (I can provide more details off-list if you want.)

(2) Hardware requirements, possibly benchmarks (3) R clusters, R multiple-CPU machines, R performance on different hardware.

We do use R on commodity off-the-shelf clusters; our two clusters run Debian GNU/Linux, on both 32-bit machines ---Xeons--- and 64-bit machines ---dual-core AMD Opterons. We use parallelization quite a bit, with MPI (via the Rmpi and papply packages mainly). One convenient feature is that (once the LAM universe is up and running) whether we are using the 4 cores in a single box, or the maximum available 120, is completely transparent. Using R and MPI is, really, a piece of cake. That said, there are things that I miss; in particular, oftentimes I wish R were Erlang or Oz, because of the straightforward fault-tolerant distributed computing and the built-in abstractions for distribution and concurrency. The issue of multithreading has come up several times on this list and is something that some people miss.

I am not sure how much R is used in the usual HPC realms. It is my understanding that traditional HPC is still dominated by things such as HPF, and C with MPI, OpenMP, or UPC or Cilk. The usual answer to "but R is too slow" is "but you can write Fortran or C code for the bottlenecks and call it from R". I guess you could use, say, UPC in that C that is linked to R, but I have no experience. And I think this code can become a pain to write and maintain (especially if you want to play around with what you try to parallelize, etc.). My feeling (based on no information or documentation whatsoever) is that how far R can be stretched or extended into HPC is still an open question.
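The transparency claim above — the same R code whether you use 4 cores or 120 — can be sketched as follows (a sketch only, assuming the 'Rmpi' package on top of an already booted LAM/MPI universe; only the nslaves number changes with the size of the machine):

```r
## Sketch: cluster size is transparent to the R code
## (assumes Rmpi and a booted LAM/MPI universe).
library(Rmpi)

mpi.spawn.Rslaves(nslaves = 4)              # 4 could just as well be 120
res <- mpi.applyLB(1:100, function(x) x^2)  # load-balanced parallel apply
mpi.close.Rslaves()                         # orderly shutdown of slaves
mpi.quit()                                  # leave MPI (and R) cleanly
```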
(4) finally, a list of the advantages of using R over commercial statistical packages. The money saving in itself is not a good enough reason, and some people are scared by the lack of professional support, though this mailing list is simply wonderful.

(In addition to all the already mentioned answers:) Complete source code availability. Being able to look at the C source code for a few things has been invaluable for me. And, of course, an extremely active, responsive, and vibrant community that, among other things, has contributed packages and code for an incredible range of problems. Best, R. P.S. I'd be interested in hearing about the responses you get to your presentation.

Kind Regards, Lorenzo Isella
Re: [R] If you had just one book on R to buy...
On 2/25/07, Julien Barnier [EMAIL PROTECTED] wrote: Hi, I am starting a new job as a study analyst for a social science research unit. I would really like to use R as my main tool for data manipulation and analysis. So I'd like to ask you: if you had just one book on R to buy (or to keep), which one would it be? I already bought the Handbook of Statistical Analyses Using R, but I'd like to have something more complete, both from the statistical point of view and on R usage. I thought that Modern Applied Statistics with S-Plus would be a good choice, but maybe some of you could have interesting suggestions?

Dear Julien, I'd definitely go for MASS if you already have the Handbook. MASS is an awesome book, but you did not tell us anything about your background (stats beginners, for instance, sometimes get lost in MASS, because that is not the target audience). Among books at this level, MASS is unique. (There are more specific books for certain topics, such as mixed models, etc.; but for wide coverage, I'd go with MASS.) HTH, R.

Thanks in advance, -- Julien
Re: [R] R book advice
Dear Paul, You might want to add Everitt & Hothorn's A Handbook of Statistical Analyses Using R. If I had to recommend just one book, it'd be this one. My own (i.e., highly subjective) suggestion, if you can afford two books, would be to first go through Dalgaard's and then through Everitt & Hothorn's. I do not have direct experience with Verzani's, but I've heard great things about it. I think a PDF of a preliminary version is available from the R page. Regarding Crawley's... well, I find some/many of his comments and suggestions unorthodox (my experience is with his Statistical Computing: An Introduction to Data Analysis using S-Plus, a book I would not recommend to a novice). HTH, R.

On 2/16/07, Paul Lynch [EMAIL PROTECTED] wrote: I'm looking for a book for someone completely ignorant of statistics who wishes to learn both statistics and R. I've found three possibilities, one by Verzani (Using R for Introductory Statistics), one by Crawley (Statistics: An Introduction using R), and one by Dalgaard (Introductory Statistics with R). Do these books have different emphases, perspectives, or strengths? Should I just pick one at random and buy it? Thanks, --Paul
Re: [R] Snow vs Rmpi
Dear Vadim, On 2/14/07, Vadim Ogranovich [EMAIL PROTECTED] wrote: Hi, I have a few high-level questions about the Snow and Rmpi packages. I understand that Snow uses Rmpi as one of its possible transport layers, yet my questions are about user experience, not technical details: 1. Does Snow install and work well in Windows? 2. Interruptibility. I understand that currently it is impossible to interrupt a running top-level command in Snow (Ctrl-C or the like); the only way to kill slave processes is to kill the master R process. Is this accurate? What about Rmpi? Is there any difference between Windows and Linux? I've never used any of those under Windoze. I think your statement is accurate under Linux. (In fact, I often get rid of any of those Rmpis gone astray by issuing a lamhalt and/or lamwipe.) 3. When the master process dies, is it guaranteed that the slaves will die too? How reliable is this? (I've seen some applications, not related to R, that were flaky about killing slaves.) If you use an orderly exit procedure (mpi.close.Rslaves(); mpi.quit()) I've never, ever, seen badly behaved Rmpi slaves. But I've seen them under strange circumstances (I think network problems that messed up the LAM universe?). A kind of fail-proof approach, if you can afford it, is to use different LAM universes (via the LAM_MPI_SESSION_SUFFIX environment variable) for different simultaneous runs. Then, if one particular run behaves poorly, you can issue a lamhalt/lamwipe for just that LAM universe. A final suggestion: you might want to take a look at the papply package, which does load-balancing and allows you to run sequentially (if there is no LAM universe), and thus makes debugging much simpler. R. Thank you very much for your help, Vadim [[alternative HTML version deleted]]
-- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
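A minimal sketch of the workflow discussed in this thread, using snow on top of Rmpi and the orderly-exit calls mentioned above (the cluster size and the toy function are illustrative, not from the original posts):

```r
library(snow)   # snow can use Rmpi as its transport layer under LAM/MPI

cl <- makeCluster(4, type = "MPI")                  # spawn 4 R slaves (arbitrary size)
res <- clusterApplyLB(cl, 1:100, function(i) i^2)   # load-balanced apply over the slaves
stopCluster(cl)                                     # orderly shutdown of the slaves

## With Rmpi directly, the orderly exit recommended above is:
## mpi.close.Rslaves(); mpi.quit()
```

For the separate-universes trick: setting LAM_MPI_SESSION_SUFFIX before booting each LAM universe (e.g., `LAM_MPI_SESSION_SUFFIX=run1 lamboot` in the shell) keeps simultaneous runs isolated, so a misbehaving run can be cleaned up with lamhalt/lamwipe without touching the others.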
Re: [R] Snow Package and R: Exported Variable Problem
Dear Robert, On 2/2/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hello and thanks in advance for your time. I've created a simulation on my cluster which uses a custom package developed by me for different functions and also the snow package. Right now I'm using LAM to communicate between nodes and am currently only testing my code on 3 nodes for simplicity, though I plan on expanding to 16 later. My problem is this error: Error in fn(par, ...) : object "x1" not found, with attr(,"class") "try-error". In my simulation I need to run a function several times with a different variable each time. All invocations of the function are independent of the others. I start the simulation on one node, create a cluster of several nodes, load my custom package and snow on all of them, use clusterExport(cl, "x1") to export the variable x1 (among other variables I need), then I call my simulation on the cluster using clusterApplyLB(cl, 2:S, simClust), where cl is the cluster and S is a constant defined above as 500. Using print statements (since snow, or R for that matter, has next to no ability to debug) I found that the error cropped up in this statement: theta6 = optim(c(0,0,0,0,0,0,.2), loglikelihood, score6, method = "CG", control=list(fnscale=-1,reltol=1e-8,maxit=2000))$par Both the functions loglikelihood and score6 use x1, but I know that it is getting exported to the node correctly since it gets assigned earlier in the simulation: x1 = rep(0,n1) The error I stated above happens for every iteration of the simulation (499 times) and I'm really at a loss as to why it's happening and what I can do to determine what it is. I'm wondering at this point if exporting the variable makes it unavailable to certain other packages, though that doesn't really make any sense. From reading quickly through your description, I do not see anything obviously wrong.
If anyone can help me with this problem, or let me know how I can debug this, or even give a clue as to why it might be happening, I would greatly appreciate it. I've been wrestling with this for some time and no online documentation can help. Thank you for your time and help. When I was feeling really lost, I've resorted to assigning intermediate output from commands such as ls, search, etc., to variables (i.e., something like this.ls <- ls() from inside your function call, e.g., simClust) and then, e.g., from mpi.remote.exec, looking at the value of those variables. And, for over a year now, I've been doing most of my MPI stuff with papply; the one nice thing about papply is that, if you have no LAM/MPI universe, it will use a serial (not a parallel) version, so it is much, much, much easier to debug, because you see the warnings and the errors. So most of the frustration of things like launching something and seeing it never return, etc., is gone. Best, R. Just so you know, I'm a Computer Scientist, not a Statistician, though I will be able to give any information about the statistics involved in this program. I am reluctant to give away all the source code since it is not my work but rather code I'm converting from standard code to parallelized code for a professor of mine. -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
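The ls()-snapshot trick described above can be sketched with snow's own inspection tools. A hedged sketch (the cluster size, variable names, and the toy simClust body are hypothetical, not from the original posts):

```r
library(snow)
cl <- makeCluster(3, type = "MPI")   # hypothetical 3-node LAM/MPI cluster

x1 <- rep(0, 10)
clusterExport(cl, "x1")              # note: the variable *name* is passed as a string

simClust <- function(i) {
  ## record what this slave's global environment actually contains,
  ## so it can be inspected after the run
  assign("debug.ls", ls(envir = globalenv()), envir = globalenv())
  sum(x1) + i
}
clusterExport(cl, "simClust")
res <- clusterApplyLB(cl, 2:500, simClust)
clusterEvalQ(cl, debug.ls)           # pull the recorded snapshots back from each slave
stopCluster(cl)
```

If "x1" does not appear in the debug.ls snapshots, the export never reached the slaves; if it does, the problem is more likely the environment in which optim evaluates loglikelihood and score6.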
Re: [R] Does R support grid or parallel computing?
Dear Xiaopeng, There is certainly support for, among others, MPI and PVM; check the packages Rmpi, rpvm, snow, and papply on CRAN. Best, R. On 1/29/07, xiaopeng hu [EMAIL PROTECTED] wrote: Does R support grid or parallel computing? [[alternative HTML version deleted]] -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
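As a rough illustration of the MPI route mentioned above, assuming a working LAM/MPI installation (the slave count is arbitrary; mpi.remote.exec and the orderly-exit calls are the ones recommended elsewhere on this list):

```r
library(Rmpi)

mpi.spawn.Rslaves(nslaves = 2)             # start two R slave processes
mpi.remote.exec(Sys.info()["nodename"])    # run a command on every slave
mpi.close.Rslaves()                        # orderly shutdown of the slaves
mpi.quit()                                 # terminate the MPI environment and exit
```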
Re: [R] Package for phylogenetic tree analyses
Dear Lalitha, On 1/26/07, lalitha viswanath [EMAIL PROTECTED] wrote: Hi I am looking for a package that 1. reads in a phylogenetic tree in NEXUS format 2. given two members/nodes on the tree, can return the distance between the two using the tree. I came across the following packages on CRAN (ouch, ape, apTreeShape, phylogr), all of which seem to provide an extensive range of functions for reading in a NEXUS-format tree and performing phylogenetic analyses, tree comparisons, etc., but none, to the best of my understanding, seems to provide a function to obtain distances (in terms of branch lengths) between two nodes on a single tree. I am working with just one tree and need a function to return distances between various pairs of nodes on the tree. Is there any other package out there that has this functionality? I've been away from that area for some years now, but certainly our phylogr package will not do what you want. However, I think there are various external (non-R) programs that will do it, and that might be all you need if this is just sporadic use. The set of programs distributed and maintained by Ted Garland (PDAP) did provide the type of output you want (in the form of a matrix of distances). I am sure there are others out there (I bet PHYLIP does it too). HTH, R. Thanks for your responses to my earlier queries. As a beginning R programmer, your responses have been of utmost help and guidance. Lalitha
-- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
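For the record, a sketch of how this could be attempted within R using the ape package (a sketch assuming ape's read.nexus() and dist.nodes(); the file name is hypothetical, and the exact node numbering should be checked against the ape documentation):

```r
library(ape)

tr <- read.nexus("mytree.nex")   # hypothetical NEXUS file containing one tree
d  <- dist.nodes(tr)             # matrix of branch-length distances between
                                 # all pairs of nodes (tips and internal nodes)
d[1, 3]                          # distance between nodes 1 and 3

## For tip-to-tip distances only, cophenetic(tr) returns a matrix
## labelled with the tip names.
```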
Re: [R] ECB/Sidebar/R (Emacs) was: Re: kate editor for R
On 1/22/07, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: On 22 January 2007 at 00:05, Ramon Diaz-Uriarte wrote: | On 1/20/07, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: | Just confirms my suspicion that even after all these years, I barely | scratched the surface of ess. That '2+ years' old feature wouldn't happen to | be documented somewhere, would it? | | Dirk, I must be missing something. All I do is: M-x ecb-activate | Everything works. I do nothing special with ess. For that matter, I do | nothing special when editing LaTeX or Python, and ecb (et al) do work | as intended. I had looked at ECB for C++ programming. It simply hadn't occurred to me that it would plug into ESS. I wasn't aware of it either until I attended Tony Rossini's tutorial at useR! 2006. Score another one for Emacs as an operating system. Oh yes, and almost a coffee maker and pizza deliverer :-). R. Dirk -- Hell, there are no rules here - we're trying to accomplish something. -- Thomas A. Edison -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] kate editor for R
On 1/20/07, Marc Schwartz [EMAIL PROTECTED] wrote: On Sat, 2007-01-20 at 11:20 +0100, Ramon Diaz-Uriarte wrote: On 1/20/07, Marc Schwartz [EMAIL PROTECTED] wrote: xft anti-aliasing is incorporated into the version 23 unicode trunk. So it looks great on a hi-res LCD panel. Without xft, even using Bitstream fonts, it was still pretty rough on the eyes. Humm, call me silly, but most of the time I do not like anti-aliased fonts: I tend to agree with http://modeemi.cs.tut.fi/~tuomov/ion/faq/entries/Blurred_fonts.html, where he says characters look like they have been dragged through mud :-). It also fully supports GTK widgets, which is great if you are using GNOME, which I do. But my .emacs gets rid of the toolbar and scroll bars on start-up (I find toolbars confusing things that take up precious screen space), and I often work without the menubar (when I am doing familiar work). I use ion3 (http://modeemi.cs.tut.fi/~tuomov/ion/), which, together with wmii (and followed, at some distance, by fvwm), I find the most usable window managers, and thus the look of widgets is not that relevant to me. So, for most practical purposes (except for resizing with the mouse) I use emacs as if started with the -nw flag. (I know, I know, this looks like going backwards ... must be a mid-life involution crisis :-). We'll drag you kicking and screaming into the 21st century... ;-) I was afraid someone would suggest that sooner or later :-). xft was added as a patch to version 22, but it was not very stable. Note that version 23 is in alpha status, so use at your own risk if you decide to pursue this. 21 is still the current stable release version, but 23 has been rock solid for me. I can provide you with a shell script to build it. Let me know. Let me try with the Debian packages, and if I have problems, I'll definitely start bugging you. Thanks a lot for your help! Best, R. FWIW, here are some screen shots so that you can get a feel for what it looks like.
This is using two 1600x1200 LCD panels. 1. Basic view of main window, showing ECB and ESS: http://home.comcast.net/~marc_schwartz/emacs23.png 2. Full screen (3200x1600 using nVidia TwinView) capture to show GTK file selection widget: http://home.comcast.net/~marc_schwartz/emacs23-2.png 3. View of main window to show the integration of SVN version control, which I use for all of my R code: http://home.comcast.net/~marc_schwartz/emacs23-3.png Hey, those look very neat! (I'd get rid of all those toolbars :-). But very neat. Time to try it. (But no way I am giving up ion3). No doubt that the use of xft is a personal choice, and some folks do not like it, perhaps notably on CRTs. As I have gotten older and need bi-focals for computer work and reading, I find the use of xft much easier and I am less prone to eye strain, given how many hours I typically spend working each day. Marc, but the solution for that problem is not xft fonts. The solution is ... working fewer hours. (I'll blackmail my boss: if you force me to work more hours, I'll use xft fonts. I bet it'll be a great strategy). Best. HTH, Marc -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] ECB/Sidebar/R (Emacs) was: Re: kate editor for R
On 1/20/07, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: Hi Tony, On 20 January 2007 at 15:20, AJ Rossini wrote: | On Friday 19 January 2007 15:39, Dirk wrote: | As I am doing more C++ work, I glanced at oo-browser, sidebar, ecb (all in | Debian/Ubuntu). Would a real Emacs hacker be able to apply these to R code too? | That functionality (though relatively minimal, i.e. ECB/sidebar support | through imenu) should have existed for 2+ years now; at least it does for me. Just confirms my suspicion that even after all these years, I barely scratched the surface of ess. That '2+ years' old feature wouldn't happen to be documented somewhere, would it? Dirk, I must be missing something. All I do is: M-x ecb-activate Everything works. I do nothing special with ess. For that matter, I do nothing special when editing LaTeX or Python, and ecb (et al) do work as intended. Best, R. Dirk -- Hell, there are no rules here - we're trying to accomplish something. -- Thomas A. Edison -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] Offtopic: emacs 23, was kate editor for R
On 1/20/07, Peter Dalgaard [EMAIL PROTECTED] wrote: Ramon Diaz-Uriarte wrote: Hi Marc, Thanks a lot for the detailed explanation! I'll give it a try. (But still, why emacs23? What is missing in v. 21 that you get in 23?). Best, R. Ability to load files with UTF-8 characters in the name? (This is pretty maddening if you find yourself with such a beast.) Aha, thanks. I try to stay away from those creatures. I guess I'll be able to start adding a nice, good-looking Spanish ñ to file names :-). R. BTW, any inkling when/whether this is heading for Fedora N? -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] kate editor for R
On 1/20/07, Marc Schwartz [EMAIL PROTECTED] wrote: xft anti-aliasing is incorporated into the version 23 unicode trunk. So it looks great on a hi-res LCD panel. Without xft, even using Bitstream fonts, it was still pretty rough on the eyes. Humm, call me silly, but most of the time I do not like anti-aliased fonts: I tend to agree with http://modeemi.cs.tut.fi/~tuomov/ion/faq/entries/Blurred_fonts.html, where he says characters look like they have been dragged through mud :-). It also fully supports GTK widgets, which is great if you are using GNOME, which I do. But my .emacs gets rid of the toolbar and scroll bars on start-up (I find toolbars confusing things that take up precious screen space), and I often work without the menubar (when I am doing familiar work). I use ion3 (http://modeemi.cs.tut.fi/~tuomov/ion/), which, together with wmii (and followed, at some distance, by fvwm), I find the most usable window managers, and thus the look of widgets is not that relevant to me. So, for most practical purposes (except for resizing with the mouse) I use emacs as if started with the -nw flag. (I know, I know, this looks like going backwards ... must be a mid-life involution crisis :-). xft was added as a patch to version 22, but it was not very stable. Note that version 23 is in alpha status, so use at your own risk if you decide to pursue this. 21 is still the current stable release version, but 23 has been rock solid for me. I can provide you with a shell script to build it. Let me know. Let me try with the Debian packages, and if I have problems, I'll definitely start bugging you. Thanks a lot for your help! Best, R. Best regards, Marc On Sat, 2007-01-20 at 03:59 +0100, Ramon Diaz-Uriarte wrote: Hi Marc, Thanks a lot for the detailed explanation! I'll give it a try. (But still, why emacs23? What is missing in v. 21 that you get in 23?). Best, R.
On 1/19/07, Marc Schwartz [EMAIL PROTECTED] wrote: On Fri, 2007-01-19 at 16:09 +0100, Ramon Diaz-Uriarte wrote: snip I had problems with one of the packages ecb depends upon (semantic?), and emacs-snapshot. IIRC it was a documented problem related to a bug in semantic (?); maybe it's been fixed now. But what does emacs-snapshot-gtk provide you now (besides the prettiness) that you'd miss with 21.4? snip Ramon, Just a quick heads up on the ECB issue. I am using Emacs 23 from CVS and had to update ECB and the associated packages to use this version of Emacs. I have emacs 23 installed and run from a separate download folder, so that I do not overwrite the installed stable version. I use the CEDET cedet-1.0pre3.tar.gz aggregate package from http://cedet.sourceforge.net/ as well as the ECB CVS snapshot package ecb.tar.gz from http://ecb.sourceforge.net/downloads.html. The CEDET package includes cogre, ede, eieio, semantic and speedbar. Extract these two files and then modify ~/.emacs with the following: ;; Load ECB (setq semantic-load-turn-everything-on t) (load-file "/PATH/TO/CEDET/cedet-1.0pre3/common/cedet.el") (add-to-list 'load-path "/PATH/TO/ECB/ecb-snap") (require 'ecb) And all seems well. HTH, Marc Schwartz -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] kate editor for R
On Friday 19 January 2007 03:30, Frank E Harrell Jr wrote: Like kile for LaTeX, Linux/KDE's kate editor is an excellent editor for R, with easy code submission to a running R process. Syntax highlighting is good. I have not been able to figure out two things: - how to automatically reformat a line or region of text using good indentation rules (Emacs/ESS make this so easy by just hitting Tab while the cursor is in a line, or highlighting a region and hitting Esc q) - how to cause auto-indenting as you type braces. For me, kate puts a { in column one Thanks for any pointers. Dear Frank, May I ask why you are moving to Kate from Emacs? I tried Kate with R (and Python and LaTeX) and I really liked the folding (which seems a lot better than all the not-really-functional hacks for getting folding with R and Python code) and some of the function/class browsers. However, I especially missed: a) the possibility of opening as many R processes as I want, and placing that buffer in whatever place and with whatever size I want. b) most of the rest of emacs, actually (hey, where did my shells go? and my org-mode buffer? and my ...; not to mention the keybindings). If you feel like it, I'd like to hear about your impressions. R. Frank -- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
Re: [R] kate editor for R
On Friday 19 January 2007 14:12, Frank E Harrell Jr wrote: Ramon Diaz-Uriarte wrote: On Friday 19 January 2007 03:30, Frank E Harrell Jr wrote: Like kile for LaTeX, Linux/KDE's kate editor is an excellent editor for R, with easy code submission to a running R process. Syntax highlighting is good. I have not been able to figure out two things: - how to automatically reformat a line or region of text using good indentation rules (Emacs/ESS make this so easy by just hitting Tab while the cursor is in a line, or highlighting a region and hitting Esc q) - how to cause auto-indenting as you type braces. For me, kate puts a { in column one Thanks for any pointers. Dear Frank, May I ask why you are moving to Kate from Emacs? I tried Kate with R (and Python and LaTeX) and I really liked the folding (which seems a lot better than all the not-really-functional hacks for getting folding with R and Python code) and some of the function/class browsers. However, I especially missed: a) the possibility of opening as many R processes as I want, and placing that buffer in whatever place and with whatever size I want. b) most of the rest of emacs, actually (hey, where did my shells go? and my org-mode buffer? and my ...; not to mention the keybindings). If you feel like it, I'd like to hear about your impressions. R. Thanks for your reply, Frank. Good question Ramon. We have dozens of R users in our department and many of them were not brought up on Emacs and find it hard to learn. We are looking for an alternative to recommend for them. I love Emacs myself and find that it is the fastest editor by a significant margin, and I am used to its keybindings. But I prefer kate for printing and for managing multiple files in a project. kate has a nice sidebar for navigating the files, and indicates which files have been changed since they were saved. Ouch, I had missed that. kate also schematically depicts nested code with side symbols connected by vertical lines for {}.
Yes, this feature I _really_ like. Nothing like it that I know of for emacs (I use fold-dwim, but I find it clunky). Scrolling of the R output window is a little more logical in kate than in ESS. I find myself having to type Esc-Shift-> often in ESS/Emacs to get to the bottom of the R output, but kate puts the cursor at the bottom. Also I get a little frustrated with package management in Xemacs (I know however that it's nice to be able to load thousands of packages) related to file permissions, ftp commands, anonymous logins, etc. And from a purely looks standpoint kate is superior. I tried jedit for a bit. jedit has a lot of nice features but also has problems with indenting in R. Thanks for your feedback. I think I'll play again with kate this weekend. Best, R. Frank Frank -- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
Re: [R] kate editor for R
Hi Dirk, On Friday 19 January 2007 15:39, Dirk Eddelbuettel wrote: Ramon, Frank, Great discussion. Nothing like an editor feud over morning coffee. Just kidding. Not at the editor flame war stage yet (nobody mentioned vim :-). On 19 January 2007 at 11:18, Ramon Diaz-Uriarte wrote: | However, I especially missed: | | a) the possibility of opening as many R processes as I want, and placing | that buffer in whatever place and with whatever size I want. | | b) most of the rest of emacs, actually (hey, where did my shells go? and | my org-mode buffer? and my ...; not to mention the keybindings). [ Thanks for the org-mode suggestion. That looks very useful. How do I get it to sync to my Palm, though? ;-) ] I asked the same at the org-mode list some time back and there was a short thread (http://lists.gnu.org/archive/html/emacs-orgmode/2006-11/msg3.html). The bottom line is this: a) for the general org files, you send them to the palm as text, and you edit them there with a suitable editor (e.g., PalmED). If org-mode files are kept under version control, life becomes easier. b) dealing with the calendar is a more serious problem. c) there seems to be some (not a lot of) interest in these issues, but things are not smooth yet. (I am using my Palm a lot less now, so I am no longer even doing a) regularly). On 19 January 2007 at 07:12, Frank E Harrell Jr wrote: [...] | and I am used to its keybindings. But I prefer kate for printing and | for managing multiple files in a project. kate has a nice sidebar for | navigating the files, and indicates which files have been changed since As I am doing more C++ work, I glanced at oo-browser, sidebar, ecb (all in Debian/Ubuntu). Would a real Emacs hacker be able to apply these to R code too? I use ecb with R directly out of the ecb box. No problem. | they were saved. kate also schematically depicts nested code with side | symbols connected by vertical lines for {}.
Scrolling of the R output | window is a little more logical in kate than in ESS. I find myself | having to type Esc-Shift-> often in ESS/Emacs to get to the bottom of | the R output but kate puts the cursor at the bottom. Also I get a | little frustrated with package management in Xemacs (I know however that | it's nice to be able to load thousands of packages) related to file | permissions, ftp commands, anonymous logins, etc. And from a purely | looks standpoint kate is superior. I switched back to GNU Emacs, using the emacs-snapshot-gtk package in Debian and Ubuntu. Prettier, and still emacs :) I get by without locally installed elisp code in /usr/local -- everything I needed was apt-get'able. I had problems with one of the packages ecb depends upon (semantic?), and emacs-snapshot. IIRC it was a documented problem related to a bug in semantic (?); maybe it's been fixed now. But what does emacs-snapshot-gtk provide you now (besides the prettiness) that you'd miss with 21.4? R. Dirk -- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
Re: [R] kate editor for R
Hi Marc, Thanks a lot for the detailed explanation! I'll give it a try. (But still, why emacs23? What is missing in v. 21 that you get in 23?). Best, R. On 1/19/07, Marc Schwartz [EMAIL PROTECTED] wrote: On Fri, 2007-01-19 at 16:09 +0100, Ramon Diaz-Uriarte wrote: snip I had problems with one of the packages ecb depends upon (semantic?), and emacs-snapshot. IIRC it was a documented problem related to a bug in semantic (?); maybe it's been fixed now. But what does emacs-snapshot-gtk provide you now (besides the prettiness) that you'd miss with 21.4? snip Ramon, Just a quick heads up on the ECB issue. I am using Emacs 23 from CVS and had to update ECB and the associated packages to use this version of Emacs. I have emacs 23 installed and run from a separate download folder, so that I do not overwrite the installed stable version. I use the CEDET cedet-1.0pre3.tar.gz aggregate package from http://cedet.sourceforge.net/ as well as the ECB CVS snapshot package ecb.tar.gz from http://ecb.sourceforge.net/downloads.html. The CEDET package includes cogre, ede, eieio, semantic and speedbar. Extract these two files and then modify ~/.emacs with the following: ;; Load ECB (setq semantic-load-turn-everything-on t) (load-file "/PATH/TO/CEDET/cedet-1.0pre3/common/cedet.el") (add-to-list 'load-path "/PATH/TO/ECB/ecb-snap") (require 'ecb) And all seems well. HTH, Marc Schwartz -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] eval(parse(text vs. get when accessing a function
(I overlooked the reply). Thanks, Gabor. That is neat and easy! (and I should have been able to see it on my own :-( Best, R. On 1/8/07, Gabor Grothendieck [EMAIL PROTECTED] wrote: The S4 is not essential. You could do it in S3 too: f.a <- function(x) x + 1 f.b <- function(x) x + 2 f <- function(x) UseMethod("f") f(structure(10, class = "a")) [1] 11 attr(,"class") [1] "a" On 1/6/07, Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote: Hi Martin, On 1/6/07, Martin Morgan [EMAIL PROTECTED] wrote: Hi Ramon, It seems like a naming convention (f.xxx) and eval(parse(...)) are standing in for objects (of class 'GeneSelector', say, representing a function with a particular form and doing a particular operation) and dispatch (a function 'geneConverter' might handle a converter of class 'GeneSelector' one way, user-supplied ad-hoc functions more carefully; inside geneConverter the only real concern is that the converter argument is in fact a callable function). eval(parse(...)) brings scoping rules to the fore as an explicit programming concern; here scope is implicit, but that's probably better -- R will get its own rules right. Martin Here's an S4 sketch: setClass("GeneSelector", contains="function", representation=representation(description="character"), validity=function(object) { msg <- NULL argNames <- names(formals(object)) if (argNames[1] != "x") msg <- c(msg, "\n GeneSelector requires a first argument named 'x'") if (!("..." %in% argNames)) msg <- c(msg, "\n GeneSelector requires '...' in its signature") if (0 == length(object@description)) msg <- c(msg, "\n Please describe your GeneSelector") if (is.null(msg)) TRUE else msg }) setGeneric("geneConverter", function(converter, x, ...) standardGeneric("geneConverter"), signature=c("converter")) setMethod("geneConverter", signature(converter="GeneSelector"), function(converter, x, ...) { ## important stuff here converter(x, ...) }) setMethod("geneConverter", signature(converter="function"), function(converter, x, ...) { message("ad-hoc converter; hope it works!") converter(x, ...)
}) and then... c1 - new(GeneSelector, + function(x, ...) prod(x, ...), + description=Product of x) c2 - new(GeneSelector, + function(x, ...) sum(x, ...), + description=Sum of x) geneConverter(c1, 1:4) [1] 24 geneConverter(c2, 1:4) [1] 10 geneConverter(mean, 1:4) ad-hoc converter; hope it works! [1] 2.5 cvterr - new(GeneSelector, function(y) {}) Error in validObject(.Object) : invalid class GeneSelector object: 1: GeneSelector requires a first argument named 'x' invalid class GeneSelector object: 2: GeneSelector requires '...' in its signature invalid class GeneSelector object: 3: Please describe your GeneSelector xxx - 10 geneConverter(xxx, 1:4) Error in function (classes, fdef, mtable) : unable to find an inherited method for function geneConverter, for signature numeric Thanks!! That is actually a rather interesting alternative approach and I can see it also adds a lot of structure to the problem. I have to confess, though, that I am not a fan of OOP (nor of S4 classes); in this case, in particular, it seems there is a lot of scaffolding in the code above (the counterpoint to the structure?) and, regarding scoping rules, I prefer to think about them explicitly (I find it much simpler than inheritance). Best, R. Ramon Diaz-Uriarte [EMAIL PROTECTED] writes: Dear Greg, On 1/5/07, Greg Snow [EMAIL PROTECTED] wrote: Ramon, I prefer to use the list method for this type of thing, here are a couple of reasons why (maybe you are more organized than me and would never do some of the stupid things that I have, so these don't apply to you, but you can see that the general suggestion applys to some of the rest of us). Those suggestions do apply to me of course (no claim to being organized nor beyond idiocy here). And actually the suggestions on this thread are being very useful. I think, though, that I was not very clear on the context and my examples were too dumbed down. So I'll try to give more detail (nothing here is secret, I am just trying not to bore people). 
The code is part of a web-based application, so there is no interactive user. The R code is passed the arguments (and optional user functions) from the CGI. There is one core function (call it cvFunct
Re: [R] eval(parse(text vs. get when accessing a function
Hi Martin,

On 1/6/07, Martin Morgan [EMAIL PROTECTED] wrote:

Hi Ramon, It seems like a naming convention (f.xxx) and eval(parse(...)) are standing in for objects (of class 'GeneSelector', say, representing a function with a particular form and doing a particular operation) and dispatch (a function 'geneConverter' might handle a converter of class 'GeneSelector' one way, user-supplied ad-hoc functions more carefully; inside geneConverter the only real concern is that the converter argument is in fact a callable function). eval(parse(...)) brings scoping rules to the fore as an explicit programming concern; here scope is implicit, but that's probably better -- R will get its own rules right. Martin

Here's an S4 sketch:

setClass("GeneSelector", contains = "function",
         representation = representation(description = "character"),
         validity = function(object) {
             msg <- NULL
             argNames <- names(formals(object))
             if (argNames[1] != "x")
                 msg <- c(msg, "\n  GeneSelector requires a first argument named 'x'")
             if (!"..." %in% argNames)
                 msg <- c(msg, "\n  GeneSelector requires '...' in its signature")
             if (0 == length(object@description))
                 msg <- c(msg, "\n  Please describe your GeneSelector")
             if (is.null(msg)) TRUE else msg
         })

setGeneric("geneConverter",
           function(converter, x, ...) standardGeneric("geneConverter"),
           signature = c("converter"))

setMethod("geneConverter", signature(converter = "GeneSelector"),
          function(converter, x, ...) {
              ## important stuff here
              converter(x, ...)
          })

setMethod("geneConverter", signature(converter = "function"),
          function(converter, x, ...) {
              message("ad-hoc converter; hope it works!")
              converter(x, ...)
          })

and then...

c1 <- new("GeneSelector",
          function(x, ...) prod(x, ...),
          description = "Product of x")
c2 <- new("GeneSelector",
          function(x, ...) sum(x, ...),
          description = "Sum of x")
geneConverter(c1, 1:4)
[1] 24
geneConverter(c2, 1:4)
[1] 10
geneConverter(mean, 1:4)
ad-hoc converter; hope it works!
[1] 2.5
cvterr <- new("GeneSelector", function(y) {})
Error in validObject(.Object) : invalid class "GeneSelector" object: 1:
  GeneSelector requires a first argument named 'x'
invalid class "GeneSelector" object: 2:
  GeneSelector requires '...' in its signature
invalid class "GeneSelector" object: 3:
  Please describe your GeneSelector
xxx <- 10
geneConverter(xxx, 1:4)
Error in function (classes, fdef, mtable) : unable to find an inherited
method for function "geneConverter", for signature "numeric"

Thanks!! That is actually a rather interesting alternative approach and I can see it also adds a lot of structure to the problem. I have to confess, though, that I am not a fan of OOP (nor of S4 classes); in this case, in particular, it seems there is a lot of scaffolding in the code above (the counterpoint to the structure?) and, regarding scoping rules, I prefer to think about them explicitly (I find it much simpler than inheritance). Best, R.

Ramon Diaz-Uriarte [EMAIL PROTECTED] writes:

Dear Greg,

On 1/5/07, Greg Snow [EMAIL PROTECTED] wrote:

Ramon, I prefer to use the list method for this type of thing; here are a couple of reasons why (maybe you are more organized than me and would never do some of the stupid things that I have, so these don't apply to you, but you can see that the general suggestion applies to some of the rest of us).

Those suggestions do apply to me of course (no claim to being organized nor beyond idiocy here). And actually the suggestions on this thread are being very useful. I think, though, that I was not very clear on the context and my examples were too dumbed down. So I'll try to give more detail (nothing here is secret, I am just trying not to bore people).

The code is part of a web-based application, so there is no interactive user. The R code is passed the arguments (and optional user functions) from the CGI. There is one core function (call it cvFunct) that, among other things, does cross-validation.

So this is one way to do things:

cvFunct <- function(whatever, genefiltertype, whateverelse) {
    internalGeneSelect <- eval(parse(text = paste("geneSelect",
                                                  genefiltertype, sep = ".")))
    ## do things calling internalGeneSelect
}

and now define all possible functions as

geneSelect.Fratio <- function(x, y, z) {## something}
geneSelect.Wilcoxon <- function(x, y, z) {## something else}

If I want more geneSelect functions, adding them is simple. And I can even allow the user to pass her/his own functions, with the only restriction that it takes three args, x, y, z
Re: [R] eval(parse(text vs. get when accessing a function
On 1/5/07, jim holtman [EMAIL PROTECTED] wrote:

The other reason for considering which of the different approaches to use would be performance:

f.1 <- function(x) x + 1
f.2 <- function(x) x + 2
system.time({
    for (i in 1:10) {
        eval(parse(text = paste('f.', i %% 2 + 1, sep = '')))(i)
    }
})
[1] 6.96 0.00 8.32 NA NA
system.time({
    for (i in 1:10) {
        {if (i %% 2 == 0) f.1 else f.2}(i)
    }
})
[1] 0.52 0.00 0.61 NA NA

eval(parse(...)) seems to be an order of magnitude slower. It would make a difference if you were calling it several thousand times; so it depends on your application.

Yes, that is true, thanks. Note, though, that in my case I am more likely to do the eval(parse( and pasting only once, and then call the new function thousands of times; something more like your second version than the first:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## the thousands of calls to calledf go here
}

R.

On 1/5/07, Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote:

On Friday 05 January 2007 19:35, Bert Gunter wrote:

?? Or to add to what Peter Dalgaard said... (perhaps for the case of many more functions): why eval(parse())? What's wrong with if/else?

g <- function(fpost, x) {if (fpost == 1) f.1 else f.2}(x)

or switch() if you have more than 2 possible arguments? I think your remarks reinforce the wisdom of Thomas's axiom.

Thanks, Bert, but as with Peter's solution, your solution forces me to build g ahead of time. And again, I am not sure I see why the attempt to avoid eval(parse(text. Best, R.

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ramon Diaz-Uriarte Sent: Friday, January 05, 2007 10:02 AM To: r-help; [EMAIL PROTECTED] Subject: [R] eval(parse(text vs. get when accessing a function

Dear All, I've read Thomas Lumley's fortune "If the answer is parse() you should usually rethink the question.". But I am not sure if that also applies (and why) to other situations (Lumley's comment http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html was in reply to accessing a list). Suppose I have similarly named functions, except for a postfix. E.g.

f.1 <- function(x) {x + 1}
f.2 <- function(x) {x + 2}

And sometimes I want to call f.1 and some other times f.2 inside another function. I can either do:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## do more stuff
}

Or:

h <- function(x, fpost) {
    calledf <- get(paste("f.", fpost, sep = ""))
    calledf(x)
    ## do more stuff
}

Two questions: 1) Why is the second better? 2) By changing g or h I could use do.call instead; why would that be better? Because I can handle differences in argument lists? Thanks, R.

-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?

-- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
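The three lookup styles discussed in this thread can be put side by side in one self-contained sketch; f.1 and f.2 are the toy functions from the original post, and the use of get()'s mode argument is an addition not shown in the thread:

```r
## The toy functions from the original post.
f.1 <- function(x) x + 1
f.2 <- function(x) x + 2

## eval(parse()): builds the function name as text, then parses and
## evaluates it.
g <- function(x, fpost) {
  calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
  calledf(x)
}

## get(): looks the name up directly; mode = "function" restricts the
## search to functions.
h <- function(x, fpost) {
  calledf <- get(paste("f.", fpost, sep = ""), mode = "function")
  calledf(x)
}

## do.call(): accepts the name (or the function itself) plus a list of
## arguments, which also copes with differing argument lists.
k <- function(x, fpost) {
  do.call(paste("f.", fpost, sep = ""), list(x))
}

g(10, 1)  # [1] 11
h(10, 2)  # [1] 12
k(10, 1)  # [1] 11
```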
Re: [R] eval(parse(text vs. get when accessing a function
, enclos) : object "f.8" not found

So even in more general cases, except for function redefinitions, etc, you are not able to call non-existent stuff.

2nd, if I used the eval-parse approach then I would probably at some point redefine f.1 or f.2 to the output of a regression analysis or something, then go back and run the g function at a later time and wonder why I am getting an error; then, once I have finally figured it out, I need to remember what f.1 did and rewrite it again. I am much less likely to accidentally replace an element of a list, and if the list is well named I am unlikely to replace the whole list by accident.

Yes, that is true. Again, it does not apply to the actual case I have in mind, but of course, without the detailed info on context I just gave, you could not know that.

3rd, if I ever want to use this code somewhere else (new version of R, on the laptop, give to coworker, ...), it is a lot easier to save and load a single list than to try to think of all the functions that need to be saved.

Oh, sure. But all the functions above live in a single file (actually, a minipackage) except for the optional user function (which is read from a file).

Personally, I have never regretted trying not to underestimate my own future stupidity.

Neither have I. And actually, that is why I asked: if Thomas Lumley said, in the fortune, that I had better rethink it, then I should try rethinking it. But I asked because I failed to see what the problem is.

Hope this helps,

It certainly does. Best, R.

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ramon Diaz-Uriarte Sent: Friday, January 05, 2007 11:41 AM To: Peter Dalgaard Cc: r-help; [EMAIL PROTECTED] Subject: Re: [R] eval(parse(text vs. get when accessing a function

On Friday 05 January 2007 19:21, Peter Dalgaard wrote:

Ramon Diaz-Uriarte wrote:

Dear All, I've read Thomas Lumley's fortune "If the answer is parse() you should usually rethink the question.". But I am not sure if that also applies (and why) to other situations (Lumley's comment http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html was in reply to accessing a list). Suppose I have similarly named functions, except for a postfix. E.g.

f.1 <- function(x) {x + 1}
f.2 <- function(x) {x + 2}

And sometimes I want to call f.1 and some other times f.2 inside another function. I can either do:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## do more stuff
}

Or:

h <- function(x, fpost) {
    calledf <- get(paste("f.", fpost, sep = ""))
    calledf(x)
    ## do more stuff
}

Two questions: 1) Why is the second better? 2) By changing g or h I could use do.call instead; why would that be better? Because I can handle differences in argument lists?

Dear Peter, Thanks for your answer.

Who says that they are better? If the question is how to call a function specified by half of its name, the answer could well be to use parse(); the point is that you should rethink whether that was really the right question. Why not instead, e.g.

f <- list("1" = function(x) {x + 1},
          "2" = function(x) {x + 2})
h <- function(x, fpost) f[[fpost]](x)
h(2, 2)
[1] 4
h(2, 1)
[1] 3

I see, this is a direct way of dealing with the problem. However, you first need to build the f list, and you might not know about that ahead of time. For instance, if I build a function so that the only thing that you need to do to use my function g is to call your function f.something, and then pass the something. I am still under the impression that, given your answer, using eval(parse(text is not your preferred way. What are the possible problems (if there are any, that is)? I guess I am puzzled by "rethink whether that was really the right question". Thanks, R.
-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ramon Diaz-Uriarte Statistical Computing Team
Re: [R] eval(parse(text vs. get when accessing a function
On 1/6/07, Thomas Lumley [EMAIL PROTECTED] wrote:

On Fri, 5 Jan 2007, Ramon Diaz-Uriarte wrote:

I see, this is a direct way of dealing with the problem. However, you first need to build the f list, and you might not know about that ahead of time. For instance, if I build a function so that the only thing that you need to do to use my function g is to call your function f.something, and then pass the something. I am still under the impression that, given your answer, using eval(parse(text is not your preferred way. What are the possible problems (if there are any, that is)? I guess I am puzzled by "rethink whether that was really the right question".

There are definitely situations where parse() is necessary or convenient, or we wouldn't provide it. For example, there are some formula-manipulation problems where it really does seem to be the best solution. The point of my observation was that it is relatively common for people to ask about parse() solutions to problems, but relatively rare to see them in code by experienced R programmers. The 'rethink the question' point is that a narrowly-posed programming problem may suggest parse() as the answer, when thinking more broadly about what you are trying to do may allow a completely different approach [the example of lists is a common one].

Yes, the general thing I am trying to do ---see my response to Greg Snow for details--- has been done before. And I looked at code from more experienced programmers, such as David Meyer's tune() in e1071. I think one of the reasons David is using do.call is that he allows the use of arbitrary functions, whereas I do not (currently) need that functionality. Thus, instead of calling do.call(whatever) I can call internalGeneSelect. And, when reading my code, or debugging, it is easier for me to quickly decode internalGeneSelect (oh, yes, calling the geneSelection function) than to decode do.call. But my internalGeneSelect depends on eval(parse(text =, and that is where my doubts started.

Because of this thread, though, I am actually starting to think I should go ahead and use do.call, because it will make life simpler if someone (including myself) decides to extend the code. I guess this can be a case of thinking more broadly.

The problem with eval(parse()) is not primarily one of speed. A problem with parse() is that manipulating text strings is easy to mess up, since text has so much less structure than code. A problem with eval() is that it is too powerful -- since it can do anything, it is harder to keep track of what it is doing.

Yes, I understand that. In my specific case, though, there is quite a high degree of structure in the text used. And I felt that do.call was also very powerful (and I've messed with ... in similar situations in the past).

In one sense this is just a style issue, but I still think my comment is good advice. If you find yourself wanting to use parse() it is a good idea to stop and think about whether there is a better way to do it. Often, there is. Sometimes, there isn't.

Thanks for your comments. I think here do.call might actually be the way to go. Best, R.

-thomas Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle

-- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
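A minimal sketch of the do.call route settled on here. The geneSelect.* names follow the thread's naming convention, but the bodies are made-up placeholders, not the real gene-filtering code:

```r
## Hypothetical selector functions; only the geneSelect.<type> naming
## convention comes from the thread, the bodies are placeholders.
geneSelect.Fratio   <- function(x, y, z) x + y + z
geneSelect.Wilcoxon <- function(x, y) x * y

## do.call() takes the function name plus a list of arguments, so
## selectors with different argument lists can share one entry point.
runSelector <- function(type, args) {
  do.call(paste("geneSelect", type, sep = "."), args)
}

runSelector("Fratio", list(x = 1, y = 2, z = 3))  # [1] 6
runSelector("Wilcoxon", list(x = 2, y = 5))       # [1] 10
```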
Re: [R] eval(parse(text vs. get when accessing a function
On 1/6/07, Brian Ripley [EMAIL PROTECTED] wrote:

On Sat, 6 Jan 2007, Ramon Diaz-Uriarte wrote:

(...)

cvFunct <- function(whatever, genefiltertype, whateverelse) {
    internalGeneSelect <- eval(parse(text = paste("geneSelect",
                                                  genefiltertype, sep = ".")))
    ## do things calling internalGeneSelect
}

That looks like a more complicated alternative to

get(paste("geneSelect", genefiltertype, sep = "."))

Yes, you are right, thanks. Actually, now that I think of it, the eval(parse(text version looks _a lot_ more verbose.

I would worry about scope in both cases: I think you most likely want eval.parent in yours, and to pick an environment for use in get() (but the view you have shown is still too narrow for us to know).

The function that get (or eval) is called from is defined in a package. The other functions (the ones with the postfix) are either in the same package or in the global environment (read from a file). I think with both solutions (get and eval), and defining the other functions both ways (in a package and in the global env), I should be OK, but I probably want to make this explicit. Thanks, R.

and now define all possible functions as

geneSelect.Fratio <- function(x, y, z) {## something}
geneSelect.Wilcoxon <- function(x, y, z) {## something else}

If I want more geneSelect functions, adding them is simple. And I can even allow the user to pass her/his own functions, with the only restriction that it takes three args, x, y, z, and that the function is to be called geneSelect. plus a user-chosen string. (Yes, I need to make sure no calls to system, etc, are in the user code, etc, etc, but that is another issue.)

[...]

-- Brian D.
Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
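Ripley's two cautions (prefer get() to eval(parse(text =, and be explicit about which environment is searched) could be combined along these lines. The mode and envir arguments are standard parameters of get(); the helper itself and the selector body are illustrative, not from the thread:

```r
## Look the selector up by name, restricted to functions, starting the
## search in the caller's environment (illustrative helper).
internalGeneSelect <- function(genefiltertype, envir = parent.frame()) {
  get(paste("geneSelect", genefiltertype, sep = "."),
      mode = "function", envir = envir)
}

geneSelect.Wilcoxon <- function(x, y, z) median(x)  # placeholder body

sel <- internalGeneSelect("Wilcoxon")
sel(1:5, NULL, NULL)  # [1] 3
```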
Re: [R] ANCOVA
On 1/6/07, Michael Kubovy [EMAIL PROTECTED] wrote:

On Jan 6, 2007, at 8:34 AM, John Cardinale wrote:

Are there any R functions which can do analysis of covariance?

?lm
RSiteSearch('ancova')

Given the question, you'll probably need to find out how to do an ancova with lm. Several documents in http://cran.r-project.org/other-docs.html will show you how (and why ancova is just one special case of the linear model). In particular, I think Faraway's "Practical Regression and Anova using R" has explicit chapters/sections for Ancova. Many other standard texts on R/S do too. R.

_ Professor Michael Kubovy University of Virginia Department of Psychology USPS: P.O. Box 400400, Charlottesville, VA 22904-4400 Parcels: Room 102, Gilmer Hall, McCormick Road, Charlottesville, VA 22903 Office: B011, +1-434-982-4729 Lab: B019, +1-434-982-4751 Fax: +1-434-982-4766 WWW: http://www.people.virginia.edu/~mk9y/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
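For completeness, a tiny simulated example of the lm() route suggested above; the data and coefficients are made up purely for illustration:

```r
## ANCOVA = a linear model with a factor (group) and a covariate (x).
set.seed(1)
d <- data.frame(group = gl(2, 10, labels = c("A", "B")),
                x = rnorm(20))
d$y <- 1 + 2 * d$x + ifelse(d$group == "B", 0.5, 0) + rnorm(20, sd = 0.3)

## Common slope for x, separate intercepts per group.
fit <- lm(y ~ group + x, data = d)
anova(fit)        # the ANCOVA table: group effect adjusted for x
coef(fit)         # (Intercept), groupB, x
```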
[R] eval(parse(text vs. get when accessing a function
Dear All, I've read Thomas Lumley's fortune "If the answer is parse() you should usually rethink the question.". But I am not sure if that also applies (and why) to other situations (Lumley's comment http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html was in reply to accessing a list). Suppose I have similarly named functions, except for a postfix. E.g.

f.1 <- function(x) {x + 1}
f.2 <- function(x) {x + 2}

And sometimes I want to call f.1 and some other times f.2 inside another function. I can either do:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## do more stuff
}

Or:

h <- function(x, fpost) {
    calledf <- get(paste("f.", fpost, sep = ""))
    calledf(x)
    ## do more stuff
}

Two questions: 1) Why is the second better? 2) By changing g or h I could use do.call instead; why would that be better? Because I can handle differences in argument lists? Thanks, R.

-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] eval(parse(text vs. get when accessing a function
On Friday 05 January 2007 19:21, Peter Dalgaard wrote:

Ramon Diaz-Uriarte wrote:

Dear All, I've read Thomas Lumley's fortune "If the answer is parse() you should usually rethink the question.". But I am not sure if that also applies (and why) to other situations (Lumley's comment http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html was in reply to accessing a list). Suppose I have similarly named functions, except for a postfix. E.g.

f.1 <- function(x) {x + 1}
f.2 <- function(x) {x + 2}

And sometimes I want to call f.1 and some other times f.2 inside another function. I can either do:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## do more stuff
}

Or:

h <- function(x, fpost) {
    calledf <- get(paste("f.", fpost, sep = ""))
    calledf(x)
    ## do more stuff
}

Two questions: 1) Why is the second better? 2) By changing g or h I could use do.call instead; why would that be better? Because I can handle differences in argument lists?

Dear Peter, Thanks for your answer.

Who says that they are better? If the question is how to call a function specified by half of its name, the answer could well be to use parse(); the point is that you should rethink whether that was really the right question. Why not instead, e.g.

f <- list("1" = function(x) {x + 1},
          "2" = function(x) {x + 2})
h <- function(x, fpost) f[[fpost]](x)
h(2, 2)
[1] 4
h(2, 1)
[1] 3

I see, this is a direct way of dealing with the problem. However, you first need to build the f list, and you might not know about that ahead of time. For instance, if I build a function so that the only thing that you need to do to use my function g is to call your function f.something, and then pass the something. I am still under the impression that, given your answer, using eval(parse(text is not your preferred way. What are the possible problems (if there are any, that is)? I guess I am puzzled by "rethink whether that was really the right question". Thanks, R.
-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
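Peter's list-based alternative, written out so it runs as shown; as.character() is an addition so that either a numeric or character fpost indexes by name:

```r
## A named list of functions replaces the f.<postfix> naming convention;
## "dispatch" is then plain list indexing, with no parse() or get().
f <- list("1" = function(x) x + 1,
          "2" = function(x) x + 2)

h <- function(x, fpost) f[[as.character(fpost)]](x)

h(2, 2)  # [1] 4
h(2, 1)  # [1] 3
```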
Re: [R] eval(parse(text vs. get when accessing a function
On Friday 05 January 2007 19:35, Bert Gunter wrote:

?? Or to add to what Peter Dalgaard said... (perhaps for the case of many more functions): why eval(parse())? What's wrong with if/else?

g <- function(fpost, x) {if (fpost == 1) f.1 else f.2}(x)

or switch() if you have more than 2 possible arguments? I think your remarks reinforce the wisdom of Thomas's axiom.

Thanks, Bert, but as with Peter's solution, your solution forces me to build g ahead of time. And again, I am not sure I see why the attempt to avoid eval(parse(text. Best, R.

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ramon Diaz-Uriarte Sent: Friday, January 05, 2007 10:02 AM To: r-help; [EMAIL PROTECTED] Subject: [R] eval(parse(text vs. get when accessing a function

Dear All, I've read Thomas Lumley's fortune "If the answer is parse() you should usually rethink the question.". But I am not sure if that also applies (and why) to other situations (Lumley's comment http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html was in reply to accessing a list). Suppose I have similarly named functions, except for a postfix. E.g.

f.1 <- function(x) {x + 1}
f.2 <- function(x) {x + 2}

And sometimes I want to call f.1 and some other times f.2 inside another function. I can either do:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## do more stuff
}

Or:

h <- function(x, fpost) {
    calledf <- get(paste("f.", fpost, sep = ""))
    calledf(x)
    ## do more stuff
}

Two questions: 1) Why is the second better? 2) By changing g or h I could use do.call instead; why would that be better? Because I can handle differences in argument lists? Thanks, R.
-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
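Bert's switch() suggestion, spelled out for more than two alternatives; the stop() default branch is an addition so that unknown postfixes fail loudly:

```r
f.1 <- function(x) x + 1
f.2 <- function(x) x + 2

## switch() dispatches on the (character) postfix; the unnamed last
## argument is the default, reached when no name matches.
g <- function(x, fpost) {
  switch(as.character(fpost),
         "1" = f.1(x),
         "2" = f.2(x),
         stop("unknown fpost: ", fpost))
}

g(10, 2)  # [1] 12
g(10, 1)  # [1] 11
```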
Re: [R] simple parallel computing on single multicore machine
On Friday 01 December 2006 13:23, Millo Giovanni wrote:

Dear List, the advent of multicore machines in the consumer segment makes me wonder whether it would, at least in principle, be possible to divide a computational task into more slave R processes running on the different cores of the same processor, more or less in the way package SNOW would do on a cluster. I am thinking of simple 'embarrassingly parallel' problems, just like inverting 1000 matrices, estimating 1000 models or the like. I have seen some talk here on making R multi-threaded and the like, but this is much simpler. I am just a curious useR, so don't bother if you don't have time, but maybe you can point me at some resource, or just say this is nonsense...

Dear Millo,

I find papply (from the package of the same name), which itself uses Rmpi, easy and ideal for those cases. The papply documentation shows clearly what you need to do to pass the required arguments to papply. And once you have your MPI universe up and running (with whichever number of slaves you specify) it just works. As well, I find debugging very simple: just start an MPI universe with only one node, which forces papply to run serially (non-parallel), so wrong arguments, missing libraries, etc, are easy to spot. Best, R.

Cheers Giovanni

Giovanni Millo Research Dept., Assicurazioni Generali SpA Via Machiavelli 4, 34131 Trieste (Italy) tel. +39 040 671184 fax +39 040 671160 Ai sensi del D.Lgs. 196/2003 si precisa che le informazioni ...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
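A note beyond this 2006 thread: since R 2.14.0 the bundled parallel package covers exactly this single-machine, embarrassingly parallel case without setting up an MPI universe. A minimal sketch of the "invert 1000 matrices" example:

```r
library(parallel)

## 1000 small, well-conditioned matrices to invert (the thread's example
## of an embarrassingly parallel job; the matrices here are made up).
set.seed(1)
xs <- replicate(1000, diag(3) + matrix(rnorm(9, sd = 0.1), 3),
                simplify = FALSE)

## mclapply() forks worker processes on Unix-alikes; on Windows use
## makeCluster() + parLapply() instead.
inv <- mclapply(xs, solve, mc.cores = 2)
length(inv)  # [1] 1000
```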
Re: [R] princomp and factanal()
On Tuesday 28 November 2006 16:03, Tom Backer Johnsen wrote:

I have been looking at the documentation and the output from the functions princomp() and factanal(), and found them somewhat difficult to understand. This is probably due to differences between the terminology I am used to and the one used here (my field is psychology). Are there some additional texts which might help?

Dear Tom,

I suggest you look at MASS (Modern Applied Statistics with S), by Venables and Ripley. These issues are explained there (someone borrowed my copy, so I can't tell you the chapter, pages, etc). You might also want to take a look at a multivariate statistics book (e.g., Krzanowski explains these issues very well) for the general differences between PCA and factor analysis. HTH, R.

Sincerely, Tom

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- Ramón Díaz-Uriarte Bioinformatics Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
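A side-by-side run on a built-in dataset may help see what each function reports; USArrests is just a convenient example, not from the thread:

```r
## PCA on the correlation matrix vs. maximum-likelihood factor analysis
## on the same four variables.
pc <- princomp(USArrests, cor = TRUE)
fa <- factanal(USArrests, factors = 1)

pc$loadings   # eigenvector-based component loadings (all 4 components)
fa$loadings   # ML factor loadings: a different model and scaling
```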
Re: [R] command option for R CMD BATCH
Thanks. R.

On Thursday 23 November 2006 16:32, Prof Brian Ripley wrote: On Thu, 23 Nov 2006, Ramon Diaz-Uriarte wrote: On Thursday 23 November 2006 15:44, Prof Brian Ripley wrote: Try this:

gannet% cat month.R
x <- commandArgs()
print(x[length(x)])
gannet% R --slave --args January < month.R
[1] "January"

Is the above (R --slave --args January < month.R) the preferred way of using it? Yes it is. That's exactly what --args was added to allow. I tend to use R --slave < month.R January instead (as a consequence of reconverting former scripts that used R CMD BATCH). The second call produces an ARGUMENT 'January' __ignored__ warning but otherwise seems to do the same thing.
Re: [R] command option for R CMD BATCH
On Thursday 23 November 2006 15:44, Prof Brian Ripley wrote: Try this:

gannet% cat month.R
x <- commandArgs()
print(x[length(x)])
gannet% R --slave --args January < month.R
[1] "January"

Is the above (R --slave --args January < month.R) the preferred way of using it? I tend to use R --slave < month.R January instead (as a consequence of reconverting former scripts that used R CMD BATCH). The second call produces an ARGUMENT 'January' __ignored__ warning but otherwise seems to do the same thing. Thanks, R.

On Thu, 23 Nov 2006, Patrick Connolly wrote: I wish to use R CMD BATCH to run a small R function which reads a text file and plots a single graph to a PDF file.

version
platform       x86_64-unknown-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          4.0
year           2006
month          10
day            03
svn rev        39566
language       R
version.string R version 2.4.0 (2006-10-03)

The text files are monthly data (called lyrical names like October.txt or November.txt), and the end result of each run will be a PDF file called October.pdf, etc. It's simple enough to make a separate file for each month which has the command to call the R function; e.g., October.r would be plot.month("October.txt"), used like so: R CMD BATCH October.r /dev/null (the R function creates the name for the PDF file). Or, slightly more elegantly, a one-line shell script that takes an argument: R CMD BATCH $1.r /dev/null (so that the script name and the name of the month will make a PDF file for that month). What I'd like to do is avoid the need to make the Month.r files and have the script pass the month information directly to the function that a single .r file would call. If I brushed up on a bit of Perl, I might work out how to modify the shell script to do such a thing, but I suspect it should be simpler than that. I had thought of using littler for such a thing, but as I looked into it, I get the impression that's not the idea of littler. (I'm also a bit reluctant to recompile R.) Ideas welcome.
Thanks
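Tying the thread together, Patrick's goal (a single month.R driven by a shell argument) can be sketched as below. The plot.month name and the <month>.txt naming scheme come from the thread; the body of plot.month (column layout, plot type) is an invented assumption:

```r
## month.R -- run as, e.g.:  R --slave --args October < month.R
## commandArgs() returns all command-line arguments; everything
## after --args is passed through to the script untouched.
args <- commandArgs()
month <- args[length(args)]          # last argument, e.g. "October"

## hypothetical plotting function: reads <month>.txt, writes <month>.pdf
plot.month <- function(month) {
  d <- read.table(paste(month, ".txt", sep = ""), header = TRUE)
  pdf(paste(month, ".pdf", sep = ""))
  plot(d[[1]], d[[2]], type = "l", main = month)
  dev.off()
}

plot.month(month)
```

(Later versions of R also offer commandArgs(trailingOnly = TRUE), which returns only the user-supplied arguments and avoids the length(args) indexing.)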
Re: [R] snow's makeCluster hanging (using Rmpi)
On Tuesday 07 November 2006 19:28, Randall C Johnson [Contr.] wrote: On 11/7/06 11:28 AM, Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote: On Tuesday 07 November 2006 15:56, Randall C Johnson [Contr.] wrote: Hello everyone, I've been fiddling around with the snow and Rmpi packages on my new Intel Mac, and have run into a few problems. When I make a cluster on my machine, both slaves start up just fine, and everything works as expected. When I try to make a cluster including another networked machine, it hangs. I've followed the suggestions at http://finzi.psych.upenn.edu/R/Rhelp02a/archive/83086.html and http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html but to no avail. Everything seems to start up fine using lamboot, but then hangs when making the cluster in R. Making a cluster with 2 slaves seems to work fine, but if I increase the number (to use the networked machines) it hangs again. I've tried networking to another Mac, and also to a machine running Red Hat Linux. Both machines can set up their own local clusters. Does anyone have any ideas?

Dear Randy, A few suggestions: a) make sure there are no firewalls; I assume this is actually the case, but anyway. [Randy:] I don't think I have any firewalls running. I checked and they all seem to be disabled... [Ramon:] You can use (under GNU/Linux, at least) the command (as root) iptables -L. If there is no iptables-based firewall, you should see something like:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Make sure this is OK on all the machines.

b) What happens if you lamboot outside R (and create a universe with a local and a networked machine) and then you do lamexec -np 6 hostname? [Randy:] This prints out the host names of each machine as expected. [Ramon:] OK, so it's not lam itself (so a) is probably unneeded).

c) Are Rmpi and snow installed in the same directories on the different machines?
Are there version differences in Rmpi (or snow) between machines? [Randy:] I've installed the same versions, but they are in different directories... [Ramon:] I think I remember that having Rmpi and snow in different directories tended to cause problems; now I always place them in the same directory. I think that some sh script in Rmpi looks for other scripts, and if they are not where it expects them, it fails.

[Randy:] I also tried an example per Luke Tierney's suggestion using only Rmpi, and I get the following error when trying to spawn the Rslaves after starting up with lamboot (outside of R). I tried to use laminfo, but I'm not sure what I'm looking for or how to use the information given...

library(Rmpi)
mpi.spawn.Rslaves()
-----------------------------------------------------------------------
It seems that [at least] one of the child processes that was started by
MPI_Comm_spawn* chose a different RPI than the parent MPI application.
For example, one (of the) child process(es) that differed from the
parent is shown below:

Parent application: MPI_Comm_spawn
Child MPI_COMM_WORLD rank 0: usysv (v7.1.0)

All MPI processes must choose the same RPI module and version when they
start. Check your SSI settings and/or the local environment variables
on each node.
-----------------------------------------------------------------------
R(26444) malloc: *** Deallocation of a pointer not malloced: 0x16379a0;
This could be a double free(), or free() called with the middle of an
allocated block; Try setting environment variable MallocHelp to see
tools to help debug
Error in mpi.comm.spawn(slave = system.file("Rslaves.sh", package = "Rmpi"), ...):
  MPI_Error_string: unclassified

[Ramon:] Now that is way over my head. A few things I'd check: Are you mixing 32-bit with 64-bit machines? (I've done that in the past, x86 and x86_64, without apparent problems, but I've never used Macs for this.) Can you try using two different machines with the same architecture? What about gcc compilers: are you using very different versions on each machine? Best, R.
Thanks, Randy

sessionInfo()
R version 2.4.0 Patched (2006-10-03 r39576)
i386-apple-darwin8.8.2

locale: C

attached base packages:
[1] methods   stats     graphics  grDevices utils     datasets
[7] base

other attached packages:
 Rmpi snow
0.5-3 0.2-2
Re: [R] snow's makeCluster hanging (using Rmpi)
On Tuesday 07 November 2006 15:56, Randall C Johnson [Contr.] wrote: Hello everyone, I've been fiddling around with the snow and Rmpi packages on my new Intel Mac, and have run into a few problems. When I make a cluster on my machine, both slaves start up just fine, and everything works as expected. When I try to make a cluster including another networked machine, it hangs. I've followed the suggestions at http://finzi.psych.upenn.edu/R/Rhelp02a/archive/83086.html and http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html but to no avail. Everything seems to start up fine using lamboot, but then hangs when making the cluster in R. Making a cluster with 2 slaves seems to work fine, but if I increase the number (to use the networked machines) it hangs again. I've tried networking to another Mac, and also to a machine running Red Hat Linux. Both machines can set up their own local clusters. Does anyone have any ideas?

Dear Randy, A few suggestions: a) make sure there are no firewalls; I assume this is actually the case, but anyway; b) what happens if you lamboot outside R (and create a universe with a local and a networked machine) and then you do lamexec -np 6 hostname? c) are Rmpi and snow installed in the same directories on the different machines? Are there version differences in Rmpi (or snow) between machines? HTH, R.

Thanks, Randy

sessionInfo()
R version 2.4.0 Patched (2006-10-03 r39576)
i386-apple-darwin8.8.2

locale: C

attached base packages:
[1] methods   stats     graphics  grDevices utils     datasets
[7] base

other attached packages:
 Rmpi snow
0.5-3 0.2-2

~~ Randall C Johnson, Bioinformatics Analyst, SAIC-Frederick, Inc (Contractor), Laboratory of Genomic Diversity, NCI-Frederick, P.O.
Box B, Bldg 560, Rm 11-85, Frederick, MD 21702. Phone: (301) 846-1304. Fax: (301) 846-1686.
Re: [R] Beginners manual for emacs and ess
On Wednesday 20 September 2006 17:16, Marc Schwartz (via MN) wrote: On Wed, 2006-09-20 at 17:03 +0200, Rainer M Krug wrote: Hi, I heard so much about Emacs and ESS that I decided to try it out, but I am stuck at the beginning. Is there anywhere a beginners' manual for Emacs + ESS to be used with R? Even M-x S tells me it can't start S-Plus (obviously), but I want it to start R...

[Ramon:] While following Marc's suggestions, try doing M-x R, and that might start R. Then you can do C-x 2 (split the screen, as it is called in other editors), move to the window without the running R, and open an R file there (or you can just create it on the fly: C-x C-f, and in the minibuffer type anything, e.g., one-file.R, without the quotes). Then type C-h m and you'll get a list of stuff related to the ESS mode. And I think you will then really need to look at the ESS docs and go through the (X)Emacs tutorial (which is available from the help, in (X)Emacs). HTH, R.

[Rainer:] Any help welcome (otherwise I will be stuck with Eclipse and R). Rainer

[Marc:] There are some reference materials on the main ESS site at http://ess.r-project.org/ In addition, there is a dedicated ESS mailing list, with more info here: https://stat.ethz.ch/mailman/listinfo/ess-help HTH, Marc Schwartz
Re: [R] Statitics Textbook - any recommendation?
On Wednesday 20 September 2006 22:21, Iuri Gavronski wrote: I would like to buy a basic statistics book (experimental design, sampling, ANOVA, regression, etc.) with examples in R, or download it in PDF or HTML format. I went to the CRAN contributed documentation, but there were only R textbooks, that is, textbooks where R is the focus, not the statistics, and I would like to find the opposite. Another text I am trying to find is on multivariate data analysis (EFA, cluster, multiple regression, MANOVA, etc.) with examples in R. Any recommendation? Thank you in advance, Iuri.

I'd say the situation is actually the opposite. Anyway, the recent book by Brian Everitt and Torsten Hothorn (A Handbook of Statistical Analyses Using R, Chapman & Hall) is an excellent (and affordable) place to start. (I think that this book's title emphasizes that it is stats with R as the language: Everitt has (co)authored a bunch of others for other languages: SAS, Stata, SPSS, etc.) Of course, there are many others that probably deserve a place on your (or your library's) shelves: P. Dalgaard's, MASS, Maindonald & Braun, Heiberger & Holland, etc. HTH, R.
Re: [R] Authoring a book
Dear Tom, To add a few things to explore:

- I'd definitely go with LaTeX. Depending on how much formatting control you want, though, and if your coworkers are reluctant to jump into LaTeX, you might start with reStructuredText (http://docutils.sourceforge.net/rst.html) or txt2tags (http://txt2tags.sourceforge.net/). With both you can produce LaTeX, but initially at least they allow you to write text with structure using markup that is a lot simpler than LaTeX.

- I'd definitely use a version control system. Instead of CVS or SVN, though, I'd suggest you take a look at some of the distributed ones, in particular Bazaar-NG (http://bazaar-vcs.org), Mercurial (http://www.selenic.com/mercurial/wiki/index.cgi) or Darcs (http://abridgegame.org/darcs/). These three are probably among the most mature ones (though opinions will vary, of course; I have some notes and links at http://www.ligarto.org/rdiaz/VersionControl.html). What I like about any of these is that I think they provide essentially everything SVN can provide (except for SVN's user base and years of existence), plus a lot more. For instance, if you often work without access to the remote repository, with any of these three systems you can still enjoy all the benefits of version control. Cherry-picking is easier with any of these than with CVS/SVN, and Darcs in particular excels at it.

- For bibliography, I find CiteULike (http://www.citeulike.org/) fabulous. It needs internet access, and might not work with the journals/databases that you use, though. It can export as BibTeX.

- If you find outliners useful (or absolutely essential), then you might want to look at Leo (http://webpages.charter.net/edreamleo/front.html). Leo is agnostic regarding whether you write LaTeX, plain text, or R code (though it has great support for some languages such as Python or rst), and you can use Leo and still edit files in your editor of choice (I use Leo for working with fairly large LaTeX files that I edit under Emacs).
However, for this to work, all of you should agree to use Leo (or at least not to disturb the sentinel lines that Leo uses). Hope this helps (or at least provides entertaining links :-). R.

On Thursday 24 August 2006 21:10, Tom Backer Johnsen wrote: Stefan Grosse wrote: I think Peter Dalgaard is right. Since you are able to use R, I believe you will be very fast in learning LaTeX. I think it needs less than a week to learn the most common LaTeX commands. And setting up a wiki and then trying to convert this into a printable document format, plus learning the wiki syntax, is probably more time consuming. Besides this, R works perfectly together with LaTeX: it creates LaTeX output and does excellent graphics in the EPS/PS format. The best introduction to LaTeX is the not-so-short introduction: http://people.ee.ethz.ch/~oetiker/lshort/lshort.pdf

[Tom:] It really was a not too short intro. I'll have a look at it.

[Stefan:] If you still are not convinced, have a look at UniWakkaWiki: http://uniwakka.sourceforge.net/HomePage It is a wiki for science and university purposes and claims to be able to export to OpenOffice as well as to LaTeX.

[Tom:] Looks interesting and I really like the concept, but how stable is it? It looks rather fresh from the web page, but I may be wrong. A bibliography function is really a big advantage, so ... perhaps. Tom
Re: [R] rpvm/snow packages on a cluster with dual-processor machines
Dear Paul, (I forgot to answer over the weekend.) With MPI it is essentially the same. When using makeCluster, specify the number of slaves: if you have three machines and you want each to run two slave processes, just use 6. Before that, though, you should tell LAM/MPI how to set up the LAM universe. The simplest way is to specify that in a configuration file for LAM. Put something like this (using appropriate IPs or host names; cpu=xx indicates that you want each physical node to run that many (xx) slaves; it might, or might not, be related to the actual number of CPUs) in a file called, say, lamb-conf1.def:

192.168.2.2 cpu=2
192.168.2.3 cpu=2
192.168.2.4 cpu=2

Now do (as a user, NOT root): lamboot -v lamb-conf1.def. If that works, then start R and use snow. A very good explanation of how to use MPI with R, by the author of Rmpi, appeared in R News a while ago. HTH, R.

On Monday 14 August 2006 16:17, Liaw, Andy wrote: That's what I've tried before, on three dual-Xeon boxes, so I know it worked (as documented at that time). Andy

From: Paul Y. Peng: Luke Tierney just reminded me that makeCluster() can take a number greater than the number of machines in a cluster. It seems to be a solution to this problem. But I haven't tested it yet. Paul.

Ryan Austin wrote: Hi, Adding a node twice gives a duplicate node error. However, adding the parameter sp=2000 to your pvm hostfile should enable dual processors. Ryan

Liaw, Andy wrote: Caveat: I've only played with this a couple of years ago... I believe you can just add each host _twice_ (or as many times as the number of CPUs at that host) to get both CPUs to work. Andy

From: Paul Y. Peng: Hi, does anybody know how to use the dual processors in the machines of a cluster? I am using R with the rpvm and snow packages. I usually start the pvm daemon and add host machines first, and then run R to start my computing work. But I find that only one processor in each machine is used in this way and the other one always stays idle.
Is there any simple way to tell pvm to use the two processors at the same time? In other words, I would like to see two copies of R running on each machine's two processors when using pvm. Any hints/help are greatly appreciated. Paul.
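The recipe discussed above (lamboot with a cpu=2 host file, then makeCluster with the total slave count) can be sketched in R as follows; the 3-machines-times-2-CPUs layout follows the thread, while the toy job being parallelized is invented for illustration:

```r
## Assumes LAM/MPI has already been booted outside R, e.g.:
##   lamboot -v lamb-conf1.def     # three hosts, cpu=2 each

library(snow)   # uses Rmpi underneath for an MPI cluster

## one slave per CPU in the LAM universe: 3 machines x 2 CPUs = 6
cl <- makeCluster(6, type = "MPI")

## sanity check: which host did each slave land on?
clusterCall(cl, function() Sys.info()[["nodename"]])

## a toy embarrassingly parallel job
squares <- clusterApply(cl, 1:12, function(x) x^2)

stopCluster(cl)
```

(With rpvm instead, the equivalent would be makeCluster(6, type = "PVM") after starting the PVM daemon with an sp-weighted hostfile, as Ryan suggests above.)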
Re: [R] rpvm/snow packages on a cluster with dual-processor machines
Dear Paul, I have no direct experience with rpvm, but doing it with Rmpi is a piece of cake. I could provide you with some hints if you want. (I am tempted to ask why you are using PVM instead of MPI, but this might be the wrong question.) Best, R.

On Friday 11 August 2006 18:12, Paul Y. Peng wrote: Hi, does anybody know how to use the dual processors in the machines of a cluster? I am using R with the rpvm and snow packages. I usually start the pvm daemon and add host machines first, and then run R to start my computing work. But I find that only one processor in each machine is used in this way and the other one always stays idle. Is there any simple way to tell pvm to use the two processors at the same time? In other words, I would like to see two copies of R running on each machine's two processors when using pvm. Any hints/help are greatly appreciated. Paul.
Re: [R] rpvm/snow packages on a cluster with dual-processor machines
Dear Paul, I am leaving right now; I'll send you the info over the weekend. But note that I do think it is quite possible to use pvm for your setup; I just have no experience with it. R.

On Friday 11 August 2006 19:21, Paul Y. Peng wrote: Hi Ramon, please let me know how you achieve this with Rmpi. I use PVM simply because I picked it up first and it worked well for me. If MPI is the only way to make use of the two processors, I will find out whether it is available or works in our cluster. Thanks a lot for your response. Regards, Paul.

Ramon Diaz-Uriarte wrote: Dear Paul, I have no direct experience with rpvm, but doing it with Rmpi is a piece of cake. I could provide you with some hints if you want. (I am tempted to ask why you are using PVM instead of MPI, but this might be the wrong question.) Best, R.

On Friday 11 August 2006 18:12, Paul Y. Peng wrote: Hi, does anybody know how to use the dual processors in the machines of a cluster? I am using R with the rpvm and snow packages. I usually start the pvm daemon and add host machines first, and then run R to start my computing work. But I find that only one processor in each machine is used in this way and the other one always stays idle. Is there any simple way to tell pvm to use the two processors at the same time? In other words, I would like to see two copies of R running on each machine's two processors when using pvm. Any hints/help are greatly appreciated. Paul.
Re: [R] memory problems when combining randomForests
Dear Eleni, [You wrote:] But if every time you remove a variable you pass some test data (i.e., data not used to train the model) and base the performance of the new, reduced model on the error rate on the confusion matrix for the test data, then this overfitting should not be an issue, right? (unless of course you were referring to unsupervised learning).

Yes and no. The problem there could arise if you do this iteratively and use the minimum value you obtain with your procedure to return an estimate of the error rate. In such a case you should, instead, do a double cross-validation or bootstrap (i.e., estimate, via cross-validation, or the bootstrap, the error rate of your complete procedure). Both Andy and collaborators on the one hand, and myself on the other, have done some further work on these issues:

Svetnik V, Liaw A, Tong C, Wang T: Application of Breiman's random forest to modeling structure-activity relationships of pharmaceutical molecules. Multiple Classifier Systems, Fifth International Workshop, MCS 2004, Proceedings, 9–11 June 2004, Cagliari, Italy. Lecture Notes in Computer Science, Springer 2004, 3077:334-343.

Ramón Díaz-Uriarte and Sara Alvarez de Andrés: Gene selection and classification of microarray data using random forest. BMC Bioinformatics 2006, 7:3. http://www.biomedcentral.com/1471-2105/7/3

Best, R.

On Monday 31 July 2006 18:45, Eleni Rapsomaniki wrote: Hi Andy, I get a different order of importance for my variables depending on their order in the training data. Perhaps answering my own question, the change in importance rankings could be attributed to the fact that before passing my data to randomForest I impute the missing values randomly (using the combined distributions of pos+neg), so the data seen by RF is slightly different. Then, combining this with the fact that RF chooses data randomly, it makes sense to see different rankings.
In a previous thread regarding simplifying variables, http://thread.gmane.org/gmane.comp.lang.r.general/6989/focus=6993, you say: The basic problem is that when you select important variables by RF and then re-run RF with those variables, the OOB error rate becomes biased downward. As you iterate more times, the overfitting becomes more and more severe (in the sense that the OOB error rate will keep decreasing, while the error rate on an independent test set will be flat or increases).

But if every time you remove a variable you pass some test data (i.e., data not used to train the model) and base the performance of the new, reduced model on the error rate on the confusion matrix for the test data, then this overfitting should not be an issue, right? (unless of course you were referring to unsupervised learning). Best regards, Eleni Rapsomaniki, Birkbeck College, UK
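A minimal sketch of the double (nested) cross-validation idea discussed above: the error of the whole select-then-refit procedure is estimated, with variable selection redone inside every outer fold. The fold count, the keep-top-10 rule, and the use of randomForest's MeanDecreaseAccuracy importance are illustrative assumptions, not a prescription:

```r
library(randomForest)

## Outer CV estimates the error of the WHOLE procedure;
## variable selection is repeated inside each outer fold,
## so the held-out fold never influences the selection.
nested.cv.error <- function(x, y, n.folds = 5, n.keep = 10) {
  folds <- sample(rep(1:n.folds, length.out = nrow(x)))
  errs <- numeric(n.folds)
  for (k in 1:n.folds) {
    train <- folds != k
    ## selection step, using training data only
    rf1 <- randomForest(x[train, ], y[train], importance = TRUE)
    keep <- order(importance(rf1)[, "MeanDecreaseAccuracy"],
                  decreasing = TRUE)[1:n.keep]
    ## refit on the selected variables; test on the held-out fold
    rf2 <- randomForest(x[train, keep], y[train])
    pred <- predict(rf2, x[!train, keep])
    errs[k] <- mean(pred != y[!train])
  }
  mean(errs)   # honest estimate of the complete procedure's error
}
```

Reporting the error of the final reduced model on the same data that guided the selection, by contrast, gives the downward-biased estimate Andy describes.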
Re: [R] Colinearity Function in R
Dear Peter, I especially like the VIF (and GVIF) functions in package car, by John Fox. (I'm assuming you are dealing with [generalized] linear models.) HTH, R.

On Wednesday 05 July 2006 17:16, Peter Lauren wrote: Is there a collinearity function implemented in R? I have tried help.search("colinearity") and help.search("collinearity"), and have searched for colinearity and collinearity on http://www.rpad.org/Rpad/Rpad-refcard.pdf, but with no success. Many thanks in advance, Peter Lauren.
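A short sketch of the suggestion above, using vif() from the car package on a toy linear model; the data and the model are invented purely to show two collinear predictors being flagged:

```r
library(car)   # John Fox's package; provides vif()

## toy data with two deliberately collinear predictors
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # nearly a copy of x1
x3 <- rnorm(100)
y  <- 1 + x1 + x3 + rnorm(100)

fit <- lm(y ~ x1 + x2 + x3)
vif(fit)   # x1 and x2 show large variance inflation factors
```

For models containing factors, vif() reports generalized VIFs (GVIFs) instead of ordinary VIFs.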
Re: [R] Editors which have strong/solid support for SWeave?
On Wednesday 05 July 2006 10:14, A.J. Rossini wrote: Greetings! I have a few colleagues who like the idea of Sweave, but have failed to become enlightened monks of the One True Editor (http://www.dina.dk/~abraham/religion/) Are there any other Microsoft-centric editors or IDEs which have solid support for writing SWeave documents (dual R / LaTeX enhancements similar to ESS's support)? Has anyone tried the folding editors which support Noweb? Dear Tony, I often use Leo (http://webpages.charter.net/edreamleo/front.html) which is like a literate editor on steroids (folding + outlining, noweb and cweb support, and a _lot_ more), and I use it for all complex/long Rnw documents, including interacting with R ... ...but I cheat, because the editing itself (of the nodes or folds), including submitting code to R from the R chunks, I do in emacs (with ESS). Leo is available for Linux, Win, Mac and is written in Python. R. (the alternative would be brainwashing, but that is generally frowned upon ;-). best, -tony [EMAIL PROTECTED] Muttenz, Switzerland. Commit early,commit often, and commit in a repository from which we can easily roll-back your mistakes (AJR, 4Jan05). __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Editors which have strong/solid support for SWeave?
On Wednesday 05 July 2006 16:05, A.J. Rossini wrote: On 7/5/06, Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote: On Wednesday 05 July 2006 10:14, A.J. Rossini wrote: Greetings! I have a few colleagues who like the idea of Sweave, but have failed to become enlightened monks of the One True Editor (http://www.dina.dk/~abraham/religion/) Are there any other Microsoft-centric editors or IDEs which have solid support for writing SWeave documents (dual R / LaTeX enhancements similar to ESS's support)? Has anyone tried the folding editors which support Noweb? Dear Tony, I often use Leo (http://webpages.charter.net/edreamleo/front.html) which is like a literate editor on steroids (folding + outlining, noweb and cweb support, and a _lot_ more), and I use it for all complex/long Rnw documents, including interacting with R ... ...but I cheat, because the editing itself (of the nodes or folds), including submitting code to R from the R chunks, I do in emacs (with ESS). Leo is available for Linux, Win, Mac and is written in Python. I've used Leo a few years ago, and liked it (but not enough to convert). I'll have to try it again. Thanks! From my Leo's usage patterns I think I'm still praying at the emacs church. I guess my soul is saved (for now). But I find Leo great, and I always wish I could use it more. Making it understand R syntax for syntax highlighting seems to be relatively easy, more so with the recent changes in Leo's code (http://webpages.charter.net/edreamleo/coloring.html), and at least one other R user who also frequents R-help, Ed Borasky, is interested in these issues (http://sourceforge.net/forum/forum.php?thread_id=1524935forum_id=10226). I think what would be a real blast is to have Leo understand R (and LaTeX), more or less the way leo understands Python. For instance, when one imports a Python file it gets broken down (outlined) by function, method, etc. 
This seems doable (e.g., http://sourceforge.net/forum/message.php?msg_id=3614539), but I haven't yet had time to look at it. And then, Leo also offers a general way (which I think is still only fully exploited with Python files) for autocompletion, etc. (though this seems to be a harder problem). Just my random ramblings. Best, R. best, -tony [EMAIL PROTECTED] Muttenz, Switzerland. Commit early, commit often, and commit in a repository from which we can easily roll-back your mistakes (AJR, 4Jan05). -- Ramón Díaz-Uriarte Bioinformatics Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
Re: [R] FW: How to create a new package?
Dear Rita, Do you want a package just for yourself, or something useful for others, with docs, etc.? I think the rest of the answers in this thread will help you create a full-fledged package. See also the detailed explanation in Writing R Extensions. If you just want something quick and dirty that allows you to use a bunch of functions without using source (and thus cluttering your global workspace), is easy to move around, etc., you just need a directory structure such as: SignS2/ SignS2/R/ SignS2/R/SignS2.R SignS2/DESCRIPTION SignS2/Changes (Change SignS2 to the name of your package.) This has no documentation whatsoever. You can get rid of the Changes file, but I put it there to keep track of changes. Run R CMD check against the directory (of course, you'll get warnings about missing documentation), and then R CMD build. Best, R. On Thursday 01 June 2006 13:23, michael watson (IAH-C) wrote: ?package.skeleton -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck Sent: 01 June 2006 12:20 To: Rita Sousa Cc: r-help@stat.math.ethz.ch Subject: Re: [R] FW: How to create a new package? The minimum is to create a DESCRIPTION file, plus R and man directories containing R code and .Rd files respectively. It might help to run Rcmd CHECK mypkg before installation and fix any problems it finds. Googling for creating R package will locate some tutorials. On 6/1/06, Rita Sousa [EMAIL PROTECTED] wrote: Hi, I have a group of functions and I would like to create a package to load in R. I have created a directory named INE and a directory below that named R, for the files of R functions. I have created the files DESCRIPTION and INDEX in the INE directory. The installation from local zip files, in R 2.3.0, works, but when I load the package I get an error like: 'INE' is not a valid package -- installed 2.0.0? I think it is necessary to create a Meta directory with a package.rds file, but I don't know how to make it!
I have read the manual 'Writing R Extensions - 1. Creating R packages' but I don't understand the procedure... Can I create it automatically? Could you help me with this? Thanks, --- Rita Sousa DME - ME: Departamento de Metodologia Estatística - Métodos Estatísticos INE - DRP: Instituto Nacional de Estatística - Delegação Regional do Porto Tel.: 22 6072016 (Extensão: 4116) --- [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
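One hedged way to bootstrap the directory layout described in this thread is `package.skeleton()` (the `?package.skeleton` pointer above), which writes the DESCRIPTION, R/ and man/ skeleton for you. The function and package names here are just examples:

```r
## Create a package skeleton from objects in the workspace; the generated
## man/ pages are stubs that must be edited before R CMD check passes cleanly.
myFun <- function(x) x + 1
package.skeleton(name = "SignS2", list = "myFun")

## then, from the shell:
##   R CMD check SignS2    # expect documentation warnings at first
##   R CMD build SignS2
```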
Re: [R] Transferring R results to word processors
I started using LyX; it is very straightforward. Then, I started exporting to LaTeX and playing around with the LaTeX file (I found it faster than using LyX, and I could take my file anywhere they had something that could manipulate text ---emacs, vim, nedit, whatever). Googling you'll find _many_ LaTeX tutorials. Which one is best probably depends a lot on your preferences and learning style. As for books, I find Guide to LaTeX by Kopka and Daly (I think now in its fourth edition) far easier to use (to learn from and for reference) than the series of LaTeX books by Goossens et al. and Lamport. (And I only need to haul around a single book, not 2 to 5.) But then again, this is surely a matter of personal taste. HTH, R. On Thursday 09 February 2006 20:08, Patrick Burns wrote: One approach is to use LyX (http://www.lyx.org/). This is a lot like using Word or other word processors but it creates LaTeX. You probably won't need to know anything about TeX for a long time unless you are doing really weird things. Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) roger bos wrote: Yeah, but I don't understand LaTeX at all. Can you point me to a good beginners guide? Thanks, Roger On 2/9/06, Barry Rowlingson [EMAIL PROTECTED] wrote: Tom Backer Johnsen wrote: I have just started looking at R, and am getting more and more irritated at myself for not having done that before. However, one of the things I have not found in the documentation is some way of preparing output from R for convenient formatting in something like MS Word. Well whatever you do, don't start looking at LaTeX, because that will get you even more irritated at yourself for not having done it before. LaTeX is to Word as R is to what? SPSS? I've still not seen a pretty piece of mathematics - or even text - in Word.
Barry __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] R, AMD Opteron 64, and Rmpi
Dear All, I found Andy Liaw's suggestion about using a NUMA (instead of SMP) kernel when running R on amd64 with > 1 CPU http://finzi.psych.upenn.edu/R/Rhelp02a/archive/35109.html A couple of questions: 1. Is this still the case with the newer dual-core Opterons (e.g., the 275 et al. families) running Linux (kernel 2.6)? 2. How does this affect using Rmpi (and snow, papply, et al.) on multi-server clusters with > 1 CPU per node? If I understand correctly, and if the situation is what Andy described, if we use an SMP kernel we will suffer a within-node penalty in one of the Rmpi processes. Is this correct? Thanks, R. -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
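For readers of this thread who have not used snow with Rmpi: a minimal sketch of farming work out to several R worker processes is below. A socket cluster is used so the example does not depend on an MPI installation; with Rmpi and LAM/MPI available, `type = "MPI"` could be used instead.

```r
## Run a trivially parallel computation across snow worker processes.
library(snow)

cl <- makeCluster(2, type = "SOCK")        # 2 local workers, no MPI needed
res <- parSapply(cl, 1:8, function(i) i^2) # squares computed on the workers
stopCluster(cl)
res
```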
[R] studentship till January 2007
Please pass along, and apologies for double posting. We have money to support a student till January 2007 (with a salary of about 1000 euro/month). Most of the work will focus on classification/prediction using microarray data. The work will involve both methodological research (mainly computation- and simulation-based) and implementation of existing approaches using R (and possibly development of web-based applications). In addition to the main focus of the job, the student will be encouraged (and expected) to get involved in the many collaborations we have with wet-lab cancer researchers. The candidate should have a bachelors or MSc degree in stats or related fields. A genuine interest in applied statistics and statistical consulting, and experience with multivariate methods, linear models, logistic regression, and survival analysis are required. Proficiency with R and knowledge of C/C++ or Fortran are required. Familiarity with Python (and Perl and/or Tcl/Tk) and some experience with development of web-based applications (CGIs using Python, for example) are highly valued. Our machines only run GNU/Linux (or other Unixes), and thus enough knowledge of Linux to administer your workstation is needed. The Bioinformatics Unit is one of the leading bioinformatics groups in Spain, part of CNIO, one of the main cancer research institutes in Spain. We have developed a set of widely used web-based microarray data analysis tools, and have extensive computational facilities, including two computing clusters (with x86s and Opterons) that use MPI and OpenMosix. You can check more of what we do at our group webpage (http://bioinfo.cnio.es) and my own web page (http://ligarto.org/rdiaz).
For further details please email Ramón Díaz-Uriarte at [EMAIL PROTECTED] Best, -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
Re: [R] building from source after installing binary package
Dear Uwe, Yes, sure, I understand how to install to another directory. I think I was not very clear: my doubt is whether I should do that, or whether it is OK to install to the very same place where Debian left the previous installation. By doing the latter I save myself having to reinstall packages, etc. R. On Friday 06 May 2005 08:53, Uwe Ligges wrote: Diaz.Ramon wrote: Dear All, I've got into the habit of installing R from the precompiled Debian binaries, including many of the packages from the r-cran-* Debian packages, and later building from source (e.g., to link against Goto's BLAS, or to build patched versions, etc.). I install the newly built R to the very same place (/usr/lib/R). This allows me to build and update R when I wish, AND provides the ease of quickly updating many packages. Things have always worked fine, but after a few funny problems (which could be unrelated to the process itself) I've started wondering if this is a rather silly thing to do, and if I should keep my own build separate from the Debian stuff. Any advice would be much appreciated. Yes, simply install to another directory, e.g. by telling configure: ./configure --prefix=/I/want/to/have/R/installed/here Uwe Ligges Thanks, R. -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido.
**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies. -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc)
Re: [R] building from source after installing binary package
On Friday 06 May 2005 09:48, Prof Brian Ripley wrote: On Fri, 6 May 2005, Uwe Ligges wrote: Diaz.Ramon wrote: Dear All, I've got into the habit of installing R from the precompiled Debian binaries, including many of the packages from the r-cran-* Debian packages, and later building from source (e.g., to link against Goto's BLAS, or to build patched versions, etc). I install the newly built R to the very same place (/usr/lib/R). This allows me to build and update R when I wish, AND provides the ease of quickly updating many packages. Things have always worked fine, but after a few funny problems (which could be unrelated to the process itself) I've started wondering if this is a rather silly thing to do, and if I should keep my own build separate from the Debian stuff. Any advice would be much appreciated. Yes, simply install to another directory, e.g. by telling configure: ./configure --prefix=/I/want/to/have/R/installed/here I don't think that is the point: Ramon must have done that as the default installation place is /usr/local/lib/R. Yes, I did change the --prefix because Debian installs to /usr/lib. I think this is a Debian-specific question (there is a R-debian list) and the point may be to make use of the binary Debian packages. I would Yes, that is correct (I guess I was not being very clear... too late last night). I'll ask in the debian list; I asked here just in case people with other GNU/Linux distributions did (or did not) do similar things. advocate installing R from the sources into /usr/local, and having separate directory trees both for packages you install and for Debian packages. Then you can manipulate which packages are seen via R_LIBS. Thanks. I'll try that. Best, R. 
-- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido. **CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
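Prof. Ripley's suggestion above (separate directory trees for your own packages and the Debian ones, selected via R_LIBS) can be sketched as follows; the paths are purely illustrative:

```r
## Shell side (e.g. in ~/.bashrc); your own tree is searched first:
##   export R_LIBS=~/R/my-library:/usr/lib/R/site-library

## Inside R, inspect and use the library search path:
.libPaths()   # current library trees, in search order

## Install a package into your personal tree (illustrative; runs a real
## install if executed):
## install.packages("abind", lib = "~/R/my-library")
```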
Re: [R] missing values
Dear Giordano, The Hmisc library, by Frank Harrell, contains several functions for imputation which I have found extremely useful. Best, R. On Tuesday 26 April 2005 11:58, Giordano Sanchez wrote: Hello, Thanks for the instructive responses. But two questions arise. First of all, I can't manage to load the library mice. I'm using R 2.0.1 on my Debian. I tried just copying the package into my library /usr/lib/R/library, but when I do library() ... mice ** No title available (pre-2.0.0 install?) ** ... and when I do library(mice) Error in library(mice) : 'mice' is not a valid package --installed 2.0.0? The second question is more statistical: aregImpute() seems to give good results but I would like to compare the different methods, not just graphically. Is it possible? I also have other meteorological stations whose data are correlated with those of the station I'm using. Can I use those data to improve my imputation method? Regards, Giordano -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido.
**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
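A minimal sketch of multiple imputation with Hmisc's `aregImpute()` (the function Giordano mentions above); the data here are made up for illustration:

```r
## Impute missing values in y using additive regression / bootstrap-based
## multiple imputation from the Hmisc package.
library(Hmisc)

set.seed(2)
d <- data.frame(x = rnorm(100), y = rnorm(100))
d$y[sample(100, 10)] <- NA          # punch some holes to impute

imp <- aregImpute(~ x + y, data = d, n.impute = 5)
imp                                  # summary of the imputation model
## fit.mult.impute() can then combine analyses across the 5 imputations
```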
Re: [R] cross validation and parameter determination
On Wednesday 20 April 2005 00:17, array chip wrote: Hi all, In Tibshirani's PNAS paper about nearest shrunken centroid analysis of microarrays (PNAS vol 99:6567), they used cross-validation to choose the amount of shrinkage used in the model, and then tested the performance of the model with the cross-validated shrinkage on a separate, independent testing set. If I don't have the luxury of having an independent testing set, can I just use the cross-validation performance as the performance estimate? In other words, can I use the same single cross-validation to both choose the value of the parameter (amount of shrinkage in this case) and estimate the performance that was based on the value of the parameter chosen by that same cross-validation? I kind of feel awkward getting both from a single cross-validation, because it seems like I used the dataset in a training-set manner. Am I wrong/right? That error rate is probably optimistic because, as you say, you used the dataset in a training-set manner. However, you can easily wrap the whole pam procedure within an outer loop of cross-validation or bootstrap. (This problem is not that different from, say, using knn and selecting k by cross-validation, or selecting the number of genes to use by cross-validation, etc. You should then assess the error rate of the whole procedure.) R. Thanks! __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido. **CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
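The "outer loop of cross-validation" suggested above could be sketched like this with the pamr package. The data are simulated and the fold scheme is illustrative; the point is that the shrinkage threshold is chosen by an inner CV repeated inside each outer fold, so the outer error estimate stays honest.

```r
## Nested CV around the whole pamr (nearest shrunken centroid) procedure.
library(pamr)

set.seed(3)
x <- matrix(rnorm(100 * 40), nrow = 100)    # 100 genes, 40 samples
y <- factor(rep(c("A", "B"), each = 20))

folds <- sample(rep(1:5, length = 40))
nerr <- 0
for (k in 1:5) {
  tr  <- list(x = x[, folds != k], y = y[folds != k])
  fit <- pamr.train(tr)
  cv  <- pamr.cv(fit, tr)                   # inner CV picks the threshold
  th  <- cv$threshold[which.min(cv$error)]
  pred <- pamr.predict(fit, x[, folds == k], threshold = th)
  nerr <- nerr + sum(pred != y[folds == k])
}
nerr / 40   # outer-loop estimate of the error rate of the whole procedure
```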
Re: [R] lme: error message with random=~1
On Wednesday 05 January 2005 16:29, Thomas Petzoldt wrote: Douglas Bates wrote: I'm not sure what model you want to fit here. To specify a random effect in lme you need both a grouping factor and a model matrix. The error message indicates that lme is unable to determine a grouping factor. It would be correct syntax if you added a single-level factor to the data frame and used that, but then the model fit would fail because you would be trying to estimate a variance in a model where there is no variation in the term. O.k. I see and think I understand it. It seems to me that you are trying to estimate parameters in a mixed-effects model without any random effects, and lme can't do that. Yes, what I want is a model without any random effects to be tested against a model with random effects. I want to show that the random effects are negligible, but that we account for pseudo-replicates and have tested this explicitly. Dear Thomas, What about fitting the model without random effects using the gls function (you'll need to change the syntax a bit relative to the lme model with random effects), and using an LR test against the lme fit? R. I'm not sure what is better: to leave the random effects in the model or simply an LR test against a linear model fitted by lm. I've never seen such an example in the books. Or have I missed a global alternative here? Thomas P. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) Este correo electrónico y, en su caso, cualquier fichero anexo al mismo, contiene información exclusivamente dirigida a su destinatario o destinatarios. Si Vd. ha recibido este mensaje por error, se ruega notificar esta circunstancia al remitente. Las ideas y opiniones manifestadas en este mensaje corresponden únicamente a su autor y no representan necesariamente a las del Centro Nacional de Investigaciones Oncológicas (CNIO). The information contained in this message is intended for the addressee only. If you have received this message in error or there are any problems please notify the originator. Please note that the Spanish National Cancer Centre (CNIO) does not accept liability for any statements or opinions made, which are clearly the sender's own and not expressly made on behalf of the Centre. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
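The gls-versus-lme comparison suggested in the reply above can be sketched with the nlme package; the Orthodont data and the random-intercept structure are illustrative. Both fits use the same estimation method so that `anova()` gives a valid likelihood-ratio comparison (note that the LR test of a variance component sits on the boundary of the parameter space, so its p-value is conservative).

```r
## LR test of random effects: gls (no random effects) vs lme.
library(nlme)

m.fixed  <- gls(distance ~ age, data = Orthodont, method = "ML")
m.random <- lme(distance ~ age, random = ~ 1 | Subject,
                data = Orthodont, method = "ML")
anova(m.random, m.fixed)   # LR test of the random intercept
```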
Re: [R] LDA with previous PCA for dimensionality reduction
Dear Christoph, David, Torsten and Bjørn-Helge, I think that Bjørn-Helge has made more explicit what I had in mind (which I think is close also to what David mentioned). Also, at the very least, not placing the PCA inside the cross-validation will underestimate the variance in the predictions. Best, R. On Thursday 25 November 2004 15:05, Bjørn-Helge Mevik wrote: Torsten Hothorn writes: as long as one does not use the information in the response (the class variable, in this case) I don't think that one ends up with an optimistically biased estimate of the error. I would be a little careful, though. The left-out sample in the LDA cross-validation will still have influenced the PCA used to build the LDA on the rest of the samples. The sample will have a tendency to lie closer to the centre of the complete PCA than of a PCA on the remaining samples. Also, if the sample has a high leverage on the PCA, the directions of the two PCAs can be quite different. Thus, the LDA is built on data that fits better to the left-out sample than if the sample were a completely new sample. I have no proofs or numerical studies showing that this gives over-optimistic error rates, but I would not recommend placing the PCA outside the cross-validation. (The same for any resampling-based validation.) -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] LDA with previous PCA for dimensionality reduction
Dear Christoph, I guess you want to assess the error rate of an LDA that has been fitted to a set of currently existing training data, and that in the future you will get some new observation(s) for which you want to make a prediction. Then, I'd say that you want to use the second approach. You might find that the first step turns out to be crucial and, after all, your whole subsequent LDA is contingent on the PC scores you obtain in the previous step. Somewhat similar issues have been discussed in the microarray literature. Two references are: @ARTICLE{ambroise-02, author = {Ambroise, C. and McLachlan, G. J.}, title = {Selection bias in gene extraction on the basis of microarray gene-expression data}, journal = {Proc Natl Acad Sci USA}, year = {2002}, volume = {99}, pages = {6562--6566}, number = {10}, } @ARTICLE{simon-03, author = {Simon, R. and Radmacher, M. D. and Dobbin, K. and McShane, L. M.}, title = {Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification}, journal = {Journal of the National Cancer Institute}, year = {2003}, volume = {95}, pages = {14--18}, number = {1}, } I am not sure, though, why you use PCA followed by LDA. But that's another story. Best, R. On Wednesday 24 November 2004 11:16, Christoph Lehmann wrote: Dear all, not really an R question but: If I want to check the classification accuracy of an LDA with previous PCA for dimensionality reduction by means of the LOOCV method: Is it ok to do the PCA on the WHOLE dataset ONCE and then run the LDA with the CV option set to TRUE (runs LOOCV) -- OR -- do I need - to compute for each 'test-bag' (the n-1 observations) a PCA (-> my.princomp.1), - then run the LDA on the test-bag scores (-> my.lda.1), - then compute the scores of the left-out observation using my.princomp.1 (-> my.scores.2), - and only then use predict.lda(my.lda.1, my.scores.2) on the scores of the left-out observation ? 
I read some articles where they chose procedure 1, but I am not sure if this is really correct. Many thanks for a hint. Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
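[Editor's note: a minimal sketch of the second approach discussed in this thread, with the PCA re-estimated inside each leave-one-out fold. The data matrix X, class factor grp, and number of retained components k are all hypothetical names, not from the original posts.]

```r
## Sketch: LOOCV with the PCA fitted only on the n-1 training rows of
## each fold, so the left-out observation never influences the PCA.
## X: numeric matrix (rows = observations); grp: factor of classes;
## k: hypothetical number of principal components retained.
library(MASS)  # for lda()

loocv.pca.lda <- function(X, grp, k = 2) {
  n <- nrow(X)
  pred <- factor(rep(NA, n), levels = levels(grp))
  for (i in seq_len(n)) {
    pca <- prcomp(X[-i, , drop = FALSE])              # PCA on training rows only
    scores.train <- pca$x[, 1:k, drop = FALSE]
    fit <- lda(scores.train, grouping = grp[-i])      # LDA on training scores
    scores.test <- predict(pca, X[i, , drop = FALSE])[, 1:k, drop = FALSE]
    pred[i] <- predict(fit, scores.test)$class        # predict left-out sample
  }
  mean(pred != grp)                                   # LOOCV error rate
}
```

Procedure 1 (one PCA on the whole data set, then lda(..., CV = TRUE)) differs from this in exactly the way Bjørn-Helge describes: the left-out sample has already influenced the PCA.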
Re: [R] t test problem?
On Wednesday 22 September 2004 13:07, Ted Harding wrote: On 22-Sep-04 kan Liu wrote: Hi, Many thanks for your helpful comments and suggestions. The attached are the data in both log10 scale and original scale. I would be very grateful if you could suggest which version of the test should be used. By the way, how can I check whether the variation is additive (natural scale) or multiplicative (log scale) in R? How can I check whether the distribution of the data is normal? As for additive vs multiplicative, this can only be judged in terms of the process by which the values are created in the real world. Just my 2 cents: I often find it helpful to ask myself (or the client) whether, if there were a difference (something) between the two samples, I/she/he thinks the appropriate model is (please read the = as approx. equal) sample.1 = sample.2 + something [1] OR sample.1 = sample.2 * something [2] (i.e., the ratio of means is a constant: sample.1/sample.2 = something) which, by log transforming, becomes log(sample.1) = log(sample.2) + log(something) I am not including here the issue of error distribution, but often, when the model for the means is like [2], the error terms are multiplicative (i.e., additive in the log scale). At least in many biological and engineering problems it is often evident whether [1] or [2] is appropriate for the data, given what we know about the subject. Best, R. As for normality vs non-normality, an appraisal can often be made simply by looking at a histogram of the data. In your case, the commands hist(x, breaks = 1*(0:100)) hist(y, breaks = 1*(0:100)) indicate that the distributions of x and y do not look at all normal, since they both have considerable positive skewness (i.e. long upper tails relative to the main mass of the distribution). 
This does strongly suggest that a logarithmic transformation would give data which are more nearly normally distributed, as indeed is confirmed by the commands hist(log(x)) hist(log(y)) though in both cases the histograms show some irregularity compared with what you would expect from a sample from a normal distribution: the commands hist(log(x), breaks = 0.2*(40:80)) hist(log(y), breaks = 0.2*(40:80)) show that log(x) has an excessive peak at around 11.7, while log(y) has holes at around 11.1 and 12.1. Nevertheless, this inspection of the data shows that the use of log(x) and log(y) will come much closer to fulfilling the conditions of validity of the t test than using the raw data x and y. However, it is not merely the *normality* of each which is needed: the conditions for the usual t test also require that the two populations sampled for log(x) and log(y) should have the same standard deviations. In your case, this also turns out to be nearly enough true: sd(log(x)) [1] 0.902579 sd(log(y)) [1] 0.9314807 PS, can I confirm that your suggestions mean that, in order to check whether there is a difference between x and y in terms of the mean, I need to check the distribution of x and that of y in both natural and log scales, and see which presents a normal distribution? See above for an approach to this: the answer to your question is, in effect, yes. It could of course have happened that neither the raw nor the log scale would be satisfactory, in which case you would need to consider other possibilities. And, if the SDs had turned out to be very different, you should not use the standard t test but a variant which is adapted to the situation (e.g. the Welch test). You can, of course, also perform formal tests for skewness, for normality, and for equality of variances. Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 [NB: New number!] 
Date: 22-Sep-04 Time: 12:07:07 -- XFMail -- __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
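[Editor's note: a short sketch of the checks discussed in this thread, on simulated data standing in for the poster's x and y, which were attached to the original message and are not reproduced here.]

```r
## Simulated stand-ins for x and y: lognormal, i.e. multiplicative error
## on the natural scale, additive on the log scale.
set.seed(1)
x <- rlnorm(100, meanlog = 11.5, sdlog = 0.9)
y <- rlnorm(100, meanlog = 11.8, sdlog = 0.9)

hist(x)                   # strong positive skew on the natural scale
hist(log(x))              # roughly symmetric after the log transform
sd(log(x)); sd(log(y))    # similar SDs -> standard t test on the logs is defensible

t.test(log(x), log(y))                      # Welch test (the default in R)
t.test(log(x), log(y), var.equal = TRUE)    # classical equal-variance t test
```

Note that R's t.test already defaults to the Welch variant, so the equal-variance test must be requested explicitly.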
Re: [R] can't understand R
Dear Erin, On Tuesday 21 September 2004 06:10, Erin L. Leisz wrote: hi. i really need help using this program. computer language is a foreign language to me, and thus, i cannot make heads nor tails of the user manuals from the website. i need to locate step-by-step examples of simple If you plan to use R more than once, I think you probably want to get used to using the manuals (starting with "An Introduction to R", and maybe some of the other intro material available from the R web site). problems such as graph f(x)+g(x) and f(g(x)) for the domain 0 < x < 2 and graph 2H(x), H(x)+1, H(x+1) i do know how to define the functions, but that's it. is there any help you could provide me? i would appreciate some help asap. thank you very much For this particular case of plotting f(x), you can take a look at the function curve (type ?curve at the R prompt). Hope this helps. R. erin leisz __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
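[Editor's note: an illustration of the curve suggestion, with hypothetical f and g standing in for the poster's functions.]

```r
## curve() evaluates an expression in x over an interval and plots it.
## f and g are hypothetical examples, not the poster's actual functions.
f <- function(x) x^2
g <- function(x) sqrt(x)

curve(f(x) + g(x), from = 0, to = 2)                    # f(x) + g(x) on 0 < x < 2
curve(f(g(x)), from = 0, to = 2, add = TRUE, lty = 2)   # overlay f(g(x)), dashed
```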
Re: [R] degrees of freedom (lme4 and nlme)
Dear Elizabeth, When I looked for this a couple of years ago, I found DFs to be discussed in the book by Pinheiro and Bates, Mixed-Effects Models in S and S-PLUS, as well as the documentation for SAS's PROC MIXED (I believe that the discussion of DFs in the SAS manual was more complete than in the SAS System for Mixed Models book ---and I think html versions of the manuals for v 8 of SAS can be found on the web). I do not remember specifically, though, whether these discussions mentioned explicitly DFs for fixed effects with crossed random effects (I do not have the references here now). Best, R. On Wednesday 08 September 2004 19:54, Elizabeth Lynch wrote: Hi, I'm looking for pointers/references on calculating denominator DFs for fixed effects when using crossed random effects. Also, is there an implementation of simulate.lme that I could use in lme4? Thanks, Elizabeth Lynch Douglas Bates wrote: Alexandre Galvão Patriota wrote: Hi, I'm having some problems regarding the packages lme4 and nlme, more specifically in the denominator degrees of freedom. SNIP The lme4 package is under development and only has a stub for the code that calculates the denominator degrees of freedom. These Wald-type tests using the F and t distributions are approximations at best. In that sense there is no correct degrees of freedom. I think the more accurate tests may end up being the restricted likelihood ratio tests that Greg Reinsel and his student Mr. Ahn were working on at the time of Greg's death. __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] bootstrap: stratified resampling
Dear All, I was writing a small wrapper to bootstrap a classification algorithm, but if we generate the indices in the usual way as: bootindex <- sample(index, N, replace = TRUE) there is a non-zero probability that all the samples belong to only one class, thus leading to problems in the fitting (or that some classes will end up with only one sample, which will be a problem for quadratic discriminant analysis). I thought this situation should be frequent enough to be mentioned in the literature, but I have found almost no mention in the references I have available, except for Hirst (see below). If I've reread correctly, this issue is not mentioned in Efron & Tibshirani (1997; the .632+ paper), or in Efron and Gong (the TAS "leisurely look" paper), or the Efron & Tibshirani 1993 bootstrap book, or Chernick's Bootstrap Methods book. I've only seen some side mentions in Ripley's Pattern Recognition (when talking about stratified cross-validation), and in Davison & Hinkley's bootstrap book when, on p. 304, they refer to some subsets having singular design matrices, and thus requiring stratification on covariates. McLachlan (in his discriminant analysis book), on p. 347, differentiates between mixture sampling and separate sampling, but I cannot find any mention of what to do when, under mixture sampling, we end up with all samples in only one group. Only Hirst (1996, Technometrics, 38 (4): 389--399) says that each bootstrap sample should include at least one observation for each group, and at least enough different observations from each group to allow estimation of the covariance matrix (he is referring to discriminant analysis), and thus he uses essentially stratified bootstrap samples. Interestingly, the boot function (boot library) says "For nonparametric multi-sample problems stratified resampling is used." As well, predab.resample (Design library) says of group: "a grouping variable used to stratify the sample upon bootstrapping. This allows one to handle k-sample problems, (...)". 
That the authors of boot and Design use stratified resampling suggests to me that this might be the obvious, unproblematic way to go, but I understood that stratified resampling was OK only when that was the sampling scheme that generated the data. What am I missing? Thanks, R. -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
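[Editor's note: a minimal sketch of stratified resampling as discussed in this thread: indices are drawn within each class, so no bootstrap sample can miss a class entirely. The class factor grp is a hypothetical example.]

```r
## Stratified bootstrap indices: resample within each class separately.
## grp is a hypothetical factor of class labels (20 "A"s, 9 "B"s).
set.seed(1)
grp <- factor(rep(c("A", "B"), times = c(20, 9)))

boot.index.stratified <- function(grp) {
  unlist(lapply(split(seq_along(grp), grp),
                function(idx) idx[sample.int(length(idx), replace = TRUE)]),
         use.names = FALSE)
}

bootindex <- boot.index.stratified(grp)
table(grp[bootindex])   # each class keeps its original size in the sample
```

The idx[sample.int(length(idx), ...)] form avoids the well-known sample(idx, ...) pitfall when a class happens to contain a single index.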
Re: [R] citing a package?
Dear Martin, I'd suggest you check the DESCRIPTION file and ask the author(s) of the package (e.g., a package might be related to a tech report which might, now, be in press, or whatever). Best, R. On Monday 09 February 2004 15:21, Martin Henry H. Stevens wrote: How do I cite a package (not R itself - I know how to do that)? Any thoughts or links? Many thanks in advance! Hank Stevens Dr. Martin Henry H. Stevens, Assistant Professor 338 Pearson Hall Botany Department Miami University Oxford, OH 45056 Office: (513) 529-4206 Lab: (513) 529-4262 FAX: (513) 529-4243 http://www.cas.muohio.edu/botany/bot/henry.html http://www.muohio.edu/ecology/ http://www.muohio.edu/botany/ __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] SIR
This is strange; the SIR implementation for R I know (in package dr on CRAN, from S. Weisberg), last time I checked (about a year ago?), was able to handle multivariate responses. In fact, p. 6 of the documentation shows an example of SIR with a bivariate response, and I tried it, and it works. Best, R. On Friday 16 January 2004 10:04, hagric wrote: I have found a version of SIR in R and I have tried it. But the problem with this file is the fact that it does not cope with multivariate response variables. Is there any version of SIR available that also works with multivariate responses? Thanks for help! __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] help in lme
Since Spencer Graves already answered the factorial questions, I'll try to answer one of the other two: On Monday 15 December 2003 05:17, [EMAIL PROTECTED] wrote: To anyone who can help, Intelligent question (1) I keep on trying to fit a linear mixed model in R using 'lme(y~fxd.dsgn, data = data.mtrx, ~rnd.dsgn|group)' where fxd.dsgn and rnd.dsgn are the fixed and random design matrices, respectively. The function won't work, though. It keeps telling me that it can't find the object 'rnd.dsgn'. What's the matter here? Is rnd.dsgn a variable in data.mtrx? That is how I always fit lme models, and I have never encountered the problem you describe. R. P.S. Stupid question # 2 has, I think, been asked (and answered) several times in this list in the past. Any help would be greatly appreciated. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] typeIII SS for lme?
Dear Bill, You can obtain marginal tests using anova(your.lme.object, type = "marginal") (If you are going to compare output, note that marginal tests when using non-orthogonal contrasts (SAS and treatment) might give you unexpected results, last time I checked). R. On Thursday 11 December 2003 19:40, Bill Shipley wrote: To avoid angry replies, let me first say that I know that the use of Type III sums of squares is controversial, and that some statisticians recommend instead that significance be judged using the non-marginal terms in the ANOVA. However, given that type III SS is also demanded by some, is there a function (equivalent to drop1 for lm) to obtain type III sums of squares for mixed models using the lme function? Bill Shipley Associate Editor, Ecology North American Editor, Annals of Botany Département de biologie, Université de Sherbrooke, Sherbrooke (Québec) J1K 2R1 CANADA [EMAIL PROTECTED] http://callisto.si.usherb.ca:8080/bshipley/ [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] documentation typo in coxph?
Dear All, I think there is a typo in the documentation for coxph (library survival). The help says: eps: convergence threshold. Iteration will continue until the relative change in the log-likelihood is less than eps. Default is .0001. However, if I do coxph.control() I get: coxph.control() $eps [1] 1e-09 So the actual eps being used is not 1e-04 but 1e-09. Best, Ramón version: platform i386-pc-linux-gnu; arch i386; os linux-gnu; system i386, linux-gnu; major 1; minor 7.1; year 2003; month 06; day 16; language R -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] simplifying randomForest(s)
Dear All, I have been using the randomForest package for a couple of difficult prediction problems (which also share p >> n). The performance is good, but since all the variables in the data set are used, interpretation of what is going on is not easy, even after looking at variable importance as produced by the randomForest run. I have tried a simple variable selection scheme, and it does seem to perform well (as judged by leave-one-out) but I am not sure if it makes any sense. The idea is, in a kind of backwards elimination, to eliminate one by one the variables with smallest importance (or all the ones with negative importance in one go) until the out-of-bag estimate of classification error becomes larger than that of the previous model (or of the initial model). So nothing really new. But I haven't been able to find any comments in the literature about simplification of random forests. Any suggestions/comments? Best, Ramón -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] simplifying randomForest(s)
Dear Andy, Thanks a lot for your message. This is quite a hazardous game. We've been burned by this ourselves. I'll send you a paper we submitted on variable selection for random forest off-line. (Those who are interested, let me know.) Thanks! The basic problem is that when you select important variables by RF and then re-run RF with those variables, the OOB error rate becomes biased downward. As you iterate more times, the overfitting becomes more and more severe (in the sense that the OOB error rate will keep decreasing while the error rate on an independent test set will be flat or increase). I was naïve enough to ask Breiman about this, and his reply was something like "any competent statistician would know that you need something like cross-validation to do that"... Yes, I understand the points you are making. However, I have tried to achieve protection against this problem by assessing the leave-one-out cross-validation error (LOOCVE) of the complete selection process. And the LOOCVE suggests this is working. Within the variable selection routine the OOB error rate is biased, but I guess that does not concern me that much, because I only use it to guide the selection. However, my final estimate of error comes from the LOOCVE. This is the skeleton of the algorithm: n <- length(y) for(i in 1:n) { the.simple.rf <- simplify.the.rf(data = data[-i, ]) prediction[i] <- predict(the.simple.rf, newdata = data[i, ]) } loocve <- sum(y != prediction) / n Thus, the LOOCVE is computed with observations that were never used for the simplification of the tree that is predicting them. [I'll be glad to send my code to anyone interested]. And, the interesting thing with the data set I have tried is that it seems to perform reasonably (actually, the LOOCVE of a tree with the reduced set of variables is smaller than the LOOCVE of the original tree). (This is a first shot. 
I have a small sample size (29) so LOOCV is not that bad in terms of computation, although I am aware it can have high variance. I guess I could try the .632+ bootstrap method). Best, Ramón Best, Andy Any suggestions/comments? Best, Ramón -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
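[Editor's note: a rough sketch of the backward-elimination step described in this thread, not the author's actual code. All names (simplify.the.rf, X, y, min.vars) are hypothetical. As Andy's warning makes clear, the OOB error used inside this routine is biased; any honest error estimate must come from an outer cross-validation loop around the whole procedure, as in the LOOCV skeleton above.]

```r
## Sketch: drop the least important variable while the OOB error does
## not exceed that of the current forest. X: predictors (data frame or
## matrix); y: factor of classes; min.vars: hypothetical floor.
library(randomForest)

simplify.the.rf <- function(X, y, min.vars = 2) {
  rf <- randomForest(X, y, importance = TRUE)
  best.err <- rf$err.rate[nrow(rf$err.rate), "OOB"]
  while (ncol(X) > min.vars) {
    imp <- importance(rf, type = 1)            # mean decrease in accuracy
    drop <- rownames(imp)[which.min(imp)]      # least important variable
    X.new <- X[, setdiff(colnames(X), drop), drop = FALSE]
    rf.new <- randomForest(X.new, y, importance = TRUE)
    err <- rf.new$err.rate[nrow(rf.new$err.rate), "OOB"]
    if (err > best.err) break                  # stop when OOB error worsens
    X <- X.new
    rf <- rf.new
  }
  rf
}
```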
Re: [R] coxph.control
Dear Gareth, ?coxph.control (which we are told to check from ?coxph) contains the argument iter.max, which sets the maximum number of iterations. Best, R. On Tuesday 26 August 2003 13:51, Gareth Hughes wrote: How can I specify the maximum number of iterations in coxph whilst also specifying my model? I can't find any on-line examples. Thanks __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
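[Editor's note: a minimal example of passing coxph.control to coxph, using the lung data shipped with the survival package; the model formula is illustrative only.]

```r
## Set the maximum number of Newton-Raphson iterations via coxph.control.
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung,
             control = coxph.control(iter.max = 50))
fit$iter   # number of iterations actually used
```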