Re: [R] simultaneous computing
Dear Markus, You might want to check Rmpi, papply, snow, rpvm, and nws. Best, R.

On 6/11/07, Markus Schmidberger [EMAIL PROTECTED] wrote: Hello, which possibilities are available in R for simultaneous or parallel computing? I could only find biopara (http://cran.r-project.org/src/contrib/Descriptions/biopara.html). Are there other possibilities? Are there special groups working on simultaneous computing with R? Thanks, Markus -- Dipl.-Tech. Math. Markus Schmidberger, Ludwig-Maximilians-Universität München, IBE - Institut für medizinische Informationsverarbeitung, Biometrie und Epidemiologie

-- Ramon Diaz-Uriarte, Statistical Computing Team, Structural Biology and Biocomputing Programme, Spanish National Cancer Centre (CNIO), http://ligarto.org/rdiaz

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
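For readers landing on this thread, a minimal sketch of what a snow session looks like (a sketch only, assuming the 'snow' CRAN package; the cluster size here is arbitrary, and the socket transport is used so no MPI or PVM installation is needed):

```r
## Minimal snow sketch. makeCluster() with type = "SOCK" uses the
## socket transport; snow can also sit on top of Rmpi (MPI) or
## rpvm (PVM) by changing the type.
library(snow)

cl <- makeCluster(4, type = "SOCK")           # 4 local worker processes
res <- clusterApply(cl, 1:4, function(x) x^2) # one element per worker
stopCluster(cl)                               # shut the workers down

unlist(res)  # 1 4 9 16
```

clusterApplyLB is the load-balanced variant of clusterApply, handing the next element to whichever worker finishes first.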
Re: [R] trouble with snow and Rmpi
Dear Erin, What operating system are you trying this on? Windows? On Linux you definitely don't need MPICH2 but, rather, LAM/MPI. Best, R.

On 5/25/07, Erin Hodgess [EMAIL PROTECTED] wrote: Dear R People: I am having some trouble with the snow package. It requires MPICH2 and Rmpi. Rmpi is fine. However, I downloaded the MPICH2 package and installed it. There is no mpicc, mpirun, etc. Does anyone have any suggestions, please? Thanks in advance! Sincerely, Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: [EMAIL PROTECTED]
Re: [R] Rserve and R to R communication
On 4/11/07, AJ Rossini [EMAIL PROTECTED] wrote:

On Tuesday 10 April 2007 23:17, Ramon Diaz-Uriarte wrote: Of course, you are right there. I think that might still be the case. At the time we made our decision, and decided to go for MPI, MPI 2 was already out, and MPI seemed more like the current/future standard than PVM.

That's always been the case. In fact, MPI is a standard, whereas PVM was always an implementation defining a so-called standard.

Oops, you are right. But in addition to whether or not it is a standard, it seemed (and still seems) that MPI is the current/future stuff, whereas PVM seemed more like a useful but aging approach. (I am aging too, so maybe that ain't that good an argument :-).

So using papply with Rmpi requires sharper programmers than using snow? Hey, it is good to know I am that much smarter. I'll wear that as a badge :-).

You are! I've never been patient enough to use plain Rmpi or rpvm except a few times, but for me, the advantage of snow is that you get all the backends, not just MPI. In fact, I've heard mention that some folks are sticking together an NWS backend as well.

Oh, but except for a few very simple things, such as broadcasting data or functions to all the slaves, or cleaning up, I never use Rmpi directly. I always use papply, which is, really, a piece of cake. I am just scratching the surface of this parallelism stuff, and I am sticking to the simple, embarrassingly parallelizable problems (cross-validation, bootstrap, identical analyses on many samples, etc.). So going any deeper into MPI (individual sends, receives, etc.) was more trouble than it seemed worth. papply or, alternatively, clusterApplyLB, are all I've (almost) ever needed/used.

Anyway, papply (with Rmpi) is not, in my experience, any harder than snow (with either rpvm or Rmpi). In fact, I find papply a lot simpler than snow (clusterApply and clusterApplyLB). For one thing, debugging is very simple, since papply becomes lapply if no LAM universe is booted.
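The papply pattern described here can be sketched as below (a sketch only, assuming the 'papply' CRAN package; its key property is the fallback this message mentions — with no LAM/MPI universe booted and no Rmpi slaves spawned, the same call degrades gracefully to a sequential lapply, which is what makes debugging easy):

```r
## Embarrassingly parallel work with papply (assumes the 'papply'
## package; with Rmpi slaves spawned it load-balances across them,
## otherwise it runs sequentially via lapply()).
library(papply)

squares <- papply(as.list(1:8), function(x) x^2)

## With no MPI universe running, the call above is equivalent to:
squares.seq <- lapply(as.list(1:8), function(x) x^2)

identical(unlist(squares), unlist(squares.seq))  # TRUE
```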
In fact it might be easier, since we never put together decent aggregation routines. (Smarter doesn't mean works harder, just more intelligently :-). I'll take that as a compliment :-).

I see, though, that I might want to check PVM just for the sake of the fault tolerance in snowFT.

Fault tolerance is one of those very ill-defined words. Specifically: #1 - mapping pRNG streams to work units, not just CPUs or dispatch order (both of which can differ), for reproducibility; #2 - handling failure to complete on worker nodes gracefully. However, you'd need checkpointing, or probably a miracle, to handle failure on the master...

Aha, I hadn't thought of #1, seeing as I am much more concerned about #2. (For #1, and to check results, I tend to run things under controlled conditions, where if a worker shuts down, I'll bring it back to life and start again ---not elegant, but this happens rarely enough that I don't worry too much.) Right now, I am dealing with #2 via additional external scripts that check that LAM universes are up, examine log files for signs of failures, modify LAM host definition files if needed, restart LAM universes, etc., and with checkpointing in the R code. But I think it is an ugly kludge (and a pain). I envy the Erlang guys. As for failure in the master... I'll take that as an act of god, so no point in trying to defeat it via miracles :-). Actually, the scripts above could be distributed (the checkpointing is done from the master), so this is doable via a meta script that runs distributed. I've just added that to the to-do list. Best, R.

best, -tony [EMAIL PROTECTED] Muttenz, Switzerland. Commit early, commit often, and commit in a repository from which we can easily roll-back your mistakes (AJR, 4Jan05).
Re: [R] Rserve and R to R communication
On 4/10/07, AJ Rossini [EMAIL PROTECTED] wrote:

On Monday 09 April 2007 23:02, Ramon Diaz-Uriarte wrote: (Yes, maybe I should check snowFT, but it uses PVM, and I recall a while back there was a reason why we decided to go with MPI instead of PVM).

There is no reason that you can't run both MPI and PVM on the same cluster.

Yes, sure. We actually did that for a while. But we eventually settled on MPI.

There is a particular reason that the first implementation we (Na Li, who did most of the work, and myself) made used PVM -- at the time (pre MPI 2) it was far more advanced than MPI as far as interactive parallel computing, i.e. dispatching parallel functions interactively from the command line, creating and manipulating virtual machines on the fly.

Of course, you are right there. I think that might still be the case. At the time we made our decision, and decided to go for MPI, MPI 2 was already out, and MPI seemed more like the current/future standard than PVM. A feeling that was reinforced by seeing some key people of PVM (e.g., Dongarra) also involved in MPI, as well as very active development of MPI (e.g., LAM, MPICH, and later OpenMPI). And MPI seemed more like the usual message passing (which for us was, at that time at least, a good thing). And we were also using MPI in C++ code. So we decided to bet on MPI.

Of course, most MPI implementations will save you loads of deci-seconds on transfer of medium-size messages over the wire, but we weren't interested in that particular aspect, more in saving days over the course of a one-off program (i.e. development time, which can be more painful than run-time).

Oh, but those deci-seconds were never the reason we decided to choose MPI. We are using R after all, not HPF :-). Right. And of course we never thought MPI would cost us significantly more development time than PVM (or that the increased development time would be compensated by the above-mentioned deci-seconds).
Moreover, most of these are not one-off programs, but web applications (some of which have been running for over two years) where easy debugging is crucial for us if we have to revisit the code 6 months later (and for that we found papply much more useful than snow ---more below).

Now, PVM had the necessary tools for fault tolerance -- though I thought that the recent MPI and newer message passing frameworks might have had some of that implemented.

Some MPIs have been developed that incorporate it. But I do not think that is easy with LAM/MPI nor via Rmpi. The problem is that once a node goes down, the whole LAM universe gets screwed up.

And remember, the point of snow was to provide platform-independent parallel code (for which it was the first, for nearly any language/implementation), not to run it like a bat-out-of-hell... (we assumed it would be cheaper to buy more machines than to spend a few months finding a budget along with sharp programmers).

So using papply with Rmpi requires sharper programmers than using snow? Hey, it is good to know I am that much smarter. I'll wear that as a badge :-). Anyway, papply (with Rmpi) is not, in my experience, any harder than snow (with either rpvm or Rmpi). In fact, I find papply a lot simpler than snow (clusterApply and clusterApplyLB). For one thing, debugging is very simple, since papply becomes lapply if no LAM universe is booted. I see, though, that I might want to check PVM just for the sake of the fault tolerance in snowFT. Best, R.

best, -tony
Re: [R] Rserve and R to R communication
On 4/9/07, Simon Urbanek [EMAIL PROTECTED] wrote:

On Apr 7, 2007, at 10:56 AM, Ramon Diaz-Uriarte wrote: Dear All, The clients.txt file of the latest Rserve package, by Simon Urbanek, says, regarding its R client: (...) a simple R client, i.e. it allows you to connect to Rserve from R itself. It is very simple and limited, because Rserve was not primarily meant for R-to-R communication (there are better ways to do that), but it is useful for quick interactive connection to an Rserve farm. Which are those better ways to do it? I am thinking about using Rserve to have an R process send jobs to a bunch of Rserves on different machines. It is like what we could do with Rmpi (or pvm), but without the MPI layer. Therefore, presumably it'd be easier to deal with network problems, machine failures, using checkpoints, etc. (i.e., to try to get better fault tolerance). It seems that Rserve would provide the basic infrastructure for doing that and save me from reinventing the wheel of using sockets, etc., directly from R. However, Simon's comment about better ways of R-to-R communication made me wonder if this idea really makes sense. What is the catch? Have other people tried similar approaches?

I was commenting on direct R-to-R communication using sockets + 'serialize' in R, or the 'snow' package for parallel processing. The latter could be useful for what you have in mind, because it includes a socket-based implementation which allows you to spawn multiple children (across multiple machines) and collect their results. It uses regular rsh or ssh to start the jobs, so if you can use that, it should work for you. 'snow' also has PVM and MPI implementations; the PVM one is really easy to set up (on unix) and that was what I was using for parallel computing in R on a cluster.

I think I now understand your comments. I've used snow and Rmpi quite a bit.
But the problem with Rmpi (or, rather, MPI) is the lack of fault tolerance: if a node goes down, the whole MPI universe breaks, and with it the complete set of slaves. Setting up some kind of fault-tolerant scheme with Rserve seemed possible/simpler (as it does not depend on the MPI layer). (Yes, maybe I should check snowFT, but it uses PVM, and I recall a while back there was a reason why we decided to go with MPI instead of PVM).

Rserve is sort of comparable, but in addition it provides the spawning infrastructure due to its client/server concept. What it doesn't have is the convenience functions that snow provides, like clusterApply etc. Thinking of it, it would actually be possible to add them, although I admit that the original goal of Rserve was not parallel computing :). The idea was to have one Rserve server and multiple clients, whereas in 'snow' you sort of have one client and multiple servers. You could spawn multiple Rserves on multiple machines, but Rserve itself doesn't provide any load-balancing out of the box, so you'd have to do that yourself. I don't know if that helps... :) Cheers, Simon

Aha. I should have seen that. I think I understand the differences better now. Yes, sure. I think the load-balancing should be doable, though, if I decide to try to go down this route. It does help! Thanks a lot. Best, R.
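The Rserve-as-worker-farm idea being weighed in this exchange can be sketched roughly as follows. This is a sketch only: it assumes the simple R client shipped with the Rserve package (the RSconnect/RSeval/RSclose names are from that client's clients.txt description and should be checked against the installed version), the host names are made up, and there is no load-balancing or fault handling — which is precisely the part one would have to write oneself:

```r
## Rough sketch: farming expressions out to Rserve instances on
## several machines, one job per host, sequentially. A real version
## would add load-balancing, retries on dead hosts, etc.
library(Rserve)  # the simple R client ships with the Rserve sources

hosts <- c("node01", "node02")         # hypothetical machines running Rserve

results <- lapply(hosts, function(h) {
  con <- RSconnect(host = h)           # open a connection to one Rserve
  on.exit(RSclose(con))                # always close, even on error
  RSeval(con, quote(sum(rnorm(1e6))))  # evaluate an expression remotely
})
```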
Re: [R] Rserve and R to R communication
On 4/9/07, Paul Gilbert [EMAIL PROTECTED] wrote:

Matthew Keller wrote: Hi Ramon, I've been interested in responses to your question. I have what I think is a similar issue - I have a very large simulation script and would like to be able to modularize it by having a main script that calls lots of subscripts - but I haven't done that yet, because the only way I could think to do it was to call a subscript, have it run, save the objects from the subscript, and then call those objects back into the main script, which seems like a very slow and onerous way to do it. Would Rserve do what I'm looking for?

For simulations you need to worry about the random number generator sequence. I think snow has a scheme for handling this. If you devise your own system then be sure to look after this (non-trivial) detail. Paul Gilbert

Aaaargh, you are right, of course. Rmpi has it too. I'll recheck the rlecuyer and rsprng packages (both can be used from Rmpi and, IIRC, from snow). Thanks for pointing this out! Best, R.
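The RNG detail Paul raises is handled in snow roughly as below (a sketch, assuming the 'snow' and 'rlecuyer' packages; the exact clusterSetupRNG arguments — type and seed — should be checked against the installed snow version). The point is that each worker gets its own L'Ecuyer stream, so parallel simulation results are reproducible and streams do not overlap regardless of how tasks are dispatched:

```r
## Sketch: independent, reproducible RNG streams per worker
## (assumes the 'snow' and 'rlecuyer' packages are installed).
library(snow)

cl <- makeCluster(4, type = "SOCK")
clusterSetupRNG(cl, type = "RNGstream", seed = rep(12345, 6))
## each worker now draws from its own L'Ecuyer stream, so the
## draws below are reproducible across runs with the same seed
draws <- clusterApply(cl, rep(3, 4), function(n) rnorm(n))
stopCluster(cl)
```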
Re: [R] Rserve and R to R communication
Dear Matthew,

On 4/9/07, Matthew Keller [EMAIL PROTECTED] wrote: Hi Ramon, I've been interested in responses to your question. I have what I think is a similar issue [...] Would Rserve do what I'm looking for?

Maybe. That is in fact what I am wondering. However, an easier route might be to try Rmpi with papply. Or snow (with either Rmpi or rpvm). Or nws (a Linda implementation for R). Using Rmpi with papply, in particular, is a piece of cake with embarrassingly parallel problems. papply is like lapply, but parallelized, with built-in load-balancing, although it will run sequentially when no MPI universe is available; the latter is very handy for debugging. snow also has parallelized, load-balanced versions of apply (though I do not think it automatically switches to running sequentially). All of these (Rmpi, papply, snow, rpvm, nws) are R packages available from CRAN. You will need some additional stuff (LAM/MPI for Rmpi ---or MPICH if you run Windows---, PVM for rpvm, and Python and Twisted for nws).

(I asked about Rserve because the lack of fault tolerance of MPI is a pain to deal with in my applications. Also, with LAM/MPI there are limits on the number of slaves that can be handled by a LAM daemon, and that is a problem for some of our web-based applications. Thus, I am looking at alternative approaches that might eliminate some of the extra layers that MPI ---or PVM--- add.) HTH, R.

-- Matthew C Keller Postdoctoral Fellow Virginia Institute for Psychiatric and Behavioral Genetics
Re: [R] Rserve and R to R communication
On 4/9/07, Gregory Warnes [EMAIL PROTECTED] wrote: You may find it easier to use NetWorkSpaces for R (see http://nws-r.sourceforge.net/), which provides a simple mechanism for sending tasks to worker R processes and collecting the results back when done. -G

Thanks, Greg. Yes, I am actually playing around with nws too. Best, R.
[R] Rserve and R to R communication
Dear All, The clients.txt file of the latest Rserve package, by Simon Urbanek, says, regarding its R client: (...) a simple R client, i.e. it allows you to connect to Rserve from R itself. It is very simple and limited, because Rserve was not primarily meant for R-to-R communication (there are better ways to do that), but it is useful for quick interactive connection to an Rserve farm.

Which are those better ways to do it? I am thinking about using Rserve to have an R process send jobs to a bunch of Rserves on different machines. It is like what we could do with Rmpi (or pvm), but without the MPI layer. Therefore, presumably it'd be easier to deal with network problems, machine failures, using checkpoints, etc. (i.e., to try to get better fault tolerance). It seems that Rserve would provide the basic infrastructure for doing that and save me from reinventing the wheel of using sockets, etc., directly from R. However, Simon's comment about better ways of R-to-R communication made me wonder if this idea really makes sense. What is the catch? Have other people tried similar approaches? Thanks, R.
Re: [R] Reasons to Use R
Dear Lorenzo, I'll try not to repeat what others have answered before.

On 4/5/07, Lorenzo Isella [EMAIL PROTECTED] wrote: The institute I work for is organizing an internal workshop for High Performance Computing (HPC). (...) (1) Institutions (not only academia) using R

You can count my institution too. Several groups. (I can provide more details off-list if you want.)

(2) Hardware requirements, possibly benchmarks (3) R clusters, R multiple-CPU machines, R performance on different hardware.

We do use R on commodity off-the-shelf clusters; our two clusters run Debian GNU/Linux, on both 32-bit machines ---Xeons--- and 64-bit machines ---dual-core AMD Opterons. We use parallelization quite a bit, with MPI (via the Rmpi and papply packages mainly). One convenient feature is that (once the LAM universe is up and running) whether we are using the 4 cores in a single box, or the maximum available 120, is completely transparent. Using R and MPI is, really, a piece of cake. That said, there are things that I miss; in particular, oftentimes I wish R were Erlang or Oz, because of the straightforward fault-tolerant distributed computing and the built-in abstractions for distribution and concurrency. The issue of multithreading has come up several times on this list and is something that some people miss.

I am not sure how much R is used in the usual HPC realms. It is my understanding that traditional HPC is still dominated by things such as HPF, and C with MPI, OpenMP, or UPC or Cilk. The usual answer to "but R is too slow" is "but you can write Fortran or C code for the bottlenecks and call it from R". I guess you could use, say, UPC in that C that is linked to R, but I have no experience. And I think this code can become a pain to write and maintain (especially if you want to play around with what you try to parallelize, etc.). My feeling (based on no information or documentation whatsoever) is that how far R can be stretched or extended into HPC is still an open question.
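The transparency claim above — the same R code whether you use 4 cores or 120 — can be sketched as follows (a sketch only, assuming the 'Rmpi' package on top of an already booted LAM/MPI universe; only the nslaves number changes with the size of the machine):

```r
## Sketch: cluster size is transparent to the R code
## (assumes Rmpi and a booted LAM/MPI universe).
library(Rmpi)

mpi.spawn.Rslaves(nslaves = 4)              # 4 could just as well be 120
res <- mpi.applyLB(1:100, function(x) x^2)  # load-balanced parallel apply
mpi.close.Rslaves()                         # orderly shutdown of slaves
mpi.quit()                                  # leave MPI (and R) cleanly
```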
(4) finally, a list of the advantages of using R over commercial statistical packages. The money saving in itself is not a good enough reason, and some people are scared by the lack of professional support, though this mailing list is simply wonderful.

(In addition to all the already mentioned answers:) Complete source code availability. Being able to look at the C source code for a few things has been invaluable for me. And, of course, an extremely active, responsive, and vibrant community that, among other things, has contributed packages and code for an incredible range of problems. Best, R. P.S. I'd be interested in hearing about the responses you get to your presentation.

Kind Regards, Lorenzo Isella
Re: [R] If you had just one book on R to buy...
On 2/25/07, Julien Barnier [EMAIL PROTECTED] wrote: Hi, I am starting a new job as a study analyst for a social science research unit. I would really like to use R as my main tool for data manipulation and analysis. So I'd like to ask you: if you had just one book on R to buy (or to keep), which one would it be? I already bought the Handbook of Statistical Analyses Using R, but I'd like to have something more complete, both from the statistical point of view and on R usage. I thought that Modern Applied Statistics with S-Plus would be a good choice, but maybe some of you could have interesting suggestions?

Dear Julien, I'd definitely go for MASS if you already have the Handbook. MASS is an awesome book, but you did not tell us anything about your background (stats beginners, for instance, sometimes get lost in MASS, because that is not the target audience). Among books at this level, MASS is unique. (There are more specific books for certain topics, such as mixed models, etc.; but for wide coverage, I'd go with MASS.) HTH, R.

Thanks in advance, -- Julien
Re: [R] R book advice
Dear Paul, You might want to add Everitt & Hothorn's A Handbook of Statistical Analyses Using R. If I had to recommend just one book, it'd be this one. My own (i.e., highly subjective) suggestion, if you can afford two books, would be to first go through Dalgaard's and then through Everitt & Hothorn's. I do not have direct experience with Verzani's, but I've heard great things about it. I think a PDF of a preliminary version is available from the R page. Regarding Crawley's... well, I find some/many of his comments and suggestions unorthodox (my experience is with his Statistical Computing: An Introduction to Data Analysis using S-Plus, a book I would not recommend to a novice). HTH, R.

On 2/16/07, Paul Lynch [EMAIL PROTECTED] wrote: I'm looking for a book for someone completely ignorant of statistics who wishes to learn both statistics and R. I've found three possibilities, one by Verzani (Using R for Introductory Statistics), one by Crawley (Statistics: An Introduction using R), and one by Dalgaard (Introductory Statistics with R). Do these books have different emphases, perspectives, or strengths? Should I just pick one at random and buy it? Thanks, --Paul
Re: [R] Snow vs Rmpi
Dear Vadim, On 2/14/07, Vadim Ogranovich [EMAIL PROTECTED] wrote: Hi, I have a few high-level questions about the Snow and Rmpi packages. I understand that Snow uses Rmpi as one of its possible transport layers, yet my questions are about user experience, not technical details: 1. Does Snow install and work well in Windows? 2. Interruptibility. I understand that currently it is impossible to interrupt a running top-level command in Snow (Ctrl-C or the like); the only way to kill slave processes is to kill the master R process. Is this accurate? What about Rmpi? Is there any difference between Windows and Linux? I've never used any of those under Windoze. I think your statement is accurate under Linux. (In fact, I often get rid of any of those Rmpis gone astray by issuing a lamhalt and/or lamwipe.) 3. When the master process dies, is it guaranteed that the slaves will die too? How reliable is this? (I've seen some applications, not related to R, that were flaky about killing slaves.) If you use an orderly exit procedure (mpi.close.Rslaves(); mpi.quit()) I've never, ever, seen badly behaved Rmpi slaves. But I've seen them under strange circumstances (I think network problems that messed up the LAM universe?). A kind of fail-proof approach, if you can afford it, is to use different LAM universes (via the LAM_MPI_SESSION_SUFFIX environment variable) for different simultaneous runs. Then, if one particular run behaves poorly, you can issue a lamhalt/lamwipe for just that LAM universe. A final suggestion: you might want to take a look at the papply package, which does load-balancing and allows you to run sequentially (if there is no LAM universe), and thus makes debugging much simpler. R. Thank you very much for your help, Vadim [[alternative HTML version deleted]]
-- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
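A minimal sketch of the workflow discussed in this thread, using snow on top of Rmpi and the orderly-exit calls mentioned above (the cluster size and the toy function are illustrative, not from the original posts):

```r
library(snow)   # snow can use Rmpi as its transport layer under LAM/MPI

cl <- makeCluster(4, type = "MPI")                  # spawn 4 R slaves (arbitrary size)
res <- clusterApplyLB(cl, 1:100, function(i) i^2)   # load-balanced apply over the slaves
stopCluster(cl)                                     # orderly shutdown of the slaves

## With Rmpi directly, the orderly exit recommended above is:
## mpi.close.Rslaves(); mpi.quit()
```

For the separate-universes trick: setting LAM_MPI_SESSION_SUFFIX before booting each LAM universe (e.g., `LAM_MPI_SESSION_SUFFIX=run1 lamboot` in the shell) keeps simultaneous runs isolated, so a misbehaving run can be cleaned up with lamhalt/lamwipe without touching the others.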
Re: [R] Snow Package and R: Exported Variable Problem
Dear Robert, On 2/2/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hello and thanks in advance for your time. I've created a simulation on my cluster which uses a custom package developed by me for different functions and also the snow package. Right now I'm using LAM to communicate between nodes and am currently only testing my code on 3 nodes for simplicity, though I plan on expanding to 16 later. My problem is this error: Error in fn(par, ...) : object "x1" not found, with attr(,"class") "try-error". In my simulation I need to run a function several times with a different variable each time. All invocations of the function are independent of the others. I start the simulation on one node, create a cluster of several nodes, load my custom package and snow on all of them, use clusterExport(cl, "x1") to export the variable x1 (among other variables I need), then I call my simulation on the cluster using clusterApplyLB(cl, 2:S, simClust), where cl is the cluster and S is a constant defined above as 500. Using print statements (since snow, or R for that matter, has next to no ability to debug) I found that the error cropped up in this statement: theta6 = optim(c(0,0,0,0,0,0,.2), loglikelihood, score6, method = "CG", control=list(fnscale=-1,reltol=1e-8,maxit=2000))$par Both the functions loglikelihood and score6 use x1, but I know that it is getting exported to the node correctly since it gets assigned earlier in the simulation: x1 = rep(0,n1) The error I stated above happens for every iteration of the simulation (499 times) and I'm really at a loss as to why it's happening and what I can do to determine what it is. I'm wondering at this point if exporting the variable makes it unavailable to certain other packages, though that doesn't really make any sense. From reading quickly through your description, I do not see anything obviously wrong.
If anyone can help me with this problem, or let me know how I can debug this, or even give a clue as to why it might be happening, I would greatly appreciate it. I've been wrestling with this for some time and no online documentation can help. Thank you for your time and help. When I was feeling really lost, I've resorted to assigning intermediate output from commands such as ls, search, etc., to variables (i.e., something like this.ls <- ls() from inside your function call, e.g., simClust) and then, e.g., from mpi.remote.exec, looking at the value of those variables. And, for over a year now, I've been doing most of my MPI stuff with papply; the one nice thing about papply is that, if you have no LAM/MPI universe, it will use a serial (not a parallel) version, so it is much, much, much easier to debug, because you see the warnings and the errors. So most of the frustration of things like launching something and seeing it never return, etc., is gone. Best, R. Just so you know, I'm a Computer Scientist, not a Statistician, though I will be able to give any information about the statistics involved in this program. I am reluctant to give away all the source code since it is not my work but rather code I'm converting from standard code to parallelized code for a professor of mine. -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
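The ls()-snapshot trick described above can be sketched with snow's own inspection tools. A hedged sketch (the cluster size, variable names, and the toy simClust body are hypothetical, not from the original posts):

```r
library(snow)
cl <- makeCluster(3, type = "MPI")   # hypothetical 3-node LAM/MPI cluster

x1 <- rep(0, 10)
clusterExport(cl, "x1")              # note: the variable *name* is passed as a string

simClust <- function(i) {
  ## record what this slave's global environment actually contains,
  ## so it can be inspected after the run
  assign("debug.ls", ls(envir = globalenv()), envir = globalenv())
  sum(x1) + i
}
clusterExport(cl, "simClust")
res <- clusterApplyLB(cl, 2:500, simClust)
clusterEvalQ(cl, debug.ls)           # pull the recorded snapshots back from each slave
stopCluster(cl)
```

If "x1" does not appear in the debug.ls snapshots, the export never reached the slaves; if it does, the problem is more likely the environment in which optim evaluates loglikelihood and score6.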
Re: [R] Does R support grid or parallel computing?
Dear Xiaopeng, There is certainly support for, among others, MPI and PVM; check the packages Rmpi, rpvm, snow, and papply on CRAN. Best, R. On 1/29/07, xiaopeng hu [EMAIL PROTECTED] wrote: Does R support grid or parallel computing? [[alternative HTML version deleted]] -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
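As a rough illustration of the MPI route mentioned above, assuming a working LAM/MPI installation (the slave count is arbitrary; mpi.remote.exec and the orderly-exit calls are the ones recommended elsewhere on this list):

```r
library(Rmpi)

mpi.spawn.Rslaves(nslaves = 2)             # start two R slave processes
mpi.remote.exec(Sys.info()["nodename"])    # run a command on every slave
mpi.close.Rslaves()                        # orderly shutdown of the slaves
mpi.quit()                                 # terminate the MPI environment and exit
```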
Re: [R] Package for phylogenetic tree analyses
Dear Lalitha, On 1/26/07, lalitha viswanath [EMAIL PROTECTED] wrote: Hi I am looking for a package that 1. reads in a phylogenetic tree in NEXUS format 2. given two members/nodes on the tree, can return the distance between the two using the tree. I came across the following packages on CRAN (ouch, ape, apTreeShape, phylogr), all of which seem to provide an extensive range of functions for reading in a NEXUS-format tree and performing phylogenetic analyses, tree comparisons, etc., but none, to the best of my understanding, seems to provide a function to obtain distances (in terms of branch lengths) between two nodes on a single tree. I am working with just one tree and need a function to return distances between various pairs of nodes on the tree. Is there any other package out there that has this functionality? I've been away from that area for some years now, but certainly our phylogr package will not do what you want. However, I think there are various external (non-R) programs that will do it, and that might be all you need if this is just sporadic use. The set of programs distributed and maintained by Ted Garland (PDAP) did provide the type of output you want (in the form of a matrix of distances). I am sure there are others out there (I bet PHYLIP does it too). HTH, R. Thanks for your responses to my earlier queries. As a beginning R programmer, your responses have been of utmost help and guidance. Lalitha
-- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
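For the record, a sketch of how this could be attempted within R using the ape package (a sketch assuming ape's read.nexus() and dist.nodes(); the file name is hypothetical, and the exact node numbering should be checked against the ape documentation):

```r
library(ape)

tr <- read.nexus("mytree.nex")   # hypothetical NEXUS file containing one tree
d  <- dist.nodes(tr)             # matrix of branch-length distances between
                                 # all pairs of nodes (tips and internal nodes)
d[1, 3]                          # distance between nodes 1 and 3

## For tip-to-tip distances only, cophenetic(tr) returns a matrix
## labelled with the tip names.
```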
Re: [R] ECB/Sidebar/R (Emacs) was: Re: kate editor for R
On 1/22/07, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: On 22 January 2007 at 00:05, Ramon Diaz-Uriarte wrote: | On 1/20/07, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: | Just confirms my suspicion that even after all these years, I barely | scratched the surface of ess. That '2+ years' old feature wouldn't happen to | be documented somewhere, would it? | | Dirk, I must be missing something. All I do is: M-x ecb-activate | Everything works. I do nothing special with ess. For that matter, I do | nothing special when editing LaTeX or Python, and ecb (et al) do work | as intended. I had looked at ECB for C++ programming. It simply hadn't occurred to me that it would plug into ESS. I wasn't aware of it either until I attended Tony Rossini's tutorial at useR! 2006. Score another one for Emacs as an operating system. Oh yes, and almost a coffee maker and pizza deliverer :-). R. Dirk -- Hell, there are no rules here - we're trying to accomplish something. -- Thomas A. Edison -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] kate editor for R
On 1/20/07, Marc Schwartz [EMAIL PROTECTED] wrote: On Sat, 2007-01-20 at 11:20 +0100, Ramon Diaz-Uriarte wrote: On 1/20/07, Marc Schwartz [EMAIL PROTECTED] wrote: xft anti-aliasing is incorporated into the version 23 unicode trunk. So it looks great on a hi-res LCD panel. Without xft, even using Bitstream fonts, it was still pretty rough on the eyes. Humm, call me silly, but most of the time I do not like anti-aliased fonts: I tend to agree with http://modeemi.cs.tut.fi/~tuomov/ion/faq/entries/Blurred_fonts.html, where he says characters look like they have been dragged through mud :-). It also fully supports GTK widgets, which is great if you are using GNOME, which I do. But my .emacs gets rid of the toolbar and scroll bars on start-up (I find toolbars confusing things that take up precious screen space), and I often work without the menubar (when I am doing familiar work). I use ion3 (http://modeemi.cs.tut.fi/~tuomov/ion/), which, together with wmii (and followed, at some distance, by fvwm), I find the most usable window managers, and thus the look of widgets is not that relevant to me. So, for most practical purposes (except for resizing with the mouse) I use emacs as if started with the -nw flag. (I know, I know, this looks like going backwards ... must be a mid-life involution crisis :-). We'll drag you kicking and screaming into the 21st century... ;-) I was afraid someone would suggest that sooner or later :-). xft was added as a patch to version 22, but it was not very stable. Note that version 23 is in alpha status, so use at your own risk if you decide to pursue this. 21 is still the current stable release version, but 23 has been rock solid for me. I can provide you with a shell script to build it. Let me know. Let me try with the Debian packages, and if I have problems, I'll definitely start bugging you. Thanks a lot for your help! Best, R. FWIW, here are some screen shots so that you can get a feel for what it looks like.
This is using two 1600x1200 LCD panels. 1. Basic view of main window, showing ECB and ESS: http://home.comcast.net/~marc_schwartz/emacs23.png 2. Full screen (3200x1600 using nVidia TwinView) capture to show GTK file selection widget: http://home.comcast.net/~marc_schwartz/emacs23-2.png 3. View of main window to show the integration of SVN version control, which I use for all of my R code: http://home.comcast.net/~marc_schwartz/emacs23-3.png Hey, those look very neat! (I'd get rid of all those toolbars :-). But very neat. Time to try it. (But no way I am giving up ion3). No doubt that the use of xft is a personal choice, and some folks do not like it, perhaps notably on CRTs. As I have gotten older and need bi-focals for computer work and reading, I find the use of xft much easier and I am less prone to eye strain, given how many hours I typically spend working each day. Marc, but the solution for that problem is not xft fonts. The solution is ... working fewer hours. (I'll blackmail my boss: if you force me to work more hours, I'll use xft fonts. I bet it'll be a great strategy). Best. HTH, Marc -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] ECB/Sidebar/R (Emacs) was: Re: kate editor for R
On 1/20/07, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: Hi Tony, On 20 January 2007 at 15:20, AJ Rossini wrote: | On Friday 19 January 2007 15:39, Dirk wrote: | As I am doing more C++ work, I glanced at oo-browser, sidebar, ecb (all in | Debian/Ubuntu). Would a real Emacs hacker be able to apply these to R code too? | That functionality (though relatively minimal, i.e. ECB/sidebar support | through imenu) should have existed for 2+ years now; at least it does for me. Just confirms my suspicion that even after all these years, I barely scratched the surface of ess. That '2+ years' old feature wouldn't happen to be documented somewhere, would it? Dirk, I must be missing something. All I do is: M-x ecb-activate Everything works. I do nothing special with ess. For that matter, I do nothing special when editing LaTeX or Python, and ecb (et al) do work as intended. Best, R. Dirk -- Hell, there are no rules here - we're trying to accomplish something. -- Thomas A. Edison -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] Offtopic: emacs 23, was kate editor for R
On 1/20/07, Peter Dalgaard [EMAIL PROTECTED] wrote: Ramon Diaz-Uriarte wrote: Hi Marc, Thanks a lot for the detailed explanation! I'll give it a try. (But still, why emacs23? What is missing in v. 21 that you get in 23?). Best, R. Ability to load files with UTF-8 characters in the name? (This is pretty maddening if you find yourself with such a beast.) Aha, thanks. I try to stay away from those creatures. I guess I'll be able to start adding a nice, good-looking Spanish ñ to file names :-). R. BTW, any inkling when/whether this is heading for Fedora N? -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] kate editor for R
On 1/20/07, Marc Schwartz [EMAIL PROTECTED] wrote: xft anti-aliasing is incorporated into the version 23 unicode trunk. So it looks great on a hi-res LCD panel. Without xft, even using Bitstream fonts, it was still pretty rough on the eyes. Humm, call me silly, but most of the time I do not like anti-aliased fonts: I tend to agree with http://modeemi.cs.tut.fi/~tuomov/ion/faq/entries/Blurred_fonts.html, where he says characters look like they have been dragged through mud :-). It also fully supports GTK widgets, which is great if you are using GNOME, which I do. But my .emacs gets rid of the toolbar and scroll bars on start-up (I find toolbars confusing things that take up precious screen space), and I often work without the menubar (when I am doing familiar work). I use ion3 (http://modeemi.cs.tut.fi/~tuomov/ion/), which, together with wmii (and followed, at some distance, by fvwm), I find the most usable window managers, and thus the look of widgets is not that relevant to me. So, for most practical purposes (except for resizing with the mouse) I use emacs as if started with the -nw flag. (I know, I know, this looks like going backwards ... must be a mid-life involution crisis :-). xft was added as a patch to version 22, but it was not very stable. Note that version 23 is in alpha status, so use at your own risk if you decide to pursue this. 21 is still the current stable release version, but 23 has been rock solid for me. I can provide you with a shell script to build it. Let me know. Let me try with the Debian packages, and if I have problems, I'll definitely start bugging you. Thanks a lot for your help! Best, R. Best regards, Marc On Sat, 2007-01-20 at 03:59 +0100, Ramon Diaz-Uriarte wrote: Hi Marc, Thanks a lot for the detailed explanation! I'll give it a try. (But still, why emacs23? What is missing in v. 21 that you get in 23?). Best, R.
On 1/19/07, Marc Schwartz [EMAIL PROTECTED] wrote: On Fri, 2007-01-19 at 16:09 +0100, Ramon Diaz-Uriarte wrote: snip I had problems with one of the packages ecb depends upon (semantic?), and emacs-snapshot. IIRC it was a documented problem related to a bug in semantic (?); maybe it's been fixed now. But what does emacs-snapshot-gtk provide you now (besides the prettiness) that you'd miss with 21.4? snip Ramon, Just a quick heads up on the ECB issue. I am using Emacs 23 from CVS and had to update ECB and the associated packages to use this version of Emacs. I have emacs 23 installed and run from a separate download folder, so that I do not overwrite the installed stable version. I use the CEDET cedet-1.0pre3.tar.gz aggregate package from http://cedet.sourceforge.net/ as well as the ECB CVS snapshot package ecb.tar.gz from http://ecb.sourceforge.net/downloads.html. The CEDET package includes cogre, ede, eieio, semantic and speedbar. Extract these two files and then modify ~/.emacs with the following: ;; Load ECB (setq semantic-load-turn-everything-on t) (load-file "/PATH/TO/CEDET/cedet-1.0pre3/common/cedet.el") (add-to-list 'load-path "/PATH/TO/ECB/ecb-snap") (require 'ecb) And all seems well. HTH, Marc Schwartz -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] kate editor for R
On Friday 19 January 2007 03:30, Frank E Harrell Jr wrote: Like kile for LaTeX, Linux/KDE's kate editor is an excellent editor for R, with easy code submission to a running R process. Syntax highlighting is good. I have not been able to figure out two things: - how to automatically reformat a line or region of text using good indentation rules (Emacs/ESS make this so easy by just hitting Tab while the cursor is in a line, or highlighting a region and hitting Esc q) - how to cause auto-indenting as you type braces. For me, kate puts a { in column one Thanks for any pointers. Dear Frank, May I ask why you are moving to Kate from Emacs? I tried Kate with R (and Python and LaTeX) and I really liked the folding (which seems a lot better than all the not-really-functional hacks for getting folding with R and Python code) and some of the function/class browsers. However, I especially missed: a) the possibility of opening as many R processes as I want, and placing that buffer in whatever place and with whatever size I want. b) most of the rest of emacs, actually (hey, where did my shells go? and my org-mode buffer? and my ...; not to mention the keybindings). If you feel like it, I'd like to hear about your impressions. R. Frank -- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
Re: [R] kate editor for R
On Friday 19 January 2007 14:12, Frank E Harrell Jr wrote: Ramon Diaz-Uriarte wrote: On Friday 19 January 2007 03:30, Frank E Harrell Jr wrote: Like kile for LaTeX, Linux/KDE's kate editor is an excellent editor for R, with easy code submission to a running R process. Syntax highlighting is good. I have not been able to figure out two things: - how to automatically reformat a line or region of text using good indentation rules (Emacs/ESS make this so easy by just hitting Tab while the cursor is in a line, or highlighting a region and hitting Esc q) - how to cause auto-indenting as you type braces. For me, kate puts a { in column one Thanks for any pointers. Dear Frank, May I ask why you are moving to Kate from Emacs? I tried Kate with R (and Python and LaTeX) and I really liked the folding (which seems a lot better than all the not-really-functional hacks for getting folding with R and Python code) and some of the function/class browsers. However, I especially missed: a) the possibility of opening as many R processes as I want, and placing that buffer in whatever place and with whatever size I want. b) most of the rest of emacs, actually (hey, where did my shells go? and my org-mode buffer? and my ...; not to mention the keybindings). If you feel like it, I'd like to hear about your impressions. R. Thanks for your reply, Frank. Good question Ramon. We have dozens of R users in our department and many of them were not brought up on Emacs and find it hard to learn. We are looking for an alternative to recommend for them. I love Emacs myself and find that it is the fastest editor by a significant margin, and I am used to its keybindings. But I prefer kate for printing and for managing multiple files in a project. kate has a nice sidebar for navigating the files, and indicates which files have been changed since they were saved. Ouch, I had missed that. kate also schematically depicts nested code with side symbols connected by vertical lines for {}.
Yes, this feature I _really_ like. Nothing like it that I know of for emacs (I use fold-dwim, but I find it clunky). Scrolling of the R output window is a little more logical in kate than in ESS. I find myself having to type Esc-Shift-> often in ESS/Emacs to get to the bottom of the R output, but kate puts the cursor at the bottom. Also I get a little frustrated with package management in Xemacs (I know however that it's nice to be able to load thousands of packages) related to file permissions, ftp commands, anonymous logins, etc. And from a purely looks standpoint kate is superior. I tried jedit for a bit. jedit has a lot of nice features but also has problems with indenting in R. Thanks for your feedback. I think I'll play again with kate this weekend. Best, R. Frank Frank -- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
Re: [R] kate editor for R
Hi Dirk, On Friday 19 January 2007 15:39, Dirk Eddelbuettel wrote: Ramon, Frank, Great discussion. Nothing like an editor feud over morning coffee. Just kidding. Not at the editor flame war stage yet (nobody mentioned vim :-). On 19 January 2007 at 11:18, Ramon Diaz-Uriarte wrote: | However, I especially missed: | | a) the possibility of opening as many R processes as I want, and placing | that buffer in whatever place and with whatever size I want. | | b) most of the rest of emacs, actually (hey, where did my shells go? and | my org-mode buffer? and my ...; not to mention the keybindings). [ Thanks for the org-mode suggestion. That looks very useful. How do I get it to sync to my Palm, though? ;-) ] I asked the same at the org-mode list some time back and there was a short thread (http://lists.gnu.org/archive/html/emacs-orgmode/2006-11/msg3.html). The bottom line is this: a) for the general org files, you send them to the palm as text, and you edit them there with a suitable editor (e.g., PalmED). If org-mode files are kept under version control, life becomes easier. b) dealing with the calendar is a more serious problem. c) there seems to be some (not a lot of) interest in these issues, but things are not smooth yet. (I am using my Palm a lot less now, so I am no longer even doing a) regularly). On 19 January 2007 at 07:12, Frank E Harrell Jr wrote: [...] | and I am used to its keybindings. But I prefer kate for printing and | for managing multiple files in a project. kate has a nice sidebar for | navigating the files, and indicates which files have been changed since As I am doing more C++ work, I glanced at oo-browser, sidebar, ecb (all in Debian/Ubuntu). Would a real Emacs hacker be able to apply these to R code too? I use ecb with R directly out of the ecb box. No problem. | they were saved. kate also schematically depicts nested code with side | symbols connected by vertical lines for {}.
Scrolling of the R output | window is a little more logical in kate than in ESS. I find myself | having to type Esc-Shift-> often in ESS/Emacs to get to the bottom of | the R output but kate puts the cursor at the bottom. Also I get a | little frustrated with package management in Xemacs (I know however that | it's nice to be able to load thousands of packages) related to file | permissions, ftp commands, anonymous logins, etc. And from a purely | looks standpoint kate is superior. I switched back to GNU Emacs, using the emacs-snapshot-gtk package in Debian and Ubuntu. Prettier, and still emacs :) I get by without locally installed elisp code in /usr/local -- everything I needed was apt-get'able. I had problems with one of the packages ecb depends upon (semantic?), and emacs-snapshot. IIRC it was a documented problem related to a bug in semantic (?); maybe it's been fixed now. But what does emacs-snapshot-gtk provide you now (besides the prettiness) that you'd miss with 21.4? R. Dirk -- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
Re: [R] kate editor for R
Hi Marc, Thanks a lot for the detailed explanation! I'll give it a try. (But still, why emacs23? What is missing in v. 21 that you get in 23?). Best, R. On 1/19/07, Marc Schwartz [EMAIL PROTECTED] wrote: On Fri, 2007-01-19 at 16:09 +0100, Ramon Diaz-Uriarte wrote: snip I had problems with one of the packages ecb depends upon (semantic?), and emacs-snapshot. IIRC it was a documented problem related to a bug in semantic (?); maybe it's been fixed now. But what does emacs-snapshot-gtk provide you now (besides the prettiness) that you'd miss with 21.4? snip Ramon, Just a quick heads up on the ECB issue. I am using Emacs 23 from CVS and had to update ECB and the associated packages to use this version of Emacs. I have emacs 23 installed and run from a separate download folder, so that I do not overwrite the installed stable version. I use the CEDET cedet-1.0pre3.tar.gz aggregate package from http://cedet.sourceforge.net/ as well as the ECB CVS snapshot package ecb.tar.gz from http://ecb.sourceforge.net/downloads.html. The CEDET package includes cogre, ede, eieio, semantic and speedbar. Extract these two files and then modify ~/.emacs with the following: ;; Load ECB (setq semantic-load-turn-everything-on t) (load-file "/PATH/TO/CEDET/cedet-1.0pre3/common/cedet.el") (add-to-list 'load-path "/PATH/TO/ECB/ecb-snap") (require 'ecb) And all seems well. HTH, Marc Schwartz -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
Re: [R] eval(parse(text vs. get when accessing a function
(I overlooked the reply). Thanks, Gabor. That is neat and easy! (and I should have been able to see it on my own :-( Best, R. On 1/8/07, Gabor Grothendieck [EMAIL PROTECTED] wrote: The S4 is not essential. You could do it in S3 too: f.a <- function(x) x + 1 f.b <- function(x) x + 2 f <- function(x) UseMethod("f") f(structure(10, class = "a")) [1] 11 attr(,"class") [1] "a" On 1/6/07, Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote: Hi Martin, On 1/6/07, Martin Morgan [EMAIL PROTECTED] wrote: Hi Ramon, It seems like a naming convention (f.xxx) and eval(parse(...)) are standing in for objects (of class 'GeneSelector', say, representing a function with a particular form and doing a particular operation) and dispatch (a function 'geneConverter' might handle a converter of class 'GeneSelector' one way, user-supplied ad-hoc functions more carefully; inside geneConverter the only real concern is that the converter argument is in fact a callable function). eval(parse(...)) brings scoping rules to the fore as an explicit programming concern; here scope is implicit, but that's probably better -- R will get its own rules right. Martin Here's an S4 sketch: setClass("GeneSelector", contains="function", representation=representation(description="character"), validity=function(object) { msg <- NULL argNames <- names(formals(object)) if (argNames[1] != "x") msg <- c(msg, "\n GeneSelector requires a first argument named 'x'") if (!("..." %in% argNames)) msg <- c(msg, "\n GeneSelector requires '...' in its signature") if (0 == length(object@description)) msg <- c(msg, "\n Please describe your GeneSelector") if (is.null(msg)) TRUE else msg }) setGeneric("geneConverter", function(converter, x, ...) standardGeneric("geneConverter"), signature=c("converter")) setMethod("geneConverter", signature(converter="GeneSelector"), function(converter, x, ...) { ## important stuff here converter(x, ...) }) setMethod("geneConverter", signature(converter="function"), function(converter, x, ...) { message("ad-hoc converter; hope it works!") converter(x, ...)
}) and then... c1 - new(GeneSelector, + function(x, ...) prod(x, ...), + description=Product of x) c2 - new(GeneSelector, + function(x, ...) sum(x, ...), + description=Sum of x) geneConverter(c1, 1:4) [1] 24 geneConverter(c2, 1:4) [1] 10 geneConverter(mean, 1:4) ad-hoc converter; hope it works! [1] 2.5 cvterr - new(GeneSelector, function(y) {}) Error in validObject(.Object) : invalid class GeneSelector object: 1: GeneSelector requires a first argument named 'x' invalid class GeneSelector object: 2: GeneSelector requires '...' in its signature invalid class GeneSelector object: 3: Please describe your GeneSelector xxx - 10 geneConverter(xxx, 1:4) Error in function (classes, fdef, mtable) : unable to find an inherited method for function geneConverter, for signature numeric Thanks!! That is actually a rather interesting alternative approach and I can see it also adds a lot of structure to the problem. I have to confess, though, that I am not a fan of OOP (nor of S4 classes); in this case, in particular, it seems there is a lot of scaffolding in the code above (the counterpoint to the structure?) and, regarding scoping rules, I prefer to think about them explicitly (I find it much simpler than inheritance). Best, R. Ramon Diaz-Uriarte [EMAIL PROTECTED] writes: Dear Greg, On 1/5/07, Greg Snow [EMAIL PROTECTED] wrote: Ramon, I prefer to use the list method for this type of thing, here are a couple of reasons why (maybe you are more organized than me and would never do some of the stupid things that I have, so these don't apply to you, but you can see that the general suggestion applys to some of the rest of us). Those suggestions do apply to me of course (no claim to being organized nor beyond idiocy here). And actually the suggestions on this thread are being very useful. I think, though, that I was not very clear on the context and my examples were too dumbed down. So I'll try to give more detail (nothing here is secret, I am just trying not to bore people). 
The code is part of a web-based application, so there is no interactive user. The R code is passed the arguments (and optional user functions) from the CGI. There is one core function (call it cvFunct
Re: [R] eval(parse(text vs. get when accessing a function
Hi Martin,

On 1/6/07, Martin Morgan [EMAIL PROTECTED] wrote:

Hi Ramon, It seems like a naming convention (f.xxx) and eval(parse(...)) are standing in for objects (of class 'GeneSelector', say, representing a function with a particular form and doing a particular operation) and dispatch (a function 'geneConverter' might handle a converter of class 'GeneSelector' one way, user-supplied ad-hoc functions more carefully; inside geneConverter the only real concern is that the converter argument is in fact a callable function). eval(parse(...)) brings scoping rules to the fore as an explicit programming concern; here scope is implicit, but that's probably better -- R will get its own rules right. Martin

Here's an S4 sketch:

setClass("GeneSelector", contains = "function",
         representation = representation(description = "character"),
         validity = function(object) {
             msg <- NULL
             argNames <- names(formals(object))
             if (argNames[1] != "x")
                 msg <- c(msg, "\n  GeneSelector requires a first argument named 'x'")
             if (!"..." %in% argNames)
                 msg <- c(msg, "\n  GeneSelector requires '...' in its signature")
             if (0 == length(object@description))
                 msg <- c(msg, "\n  Please describe your GeneSelector")
             if (is.null(msg)) TRUE else msg
         })

setGeneric("geneConverter",
           function(converter, x, ...) standardGeneric("geneConverter"),
           signature = c("converter"))

setMethod("geneConverter", signature(converter = "GeneSelector"),
          function(converter, x, ...) {
              ## important stuff here
              converter(x, ...)
          })

setMethod("geneConverter", signature(converter = "function"),
          function(converter, x, ...) {
              message("ad-hoc converter; hope it works!")
              converter(x, ...)
          })

and then...

c1 <- new("GeneSelector",
          function(x, ...) prod(x, ...),
          description = "Product of x")
c2 <- new("GeneSelector",
          function(x, ...) sum(x, ...),
          description = "Sum of x")
geneConverter(c1, 1:4)
[1] 24
geneConverter(c2, 1:4)
[1] 10
geneConverter(mean, 1:4)
ad-hoc converter; hope it works!
[1] 2.5
cvterr <- new("GeneSelector", function(y) {})
Error in validObject(.Object) : invalid class "GeneSelector" object: 1:
  GeneSelector requires a first argument named 'x'
invalid class "GeneSelector" object: 2:
  GeneSelector requires '...' in its signature
invalid class "GeneSelector" object: 3:
  Please describe your GeneSelector
xxx <- 10
geneConverter(xxx, 1:4)
Error in function (classes, fdef, mtable) : unable to find an inherited
method for function "geneConverter", for signature "numeric"

Thanks!! That is actually a rather interesting alternative approach and I can see it also adds a lot of structure to the problem. I have to confess, though, that I am not a fan of OOP (nor of S4 classes); in this case, in particular, it seems there is a lot of scaffolding in the code above (the counterpoint to the structure?) and, regarding scoping rules, I prefer to think about them explicitly (I find it much simpler than inheritance). Best, R.

Ramon Diaz-Uriarte [EMAIL PROTECTED] writes:

Dear Greg,

On 1/5/07, Greg Snow [EMAIL PROTECTED] wrote:

Ramon, I prefer to use the list method for this type of thing; here are a couple of reasons why (maybe you are more organized than me and would never do some of the stupid things that I have, so these don't apply to you, but you can see that the general suggestion applies to some of the rest of us).

Those suggestions do apply to me of course (no claim to being organized nor beyond idiocy here). And actually the suggestions on this thread are being very useful. I think, though, that I was not very clear on the context and my examples were too dumbed down. So I'll try to give more detail (nothing here is secret, I am just trying not to bore people).

The code is part of a web-based application, so there is no interactive user. The R code is passed the arguments (and optional user functions) from the CGI. There is one core function (call it cvFunct) that, among other things, does cross-validation.

So this is one way to do things:

cvFunct <- function(whatever, genefiltertype, whateverelse) {
    internalGeneSelect <- eval(parse(text = paste("geneSelect",
                                                  genefiltertype, sep = ".")))
    ## do things calling internalGeneSelect
}

and now define all possible functions as

geneSelect.Fratio <- function(x, y, z) {## something}
geneSelect.Wilcoxon <- function(x, y, z) {## something else}

If I want more geneSelect functions, adding them is simple. And I can even allow the user to pass her/his own functions, with the only restriction that it takes three args, x, y, z
Re: [R] eval(parse(text vs. get when accessing a function
On 1/5/07, jim holtman [EMAIL PROTECTED] wrote:

The other reason for considering which of the different approaches to use would be performance:

f.1 <- function(x) x + 1
f.2 <- function(x) x + 2
system.time({
    for (i in 1:10) {
        eval(parse(text = paste('f.', i %% 2 + 1, sep = '')))(i)
    }
})
[1] 6.96 0.00 8.32 NA NA
system.time({
    for (i in 1:10) {
        {if (i %% 2 == 0) f.1 else f.2}(i)
    }
})
[1] 0.52 0.00 0.61 NA NA

eval(parse(...)) seems to be an order of magnitude slower. It would make a difference if you were calling it several thousand times; so it depends on your application.

Yes, that is true, thanks. Note, though, that in my case I am more likely to do the eval(parse( and pasting only once, and then call the new function thousands of times; something more like your second version than the first:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## the thousands of calls to calledf go here
}

R.

On 1/5/07, Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote:

On Friday 05 January 2007 19:35, Bert Gunter wrote:

?? Or to add to what Peter Dalgaard said... (perhaps for the case of many more functions): why eval(parse())? What's wrong with if/else?

g <- function(fpost, x) {if (fpost == 1) f.1 else f.2}(x)

or switch() if you have more than 2 possible arguments? I think your remarks reinforce the wisdom of Thomas's axiom.

Thanks, Bert, but as with Peter's solution, your solution forces me to build g ahead of time. And again, I am not sure I see why the attempt to avoid eval(parse(text. Best, R.

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ramon Diaz-Uriarte Sent: Friday, January 05, 2007 10:02 AM To: r-help; [EMAIL PROTECTED] Subject: [R] eval(parse(text vs. get when accessing a function

Dear All, I've read Thomas Lumley's fortune "If the answer is parse() you should usually rethink the question.". But I am not sure if that also applies (and why) to other situations (Lumley's comment http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html was in reply to accessing a list). Suppose I have similarly named functions, except for a postfix. E.g.

f.1 <- function(x) {x + 1}
f.2 <- function(x) {x + 2}

And sometimes I want to call f.1 and some other times f.2 inside another function. I can either do:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## do more stuff
}

Or:

h <- function(x, fpost) {
    calledf <- get(paste("f.", fpost, sep = ""))
    calledf(x)
    ## do more stuff
}

Two questions: 1) Why is the second better? 2) By changing g or h I could use do.call instead; why would that be better? Because I can handle differences in argument lists? Thanks, R.

-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?

-- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
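The three lookup styles discussed in this thread can be put side by side in one self-contained sketch; f.1 and f.2 are the toy functions from the original post, and the use of get()'s mode argument is an addition not shown in the thread:

```r
## The toy functions from the original post.
f.1 <- function(x) x + 1
f.2 <- function(x) x + 2

## eval(parse()): builds the function name as text, then parses and
## evaluates it.
g <- function(x, fpost) {
  calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
  calledf(x)
}

## get(): looks the name up directly; mode = "function" restricts the
## search to functions.
h <- function(x, fpost) {
  calledf <- get(paste("f.", fpost, sep = ""), mode = "function")
  calledf(x)
}

## do.call(): accepts the name (or the function itself) plus a list of
## arguments, which also copes with differing argument lists.
k <- function(x, fpost) {
  do.call(paste("f.", fpost, sep = ""), list(x))
}

g(10, 1)  # [1] 11
h(10, 2)  # [1] 12
k(10, 1)  # [1] 11
```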
Re: [R] eval(parse(text vs. get when accessing a function
, enclos) : object "f.8" not found

So even in more general cases, except for function redefinitions, etc, you are not able to call non-existent stuff.

2nd, if I used the eval-parse approach then I would probably at some point redefine f.1 or f.2 to the output of a regression analysis or something, then go back and run the g function at a later time and wonder why I am getting an error; then, once I have finally figured it out, I need to remember what f.1 did and rewrite it again. I am much less likely to accidentally replace an element of a list, and if the list is well named I am unlikely to replace the whole list by accident.

Yes, that is true. Again, it does not apply to the actual case I have in mind, but of course, without the detailed info on context I just gave, you could not know that.

3rd, if I ever want to use this code somewhere else (new version of R, on the laptop, give to coworker, ...), it is a lot easier to save and load a single list than to try to think of all the functions that need to be saved.

Oh, sure. But all the functions above live in a single file (actually, a minipackage) except for the optional user function (which is read from a file).

Personally, I have never regretted trying not to underestimate my own future stupidity.

Neither have I. And actually, that is why I asked: if Thomas Lumley said, in the fortune, that I had better rethink it, then I should try rethinking it. But I asked because I failed to see what the problem is.

Hope this helps,

It certainly does. Best, R.

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ramon Diaz-Uriarte Sent: Friday, January 05, 2007 11:41 AM To: Peter Dalgaard Cc: r-help; [EMAIL PROTECTED] Subject: Re: [R] eval(parse(text vs. get when accessing a function

On Friday 05 January 2007 19:21, Peter Dalgaard wrote:

Ramon Diaz-Uriarte wrote:

Dear All, I've read Thomas Lumley's fortune "If the answer is parse() you should usually rethink the question.". But I am not sure if that also applies (and why) to other situations (Lumley's comment http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html was in reply to accessing a list). Suppose I have similarly named functions, except for a postfix. E.g.

f.1 <- function(x) {x + 1}
f.2 <- function(x) {x + 2}

And sometimes I want to call f.1 and some other times f.2 inside another function. I can either do:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## do more stuff
}

Or:

h <- function(x, fpost) {
    calledf <- get(paste("f.", fpost, sep = ""))
    calledf(x)
    ## do more stuff
}

Two questions: 1) Why is the second better? 2) By changing g or h I could use do.call instead; why would that be better? Because I can handle differences in argument lists?

Dear Peter, Thanks for your answer.

Who says that they are better? If the question is how to call a function specified by half of its name, the answer could well be to use parse(); the point is that you should rethink whether that was really the right question. Why not instead, e.g.

f <- list("1" = function(x) {x + 1},
          "2" = function(x) {x + 2})
h <- function(x, fpost) f[[fpost]](x)
h(2, 2)
[1] 4
h(2, 1)
[1] 3

I see, this is a direct way of dealing with the problem. However, you first need to build the f list, and you might not know about that ahead of time. For instance, if I build a function so that the only thing that you need to do to use my function g is to call your function f.something, and then pass the something. I am still under the impression that, given your answer, using eval(parse(text is not your preferred way. What are the possible problems (if there are any, that is)? I guess I am puzzled by "rethink whether that was really the right question". Thanks, R.
-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ramon Diaz-Uriarte Statistical Computing Team
Re: [R] eval(parse(text vs. get when accessing a function
On 1/6/07, Thomas Lumley [EMAIL PROTECTED] wrote:

On Fri, 5 Jan 2007, Ramon Diaz-Uriarte wrote:

I see, this is a direct way of dealing with the problem. However, you first need to build the f list, and you might not know about that ahead of time. For instance, if I build a function so that the only thing that you need to do to use my function g is to call your function f.something, and then pass the something. I am still under the impression that, given your answer, using eval(parse(text is not your preferred way. What are the possible problems (if there are any, that is)? I guess I am puzzled by "rethink whether that was really the right question".

There are definitely situations where parse() is necessary or convenient, or we wouldn't provide it. For example, there are some formula-manipulation problems where it really does seem to be the best solution. The point of my observation was that it is relatively common for people to ask about parse() solutions to problems, but relatively rare to see them in code by experienced R programmers. The 'rethink the question' point is that a narrowly-posed programming problem may suggest parse() as the answer, when thinking more broadly about what you are trying to do may allow a completely different approach [the example of lists is a common one].

Yes, the general thing I am trying to do ---see my response to Greg Snow for details--- has been done before. And I looked at code from more experienced programmers, such as David Meyer's tune() in e1071. I think one of the reasons David is using do.call is that he allows the use of arbitrary functions, whereas I do not (currently) need that functionality. Thus, instead of calling do.call(whatever) I can call internalGeneSelect. And, when reading my code, or debugging, it is easier for me to quickly decode internalGeneSelect (oh, yes, calling the geneSelection function) than to decode do.call. But my internalGeneSelect depends on eval(parse(text =, and that is where my doubts started.

Because of this thread, though, I am actually starting to think I should go ahead and use do.call, because it will make life simpler if someone (including myself) decides to extend the code. I guess this can be a case of thinking more broadly.

The problem with eval(parse()) is not primarily one of speed. A problem with parse() is that manipulating text strings is easy to mess up, since text has so much less structure than code. A problem with eval() is that it is too powerful -- since it can do anything, it is harder to keep track of what it is doing.

Yes, I understand that. In my specific case, though, there is quite a high degree of structure in the text used. And I felt that do.call was also very powerful (and I've messed with ... in similar situations in the past).

In one sense this is just a style issue, but I still think my comment is good advice. If you find yourself wanting to use parse() it is a good idea to stop and think about whether there is a better way to do it. Often, there is. Sometimes, there isn't.

Thanks for your comments. I think here do.call might actually be the way to go. Best, R.

-thomas Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle

-- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
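A minimal sketch of the do.call route settled on here. The geneSelect.* names follow the thread's naming convention, but the bodies are made-up placeholders, not the real gene-filtering code:

```r
## Hypothetical selector functions; only the geneSelect.<type> naming
## convention comes from the thread, the bodies are placeholders.
geneSelect.Fratio   <- function(x, y, z) x + y + z
geneSelect.Wilcoxon <- function(x, y) x * y

## do.call() takes the function name plus a list of arguments, so
## selectors with different argument lists can share one entry point.
runSelector <- function(type, args) {
  do.call(paste("geneSelect", type, sep = "."), args)
}

runSelector("Fratio", list(x = 1, y = 2, z = 3))  # [1] 6
runSelector("Wilcoxon", list(x = 2, y = 5))       # [1] 10
```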
Re: [R] eval(parse(text vs. get when accessing a function
On 1/6/07, Brian Ripley [EMAIL PROTECTED] wrote:

On Sat, 6 Jan 2007, Ramon Diaz-Uriarte wrote:

(...)

cvFunct <- function(whatever, genefiltertype, whateverelse) {
    internalGeneSelect <- eval(parse(text = paste("geneSelect",
                                                  genefiltertype, sep = ".")))
    ## do things calling internalGeneSelect
}

That looks like a more complicated alternative to

get(paste("geneSelect", genefiltertype, sep = "."))

Yes, you are right, thanks. Actually, now that I think of it, the eval(parse(text version looks _a lot_ more verbose.

I would worry about scope in both cases: I think you most likely want eval.parent in yours, and to pick an environment for use in get() (but the view you have shown is still too narrow for us to know).

The function that get (or eval) is called from is defined in a package. The other functions (the ones with the postfix) are either in the same package or in the global environment (read from a file). I think with both solutions (get and eval), and defining the other functions both ways (in a package and in the global env), I should be OK, but I probably want to make this explicit. Thanks, R.

and now define all possible functions as

geneSelect.Fratio <- function(x, y, z) {## something}
geneSelect.Wilcoxon <- function(x, y, z) {## something else}

If I want more geneSelect functions, adding them is simple. And I can even allow the user to pass her/his own functions, with the only restriction that it takes three args, x, y, z, and that the function is to be called geneSelect. plus a user-chosen string. (Yes, I need to make sure no calls to system, etc, are in the user code, etc, etc, but that is another issue.)

[...]

-- Brian D.
Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
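Ripley's two cautions (prefer get() to eval(parse(text =, and be explicit about which environment is searched) could be combined along these lines. The mode and envir arguments are standard parameters of get(); the helper itself and the selector body are illustrative, not from the thread:

```r
## Look the selector up by name, restricted to functions, starting the
## search in the caller's environment (illustrative helper).
internalGeneSelect <- function(genefiltertype, envir = parent.frame()) {
  get(paste("geneSelect", genefiltertype, sep = "."),
      mode = "function", envir = envir)
}

geneSelect.Wilcoxon <- function(x, y, z) median(x)  # placeholder body

sel <- internalGeneSelect("Wilcoxon")
sel(1:5, NULL, NULL)  # [1] 3
```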
Re: [R] ANCOVA
On 1/6/07, Michael Kubovy [EMAIL PROTECTED] wrote:

On Jan 6, 2007, at 8:34 AM, John Cardinale wrote:

Are there any R functions which can do analysis of covariance?

?lm
RSiteSearch('ancova')

Given the question, you'll probably need to find out how to do an ancova with lm. Several documents in http://cran.r-project.org/other-docs.html will show you how (and why ancova is just one special case of the linear model). In particular, I think Faraway's "Practical Regression and Anova using R" has explicit chapters/sections for Ancova. Many other standard texts on R/S do too. R.

_ Professor Michael Kubovy University of Virginia Department of Psychology USPS: P.O. Box 400400, Charlottesville, VA 22904-4400 Parcels: Room 102, Gilmer Hall, McCormick Road, Charlottesville, VA 22903 Office: B011, +1-434-982-4729 Lab: B019, +1-434-982-4751 Fax: +1-434-982-4766 WWW: http://www.people.virginia.edu/~mk9y/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
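For completeness, a tiny simulated example of the lm() route suggested above; the data and coefficients are made up purely for illustration:

```r
## ANCOVA = a linear model with a factor (group) and a covariate (x).
set.seed(1)
d <- data.frame(group = gl(2, 10, labels = c("A", "B")),
                x = rnorm(20))
d$y <- 1 + 2 * d$x + ifelse(d$group == "B", 0.5, 0) + rnorm(20, sd = 0.3)

## Common slope for x, separate intercepts per group.
fit <- lm(y ~ group + x, data = d)
anova(fit)        # the ANCOVA table: group effect adjusted for x
coef(fit)         # (Intercept), groupB, x
```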
[R] eval(parse(text vs. get when accessing a function
Dear All, I've read Thomas Lumley's fortune "If the answer is parse() you should usually rethink the question.". But I am not sure if that also applies (and why) to other situations (Lumley's comment http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html was in reply to accessing a list). Suppose I have similarly named functions, except for a postfix. E.g.

f.1 <- function(x) {x + 1}
f.2 <- function(x) {x + 2}

And sometimes I want to call f.1 and some other times f.2 inside another function. I can either do:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## do more stuff
}

Or:

h <- function(x, fpost) {
    calledf <- get(paste("f.", fpost, sep = ""))
    calledf(x)
    ## do more stuff
}

Two questions: 1) Why is the second better? 2) By changing g or h I could use do.call instead; why would that be better? Because I can handle differences in argument lists? Thanks, R.

-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] eval(parse(text vs. get when accessing a function
On Friday 05 January 2007 19:21, Peter Dalgaard wrote:

Ramon Diaz-Uriarte wrote:

Dear All, I've read Thomas Lumley's fortune "If the answer is parse() you should usually rethink the question.". But I am not sure if that also applies (and why) to other situations (Lumley's comment http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html was in reply to accessing a list). Suppose I have similarly named functions, except for a postfix. E.g.

f.1 <- function(x) {x + 1}
f.2 <- function(x) {x + 2}

And sometimes I want to call f.1 and some other times f.2 inside another function. I can either do:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## do more stuff
}

Or:

h <- function(x, fpost) {
    calledf <- get(paste("f.", fpost, sep = ""))
    calledf(x)
    ## do more stuff
}

Two questions: 1) Why is the second better? 2) By changing g or h I could use do.call instead; why would that be better? Because I can handle differences in argument lists?

Dear Peter, Thanks for your answer.

Who says that they are better? If the question is how to call a function specified by half of its name, the answer could well be to use parse(); the point is that you should rethink whether that was really the right question. Why not instead, e.g.

f <- list("1" = function(x) {x + 1},
          "2" = function(x) {x + 2})
h <- function(x, fpost) f[[fpost]](x)
h(2, 2)
[1] 4
h(2, 1)
[1] 3

I see, this is a direct way of dealing with the problem. However, you first need to build the f list, and you might not know about that ahead of time. For instance, if I build a function so that the only thing that you need to do to use my function g is to call your function f.something, and then pass the something. I am still under the impression that, given your answer, using eval(parse(text is not your preferred way. What are the possible problems (if there are any, that is)? I guess I am puzzled by "rethink whether that was really the right question". Thanks, R.
-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
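Peter's list-based alternative, written out so it runs as shown; as.character() is an addition so that either a numeric or character fpost indexes by name:

```r
## A named list of functions replaces the f.<postfix> naming convention;
## "dispatch" is then plain list indexing, with no parse() or get().
f <- list("1" = function(x) x + 1,
          "2" = function(x) x + 2)

h <- function(x, fpost) f[[as.character(fpost)]](x)

h(2, 2)  # [1] 4
h(2, 1)  # [1] 3
```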
Re: [R] eval(parse(text vs. get when accessing a function
On Friday 05 January 2007 19:35, Bert Gunter wrote:

?? Or to add to what Peter Dalgaard said... (perhaps for the case of many more functions): why eval(parse())? What's wrong with if/else?

g <- function(fpost, x) {if (fpost == 1) f.1 else f.2}(x)

or switch() if you have more than 2 possible arguments? I think your remarks reinforce the wisdom of Thomas's axiom.

Thanks, Bert, but as with Peter's solution, your solution forces me to build g ahead of time. And again, I am not sure I see why the attempt to avoid eval(parse(text. Best, R.

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ramon Diaz-Uriarte Sent: Friday, January 05, 2007 10:02 AM To: r-help; [EMAIL PROTECTED] Subject: [R] eval(parse(text vs. get when accessing a function

Dear All, I've read Thomas Lumley's fortune "If the answer is parse() you should usually rethink the question.". But I am not sure if that also applies (and why) to other situations (Lumley's comment http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html was in reply to accessing a list). Suppose I have similarly named functions, except for a postfix. E.g.

f.1 <- function(x) {x + 1}
f.2 <- function(x) {x + 2}

And sometimes I want to call f.1 and some other times f.2 inside another function. I can either do:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## do more stuff
}

Or:

h <- function(x, fpost) {
    calledf <- get(paste("f.", fpost, sep = ""))
    calledf(x)
    ## do more stuff
}

Two questions: 1) Why is the second better? 2) By changing g or h I could use do.call instead; why would that be better? Because I can handle differences in argument lists? Thanks, R.
-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
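Bert's switch() suggestion, spelled out for more than two alternatives; the stop() default branch is an addition so that unknown postfixes fail loudly:

```r
f.1 <- function(x) x + 1
f.2 <- function(x) x + 2

## switch() dispatches on the (character) postfix; the unnamed last
## argument is the default, reached when no name matches.
g <- function(x, fpost) {
  switch(as.character(fpost),
         "1" = f.1(x),
         "2" = f.2(x),
         stop("unknown fpost: ", fpost))
}

g(10, 2)  # [1] 12
g(10, 1)  # [1] 11
```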
Re: [R] simple parallel computing on single multicore machine
On Friday 01 December 2006 13:23, Millo Giovanni wrote:

Dear List, the advent of multicore machines in the consumer segment makes me wonder whether it would, at least in principle, be possible to divide a computational task into more slave R processes running on the different cores of the same processor, more or less in the way package SNOW would do on a cluster. I am thinking of simple 'embarrassingly parallel' problems, just like inverting 1000 matrices, estimating 1000 models or the like. I have seen some talk here on making R multi-threaded and the like, but this is much simpler. I am just a curious useR, so don't bother if you don't have time, but maybe you can point me at some resource, or just say this is nonsense...

Dear Millo,

I find papply (from the package of the same name), which itself uses Rmpi, easy and ideal for those cases. The papply documentation shows clearly what you need to do to pass the required arguments to papply. And once you have your MPI universe up and running (with whichever number of slaves you specify) it just works. As well, I find debugging very simple: just start an MPI universe with only one node, which forces papply to run serially (non-parallel), so wrong arguments, missing libraries, etc, are easy to spot. Best, R.

Cheers Giovanni

Giovanni Millo Research Dept., Assicurazioni Generali SpA Via Machiavelli 4, 34131 Trieste (Italy) tel. +39 040 671184 fax +39 040 671160 Ai sensi del D.Lgs. 196/2003 si precisa che le informazioni ...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
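A note beyond this 2006 thread: since R 2.14.0 the bundled parallel package covers exactly this single-machine, embarrassingly parallel case without setting up an MPI universe. A minimal sketch of the "invert 1000 matrices" example:

```r
library(parallel)

## 1000 small, well-conditioned matrices to invert (the thread's example
## of an embarrassingly parallel job; the matrices here are made up).
set.seed(1)
xs <- replicate(1000, diag(3) + matrix(rnorm(9, sd = 0.1), 3),
                simplify = FALSE)

## mclapply() forks worker processes on Unix-alikes; on Windows use
## makeCluster() + parLapply() instead.
inv <- mclapply(xs, solve, mc.cores = 2)
length(inv)  # [1] 1000
```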
Re: [R] princomp and factanal()
On Tuesday 28 November 2006 16:03, Tom Backer Johnsen wrote:

I have been looking at the documentation and the output from the functions princomp() and factanal(), and found them somewhat difficult to understand. This is probably due to differences between the terminology I am used to and the one used here (my field is psychology). Are there some additional texts which might help?

Dear Tom,

I suggest you look at MASS (Modern Applied Statistics with S), by Venables and Ripley. These issues are explained there (someone borrowed my copy, so I can't tell you the chapter, pages, etc). You might also want to take a look at a multivariate statistics book (e.g., Krzanowski explains these issues very well) for the general differences between PCA and factor analysis. HTH, R.

Sincerely, Tom

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- Ramón Díaz-Uriarte Bioinformatics Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
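A side-by-side run on a built-in dataset may help see what each function reports; USArrests is just a convenient example, not from the thread:

```r
## PCA on the correlation matrix vs. maximum-likelihood factor analysis
## on the same four variables.
pc <- princomp(USArrests, cor = TRUE)
fa <- factanal(USArrests, factors = 1)

pc$loadings   # eigenvector-based component loadings (all 4 components)
fa$loadings   # ML factor loadings: a different model and scaling
```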
Re: [R] command option for R CMD BATCH
Thanks. R.

On Thursday 23 November 2006 16:32, Prof Brian Ripley wrote: On Thu, 23 Nov 2006, Ramon Diaz-Uriarte wrote: On Thursday 23 November 2006 15:44, Prof Brian Ripley wrote: Try this:

gannet% cat month.R
x <- commandArgs()
print(x[length(x)])
gannet% R --slave --args January < month.R
[1] "January"

Is the above (R --slave --args January < month.R) the preferred way of using it? Yes it is. That's exactly what --args was added to allow. I tend to use R --slave < month.R January instead (as a consequence of reconverting former scripts that used R CMD BATCH). The second call produces an ARGUMENT 'January' __ignored__ warning but otherwise seems to do the same thing.
Re: [R] command option for R CMD BATCH
On Thursday 23 November 2006 15:44, Prof Brian Ripley wrote: Try this:

gannet% cat month.R
x <- commandArgs()
print(x[length(x)])
gannet% R --slave --args January < month.R
[1] "January"

Is the above (R --slave --args January < month.R) the preferred way of using it? I tend to use R --slave < month.R January instead (as a consequence of reconverting former scripts that used R CMD BATCH). The second call produces an ARGUMENT 'January' __ignored__ warning but otherwise seems to do the same thing. Thanks, R.

On Thu, 23 Nov 2006, Patrick Connolly wrote: I wish to use R CMD BATCH to run a small R function which reads a text file and plots a single graph to a PDF file.

version
platform       x86_64-unknown-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          4.0
year           2006
month          10
day            03
svn rev        39566
language       R
version.string R version 2.4.0 (2006-10-03)

The text files are monthly data (called lyrical names like October.txt or November.txt), and the end result of each run will be a PDF file called October.pdf, etc. It's simple enough to make a separate file for each month which has the command to call the R function; e.g., October.r would be plot.month("October.txt"), used like so: R CMD BATCH October.r /dev/null (the R function creates the name for the PDF file). Or, slightly more elegantly, a one-line shell script that takes an argument: R CMD BATCH $1.r /dev/null (so that the script name and the name of the month will make a PDF file for that month). What I'd like to do is avoid the need to make the Month.r files and have the script pass the month information directly to the function that a single .r file would call. If I brushed up on a bit of Perl, I might work out how to modify the shell script to do such a thing, but I suspect it should be simpler than that. I had thought of using littler for such a thing, but as I looked into it, I get the impression that's not the idea of littler. (I'm also a bit reluctant to recompile R.) Ideas welcome.
Thanks
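Tying the thread together, Patrick's goal (a single month.R driven by a shell argument) can be sketched as below. The plot.month name and the <month>.txt naming scheme come from the thread; the body of plot.month (column layout, plot type) is an invented assumption:

```r
## month.R -- run as, e.g.:  R --slave --args October < month.R
## commandArgs() returns all command-line arguments; everything
## after --args is passed through to the script untouched.
args <- commandArgs()
month <- args[length(args)]          # last argument, e.g. "October"

## hypothetical plotting function: reads <month>.txt, writes <month>.pdf
plot.month <- function(month) {
  d <- read.table(paste(month, ".txt", sep = ""), header = TRUE)
  pdf(paste(month, ".pdf", sep = ""))
  plot(d[[1]], d[[2]], type = "l", main = month)
  dev.off()
}

plot.month(month)
```

(Later versions of R also offer commandArgs(trailingOnly = TRUE), which returns only the user-supplied arguments and avoids the length(args) indexing.)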
Re: [R] snow's makeCluster hanging (using Rmpi)
On Tuesday 07 November 2006 19:28, Randall C Johnson [Contr.] wrote: On 11/7/06 11:28 AM, Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote: On Tuesday 07 November 2006 15:56, Randall C Johnson [Contr.] wrote: Hello everyone, I've been fiddling around with the snow and Rmpi packages on my new Intel Mac, and have run into a few problems. When I make a cluster on my machine, both slaves start up just fine, and everything works as expected. When I try to make a cluster including another networked machine, it hangs. I've followed the suggestions at http://finzi.psych.upenn.edu/R/Rhelp02a/archive/83086.html and http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html but to no avail. Everything seems to start up fine using lamboot, but then hangs when making the cluster in R. Making a cluster with 2 slaves seems to work fine, but if I increase the number (to use the networked machines) it hangs again. I've tried networking to another Mac, and also to a machine running Red Hat Linux. Both machines can set up their own local clusters. Does anyone have any ideas?

Dear Randy, A few suggestions: a) make sure there are no firewalls; I assume this is actually the case, but anyway. [Randy:] I don't think I have any firewalls running. I checked and they all seem to be disabled... [Ramon:] You can use (under GNU/Linux, at least) the command (as root) iptables -L. If there is no iptables-based firewall, you should see something like:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Make sure this is OK on all the machines.

b) What happens if you lamboot outside R (and create a universe with a local and a networked machine) and then you do lamexec -np 6 hostname? [Randy:] This prints out the host names of each machine as expected. [Ramon:] OK, so it's not lam itself (so a) is probably unneeded).

c) Are Rmpi and snow installed in the same directories on the different machines?
Are there version differences in Rmpi (or snow) between machines? [Randy:] I've installed the same versions, but they are in different directories... [Ramon:] I think I remember that having Rmpi and snow in different directories tended to cause problems; now I always place them in the same directory. I think that some sh script in Rmpi looks for other scripts, and if they are not where it expects them, it fails.

[Randy:] I also tried an example per Luke Tierney's suggestion using only Rmpi, and I get the following error when trying to spawn the Rslaves after starting up with lamboot (outside of R). I tried to use laminfo, but I'm not sure what I'm looking for or how to use the information given...

library(Rmpi)
mpi.spawn.Rslaves()
-----------------------------------------------------------------------
It seems that [at least] one of the child processes that was started by
MPI_Comm_spawn* chose a different RPI than the parent MPI application.
For example, one (of the) child process(es) that differed from the
parent is shown below:

Parent application: MPI_Comm_spawn
Child MPI_COMM_WORLD rank 0: usysv (v7.1.0)

All MPI processes must choose the same RPI module and version when they
start. Check your SSI settings and/or the local environment variables
on each node.
-----------------------------------------------------------------------
R(26444) malloc: *** Deallocation of a pointer not malloced: 0x16379a0;
This could be a double free(), or free() called with the middle of an
allocated block; Try setting environment variable MallocHelp to see
tools to help debug
Error in mpi.comm.spawn(slave = system.file("Rslaves.sh", package = "Rmpi"), ...):
  MPI_Error_string: unclassified

[Ramon:] Now that is way over my head. A few things I'd check: Are you mixing 32-bit with 64-bit machines? (I've done that in the past, x86 and x86_64, without apparent problems, but I've never used Macs for this.) Can you try using two different machines with the same architecture? What about gcc compilers: are you using very different versions on each machine? Best, R.
Thanks, Randy

sessionInfo()
R version 2.4.0 Patched (2006-10-03 r39576)
i386-apple-darwin8.8.2

locale: C

attached base packages:
[1] methods   stats     graphics  grDevices utils     datasets
[7] base

other attached packages:
 Rmpi snow
0.5-3 0.2-2
Re: [R] snow's makeCluster hanging (using Rmpi)
On Tuesday 07 November 2006 15:56, Randall C Johnson [Contr.] wrote: Hello everyone, I've been fiddling around with the snow and Rmpi packages on my new Intel Mac, and have run into a few problems. When I make a cluster on my machine, both slaves start up just fine, and everything works as expected. When I try to make a cluster including another networked machine, it hangs. I've followed the suggestions at http://finzi.psych.upenn.edu/R/Rhelp02a/archive/83086.html and http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html but to no avail. Everything seems to start up fine using lamboot, but then hangs when making the cluster in R. Making a cluster with 2 slaves seems to work fine, but if I increase the number (to use the networked machines) it hangs again. I've tried networking to another Mac, and also to a machine running Red Hat Linux. Both machines can set up their own local clusters. Does anyone have any ideas?

Dear Randy, A few suggestions: a) make sure there are no firewalls; I assume this is actually the case, but anyway; b) what happens if you lamboot outside R (and create a universe with a local and a networked machine) and then you do lamexec -np 6 hostname? c) are Rmpi and snow installed in the same directories on the different machines? Are there version differences in Rmpi (or snow) between machines? HTH, R.

Thanks, Randy

sessionInfo()
R version 2.4.0 Patched (2006-10-03 r39576)
i386-apple-darwin8.8.2

locale: C

attached base packages:
[1] methods   stats     graphics  grDevices utils     datasets
[7] base

other attached packages:
 Rmpi snow
0.5-3 0.2-2

~~ Randall C Johnson, Bioinformatics Analyst, SAIC-Frederick, Inc (Contractor), Laboratory of Genomic Diversity, NCI-Frederick, P.O.
Box B, Bldg 560, Rm 11-85, Frederick, MD 21702. Phone: (301) 846-1304. Fax: (301) 846-1686.
Re: [R] Beginners manual for emacs and ess
On Wednesday 20 September 2006 17:16, Marc Schwartz (via MN) wrote: On Wed, 2006-09-20 at 17:03 +0200, Rainer M Krug wrote: Hi, I heard so much about Emacs and ESS that I decided to try it out, but I am stuck at the beginning. Is there anywhere a beginners' manual for Emacs + ESS to be used with R? Even M-x S tells me it can't start S-Plus (obviously), but I want it to start R...

[Ramon:] While following Marc's suggestions, try doing M-x R, and that might start R. Then you can do C-x 2 (split the screen, as it is called in other editors), move to the window without the running R, and open an R file there (or you can just create it on the fly: C-x C-f, and in the minibuffer type anything, e.g., one-file.R, without the quotes). Then type C-h m and you'll get a list of stuff related to the ESS mode. And I think you will then really need to look at the ESS docs and go through the (X)Emacs tutorial (which is available from the help, in (X)Emacs). HTH, R.

[Rainer:] Any help welcome (otherwise I will be stuck with Eclipse and R). Rainer

[Marc:] There are some reference materials on the main ESS site at http://ess.r-project.org/ In addition, there is a dedicated ESS mailing list, with more info here: https://stat.ethz.ch/mailman/listinfo/ess-help HTH, Marc Schwartz
Re: [R] Statitics Textbook - any recommendation?
On Wednesday 20 September 2006 22:21, Iuri Gavronski wrote: I would like to buy a basic statistics book (experimental design, sampling, ANOVA, regression, etc.) with examples in R, or download it in PDF or HTML format. I went to the CRAN contributed documentation, but there were only R textbooks, that is, textbooks where R is the focus, not the statistics, and I would like to find the opposite. Another text I am trying to find is on multivariate data analysis (EFA, cluster, multiple regression, MANOVA, etc.) with examples in R. Any recommendation? Thank you in advance, Iuri.

I'd say the situation is actually the opposite. Anyway, the recent book by Brian Everitt and Torsten Hothorn (A Handbook of Statistical Analyses Using R, Chapman & Hall) is an excellent (and affordable) place to start. (I think that this book's title emphasizes that it is stats with R as the language: Everitt has (co)authored a bunch of others for other languages: SAS, Stata, SPSS, etc.) Of course, there are many others that probably deserve a place on your (or your library's) shelves: P. Dalgaard's, MASS, Maindonald & Braun, Heiberger & Holland, etc. HTH, R.
Re: [R] Authoring a book
Dear Tom, To add a few things to explore:

- I'd definitely go with LaTeX. Depending on how much formatting control you want, though, and if your coworkers are reluctant to jump into LaTeX, you might start with reStructuredText (http://docutils.sourceforge.net/rst.html) or txt2tags (http://txt2tags.sourceforge.net/). With both you can produce LaTeX, but initially at least they allow you to write text with structure using markup that is a lot simpler than LaTeX.

- I'd definitely use a version control system. Instead of CVS or SVN, though, I'd suggest you take a look at some of the distributed ones, in particular Bazaar-NG (http://bazaar-vcs.org), Mercurial (http://www.selenic.com/mercurial/wiki/index.cgi) or Darcs (http://abridgegame.org/darcs/). These three are probably among the most mature ones (though opinions will vary, of course; I have some notes and links at http://www.ligarto.org/rdiaz/VersionControl.html). What I like about any of these is that I think they provide essentially everything SVN can provide (except for SVN's user base and years of existence), plus a lot more. For instance, if you often work without access to the remote repository, with any of these three systems you can still enjoy all the benefits of version control. Cherry-picking is easier with any of these than with CVS/SVN, and Darcs in particular excels at it.

- For bibliography, I find CiteULike (http://www.citeulike.org/) fabulous. It needs internet access, and might not work with the journals/databases that you use, though. It can export as BibTeX.

- If you find outliners useful (or absolutely essential), then you might want to look at Leo (http://webpages.charter.net/edreamleo/front.html). Leo is agnostic regarding whether you write LaTeX, plain text, or R code (though it has great support for some languages such as Python or rst), and you can use Leo and still edit files in your editor of choice (I use Leo for working with fairly large LaTeX files that I edit under Emacs).
However, for this to work, all of you should agree to use Leo (or at least not to disturb the sentinel lines that Leo uses). Hope this helps (or at least provides entertaining links :-). R.

On Thursday 24 August 2006 21:10, Tom Backer Johnsen wrote: Stefan Grosse wrote: I think Peter Dalgaard is right. Since you are able to use R, I believe you will be very fast in learning LaTeX. I think it needs less than a week to learn the most common LaTeX commands. And setting up a wiki and then trying to convert this into a printable document format, plus learning the wiki syntax, is probably more time consuming. Besides this, R works perfectly together with LaTeX: it creates LaTeX output and does excellent graphics in the EPS/PS format. The best introduction to LaTeX is the not-so-short introduction: http://people.ee.ethz.ch/~oetiker/lshort/lshort.pdf

[Tom:] It really was a not too short intro. I'll have a look at it.

[Stefan:] If you still are not convinced, have a look at UniWakkaWiki: http://uniwakka.sourceforge.net/HomePage It is a wiki for science and university purposes and claims to be able to export to OpenOffice as well as to LaTeX.

[Tom:] Looks interesting and I really like the concept, but how stable is it? It looks rather fresh from the web page, but I may be wrong. A bibliography function is really a big advantage, so ... perhaps. Tom
Re: [R] rpvm/snow packages on a cluster with dual-processor machines
Dear Paul, (I forgot to answer over the weekend.) With MPI it is essentially the same. When using makeCluster, specify the number of slaves: if you have three machines and you want each to run two slave processes, just use 6. Before that, though, you should tell LAM/MPI how to set up the LAM universe. The simplest way is to specify that in a configuration file for LAM. Put something like this (using appropriate IPs or host names; cpu=xx indicates that you want each physical node to run that many (xx) slaves; it might, or might not, be related to the actual number of CPUs) in a file called, say, lamb-conf1.def:

192.168.2.2 cpu=2
192.168.2.3 cpu=2
192.168.2.4 cpu=2

Now do (as a user, NOT root): lamboot -v lamb-conf1.def. If that works, then start R and use snow. A very good explanation of how to use MPI with R, by the author of Rmpi, appeared in R News a while ago. HTH, R.

On Monday 14 August 2006 16:17, Liaw, Andy wrote: That's what I've tried before, on three dual-Xeon boxes, so I know it worked (as documented at that time). Andy

From: Paul Y. Peng: Luke Tierney just reminded me that makeCluster() can take a number greater than the number of machines in a cluster. It seems to be a solution to this problem. But I haven't tested it yet. Paul.

Ryan Austin wrote: Hi, Adding a node twice gives a duplicate node error. However, adding the parameter sp=2000 to your pvm hostfile should enable dual processors. Ryan

Liaw, Andy wrote: Caveat: I've only played with this a couple of years ago... I believe you can just add each host _twice_ (or as many times as the number of CPUs at that host) to get both CPUs to work. Andy

From: Paul Y. Peng: Hi, does anybody know how to use the dual processors in the machines of a cluster? I am using R with the rpvm and snow packages. I usually start the pvm daemon and add host machines first, and then run R to start my computing work. But I find that only one processor in each machine is used in this way and the other one always stays idle.
Is there any simple way to tell pvm to use the two processors at the same time? In other words, I would like to see two copies of R running on each machine's two processors when using pvm. Any hints/help are greatly appreciated. Paul.
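The recipe discussed above (lamboot with a cpu=2 host file, then makeCluster with the total slave count) can be sketched in R as follows; the 3-machines-times-2-CPUs layout follows the thread, while the toy job being parallelized is invented for illustration:

```r
## Assumes LAM/MPI has already been booted outside R, e.g.:
##   lamboot -v lamb-conf1.def     # three hosts, cpu=2 each

library(snow)   # uses Rmpi underneath for an MPI cluster

## one slave per CPU in the LAM universe: 3 machines x 2 CPUs = 6
cl <- makeCluster(6, type = "MPI")

## sanity check: which host did each slave land on?
clusterCall(cl, function() Sys.info()[["nodename"]])

## a toy embarrassingly parallel job
squares <- clusterApply(cl, 1:12, function(x) x^2)

stopCluster(cl)
```

(With rpvm instead, the equivalent would be makeCluster(6, type = "PVM") after starting the PVM daemon with an sp-weighted hostfile, as Ryan suggests above.)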
Re: [R] rpvm/snow packages on a cluster with dual-processor machines
Dear Paul, I have no direct experience with rpvm, but doing it with Rmpi is a piece of cake. I could provide you with some hints if you want. (I am tempted to ask why you are using PVM instead of MPI, but this might be the wrong question.) Best, R.

On Friday 11 August 2006 18:12, Paul Y. Peng wrote: Hi, does anybody know how to use the dual processors in the machines of a cluster? I am using R with the rpvm and snow packages. I usually start the pvm daemon and add host machines first, and then run R to start my computing work. But I find that only one processor in each machine is used in this way and the other one always stays idle. Is there any simple way to tell pvm to use the two processors at the same time? In other words, I would like to see two copies of R running on each machine's two processors when using pvm. Any hints/help are greatly appreciated. Paul.
Re: [R] rpvm/snow packages on a cluster with dual-processor machines
Dear Paul, I am leaving right now; I'll send you the info over the weekend. But note that I do think it is quite possible to use pvm for your setup; I just have no experience with it. R.

On Friday 11 August 2006 19:21, Paul Y. Peng wrote: Hi Ramon, please let me know how you achieve this with Rmpi. I use PVM simply because I picked it up first and it worked well for me. If MPI is the only way to make use of the two processors, I will find out whether it is available or works in our cluster. Thanks a lot for your response. Regards, Paul.

Ramon Diaz-Uriarte wrote: Dear Paul, I have no direct experience with rpvm, but doing it with Rmpi is a piece of cake. I could provide you with some hints if you want. (I am tempted to ask why you are using PVM instead of MPI, but this might be the wrong question.) Best, R.

On Friday 11 August 2006 18:12, Paul Y. Peng wrote: Hi, does anybody know how to use the dual processors in the machines of a cluster? I am using R with the rpvm and snow packages. I usually start the pvm daemon and add host machines first, and then run R to start my computing work. But I find that only one processor in each machine is used in this way and the other one always stays idle. Is there any simple way to tell pvm to use the two processors at the same time? In other words, I would like to see two copies of R running on each machine's two processors when using pvm. Any hints/help are greatly appreciated. Paul.
Re: [R] memory problems when combining randomForests
Dear Eleni, [You wrote:] But if every time you remove a variable you pass some test data (i.e., data not used to train the model) and base the performance of the new, reduced model on the error rate on the confusion matrix for the test data, then this overfitting should not be an issue, right? (unless of course you were referring to unsupervised learning).

Yes and no. The problem there could arise if you do this iteratively and use the minimum value you obtain with your procedure to return an estimate of the error rate. In such a case you should, instead, do a double cross-validation or bootstrap (i.e., estimate, via cross-validation, or the bootstrap, the error rate of your complete procedure). Both Andy and collaborators on the one hand, and myself on the other, have done some further work on these issues:

Svetnik V, Liaw A, Tong C, Wang T: Application of Breiman's random forest to modeling structure-activity relationships of pharmaceutical molecules. Multiple Classifier Systems, Fifth International Workshop, MCS 2004, Proceedings, 9–11 June 2004, Cagliari, Italy. Lecture Notes in Computer Science, Springer 2004, 3077:334-343.

Ramón Díaz-Uriarte and Sara Alvarez de Andrés: Gene selection and classification of microarray data using random forest. BMC Bioinformatics 2006, 7:3. http://www.biomedcentral.com/1471-2105/7/3

Best, R.

On Monday 31 July 2006 18:45, Eleni Rapsomaniki wrote: Hi Andy, I get a different order of importance for my variables depending on their order in the training data. Perhaps answering my own question, the change in importance rankings could be attributed to the fact that before passing my data to randomForest I impute the missing values randomly (using the combined distributions of pos+neg), so the data seen by RF is slightly different. Then, combining this with the fact that RF chooses data randomly, it makes sense to see different rankings.
In a previous thread regarding simplifying variables, http://thread.gmane.org/gmane.comp.lang.r.general/6989/focus=6993, you say: The basic problem is that when you select important variables by RF and then re-run RF with those variables, the OOB error rate becomes biased downward. As you iterate more times, the overfitting becomes more and more severe (in the sense that the OOB error rate will keep decreasing, while the error rate on an independent test set will be flat or increases).

But if every time you remove a variable you pass some test data (i.e., data not used to train the model) and base the performance of the new, reduced model on the error rate on the confusion matrix for the test data, then this overfitting should not be an issue, right? (unless of course you were referring to unsupervised learning). Best regards, Eleni Rapsomaniki, Birkbeck College, UK
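A minimal sketch of the double (nested) cross-validation idea discussed above: the error of the whole select-then-refit procedure is estimated, with variable selection redone inside every outer fold. The fold count, the keep-top-10 rule, and the use of randomForest's MeanDecreaseAccuracy importance are illustrative assumptions, not a prescription:

```r
library(randomForest)

## Outer CV estimates the error of the WHOLE procedure;
## variable selection is repeated inside each outer fold,
## so the held-out fold never influences the selection.
nested.cv.error <- function(x, y, n.folds = 5, n.keep = 10) {
  folds <- sample(rep(1:n.folds, length.out = nrow(x)))
  errs <- numeric(n.folds)
  for (k in 1:n.folds) {
    train <- folds != k
    ## selection step, using training data only
    rf1 <- randomForest(x[train, ], y[train], importance = TRUE)
    keep <- order(importance(rf1)[, "MeanDecreaseAccuracy"],
                  decreasing = TRUE)[1:n.keep]
    ## refit on the selected variables; test on the held-out fold
    rf2 <- randomForest(x[train, keep], y[train])
    pred <- predict(rf2, x[!train, keep])
    errs[k] <- mean(pred != y[!train])
  }
  mean(errs)   # honest estimate of the complete procedure's error
}
```

Reporting the error of the final reduced model on the same data that guided the selection, by contrast, gives the downward-biased estimate Andy describes.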
Re: [R] Colinearity Function in R
Dear Peter, I especially like the VIF (and GVIF) functions in package car, by John Fox. (I'm assuming you are dealing with [generalized] linear models.) HTH, R.

On Wednesday 05 July 2006 17:16, Peter Lauren wrote: Is there a collinearity function implemented in R? I have tried help.search("colinearity") and help.search("collinearity"), and have searched for colinearity and collinearity on http://www.rpad.org/Rpad/Rpad-refcard.pdf, but with no success. Many thanks in advance, Peter Lauren.
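A short sketch of the suggestion above, using vif() from the car package on a toy linear model; the data and the model are invented purely to show two collinear predictors being flagged:

```r
library(car)   # John Fox's package; provides vif()

## toy data with two deliberately collinear predictors
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # nearly a copy of x1
x3 <- rnorm(100)
y  <- 1 + x1 + x3 + rnorm(100)

fit <- lm(y ~ x1 + x2 + x3)
vif(fit)   # x1 and x2 show large variance inflation factors
```

For models containing factors, vif() reports generalized VIFs (GVIFs) instead of ordinary VIFs.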
Re: [R] Editors which have strong/solid support for SWeave?
On Wednesday 05 July 2006 10:14, A.J. Rossini wrote: Greetings! I have a few colleagues who like the idea of Sweave, but have failed to become enlightened monks of the One True Editor (http://www.dina.dk/~abraham/religion/) Are there any other Microsoft-centric editors or IDEs which have solid support for writing SWeave documents (dual R / LaTeX enhancements similar to ESS's support)? Has anyone tried the folding editors which support Noweb? Dear Tony, I often use Leo (http://webpages.charter.net/edreamleo/front.html) which is like a literate editor on steroids (folding + outlining, noweb and cweb support, and a _lot_ more), and I use it for all complex/long Rnw documents, including interacting with R ... ...but I cheat, because the editing itself (of the nodes or folds), including submitting code to R from the R chunks, I do in emacs (with ESS). Leo is available for Linux, Win, Mac and is written in Python. R. (the alternative would be brainwashing, but that is generally frowned upon ;-). best, -tony [EMAIL PROTECTED] Muttenz, Switzerland. Commit early,commit often, and commit in a repository from which we can easily roll-back your mistakes (AJR, 4Jan05). __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Editors which have strong/solid support for SWeave?
On Wednesday 05 July 2006 16:05, A.J. Rossini wrote: On 7/5/06, Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote: On Wednesday 05 July 2006 10:14, A.J. Rossini wrote: Greetings! I have a few colleagues who like the idea of Sweave, but have failed to become enlightened monks of the One True Editor (http://www.dina.dk/~abraham/religion/) Are there any other Microsoft-centric editors or IDEs which have solid support for writing SWeave documents (dual R / LaTeX enhancements similar to ESS's support)? Has anyone tried the folding editors which support Noweb? Dear Tony, I often use Leo (http://webpages.charter.net/edreamleo/front.html) which is like a literate editor on steroids (folding + outlining, noweb and cweb support, and a _lot_ more), and I use it for all complex/long Rnw documents, including interacting with R ... ...but I cheat, because the editing itself (of the nodes or folds), including submitting code to R from the R chunks, I do in emacs (with ESS). Leo is available for Linux, Win, Mac and is written in Python. I've used Leo a few years ago, and liked it (but not enough to convert). I'll have to try it again. Thanks! From my Leo's usage patterns I think I'm still praying at the emacs church. I guess my soul is saved (for now). But I find Leo great, and I always wish I could use it more. Making it understand R syntax for syntax highlighting seems to be relatively easy, more so with the recent changes in Leo's code (http://webpages.charter.net/edreamleo/coloring.html), and at least one other R user who also frequents R-help, Ed Borasky, is interested in these issues (http://sourceforge.net/forum/forum.php?thread_id=1524935forum_id=10226). I think what would be a real blast is to have Leo understand R (and LaTeX), more or less the way leo understands Python. For instance, when one imports a Python file it gets broken down (outlined) by function, method, etc. 
This seems doable (e.g., http://sourceforge.net/forum/message.php?msg_id=3614539), but I haven't yet had time to look at it. And then, Leo also offers a general way (which I think is still only fully exploited with Python files) for autocompletion, etc. (though this seems to be a harder problem). Just my random ramblings. Best, R. best, -tony [EMAIL PROTECTED] Muttenz, Switzerland. Commit early, commit often, and commit in a repository from which we can easily roll-back your mistakes (AJR, 4Jan05). -- Ramón Díaz-Uriarte Bioinformatics Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
Re: [R] FW: How to create a new package?
Dear Rita, Do you want a package just for yourself, or something useful for others, with docs, etc.? I think the rest of the answers in this thread will help you create a full-fledged package. See also the detailed explanation in Writing R Extensions. If you just want something quick and dirty that allows you to use a bunch of functions without using source (and thus cluttering your global workspace), is easy to move around, etc., you just need a directory structure such as: SignS2/ SignS2/R/ SignS2/R/SignS2.R SignS2/DESCRIPTION SignS2/Changes (Change SignS2 to the name of your package.) This has no documentation whatsoever. You can get rid of the Changes file, but I put it there to keep track of changes. Run R CMD check against the directory (of course, you'll get warnings about missing documentation), and then R CMD build. Best, R. On Thursday 01 June 2006 13:23, michael watson (IAH-C) wrote: ?package.skeleton -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck Sent: 01 June 2006 12:20 To: Rita Sousa Cc: r-help@stat.math.ethz.ch Subject: Re: [R] FW: How to create a new package? The minimum is to create a DESCRIPTION file, plus R and man directories containing R code and .Rd files respectively. It might help to run Rcmd CHECK mypkg before installation and fix any problems it finds. Googling for creating R package will locate some tutorials. On 6/1/06, Rita Sousa [EMAIL PROTECTED] wrote: Hi, I have a group of functions and I would like to create a package to load in R. I have created a directory named INE and a directory below that named R, for the files of R functions. I have created the files DESCRIPTION and INDEX in the INE directory. The installation from local zip files, in R 2.3.0, works, but when I load the package I get an error like: 'INE' is not a valid package -- installed 2.0.0? I think it is necessary to create a Meta directory with a package.rds file, but I don't know how to make it!
I have read the manual 'Writing R Extensions - 1. Creating R packages' but I don't understand the procedure... Can I create it automatically? Could you help me with this? Thanks, --- Rita Sousa DME - ME: Departamento de Metodologia Estatística - Métodos Estatísticos INE - DRP: Instituto Nacional de Estatística - Delegação Regional do Porto Tel.: 22 6072016 (Extensão: 4116) --- [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
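One hedged way to bootstrap the directory layout described in this thread is `package.skeleton()` (the `?package.skeleton` pointer above), which writes the DESCRIPTION, R/ and man/ skeleton for you. The function and package names here are just examples:

```r
## Create a package skeleton from objects in the workspace; the generated
## man/ pages are stubs that must be edited before R CMD check passes cleanly.
myFun <- function(x) x + 1
package.skeleton(name = "SignS2", list = "myFun")

## then, from the shell:
##   R CMD check SignS2    # expect documentation warnings at first
##   R CMD build SignS2
```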
Re: [R] Transferring R results to word processors
I started using LyX; it is very straightforward. Then, I started exporting to LaTeX and playing around with the LaTeX file (I found it faster than using LyX, and I could take my file anywhere they had something that could manipulate text ---emacs, vim, nedit, whatever). Googling you'll find _many_ LaTeX tutorials. Which one is best probably depends a lot on your preferences and learning style. As for books, I find Guide to LaTeX by Kopka and Daly (I think now in its fourth edition) far easier to use (to learn from and for reference) than the series of LaTeX books by Goossens et al. and Lamport. (And I only need to haul around a single book, not 2 to 5.) But then again, this is surely a matter of personal taste. HTH, R. On Thursday 09 February 2006 20:08, Patrick Burns wrote: One approach is to use LyX (http://www.lyx.org/). This is a lot like using Word or other word processors but it creates LaTeX. You probably won't need to know anything about TeX for a long time unless you are doing really weird things. Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) roger bos wrote: Yeah, but I don't understand LaTeX at all. Can you point me to a good beginners guide? Thanks, Roger On 2/9/06, Barry Rowlingson [EMAIL PROTECTED] wrote: Tom Backer Johnsen wrote: I have just started looking at R, and am getting more and more irritated at myself for not having done that before. However, one of the things I have not found in the documentation is some way of preparing output from R for convenient formatting in something like MS Word. Well whatever you do, don't start looking at LaTeX, because that will get you even more irritated at yourself for not having done it before. LaTeX is to Word as R is to what? SPSS? I've still not seen a pretty piece of mathematics - or even text - in Word.
Barry __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] R, AMD Opteron 64, and Rmpi
Dear All, I found Andy Liaw's suggestion about using a NUMA (instead of SMP) kernel when running R on amd64 with > 1 CPU http://finzi.psych.upenn.edu/R/Rhelp02a/archive/35109.html A couple of questions: 1. Is this still the case with the newer dual-core Opterons (e.g., the 275 et al. families) running Linux (kernel 2.6)? 2. How does this affect using Rmpi (and snow, papply, et al.) on multi-server clusters with > 1 CPU per node? If I understand correctly, and if the situation is what Andy described, if we use an SMP kernel we will suffer a within-node penalty in one of the Rmpi processes. Is this correct? Thanks, R. -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
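For readers of this thread who have not used snow with Rmpi: a minimal sketch of farming work out to several R worker processes is below. A socket cluster is used so the example does not depend on an MPI installation; with Rmpi and LAM/MPI available, `type = "MPI"` could be used instead.

```r
## Run a trivially parallel computation across snow worker processes.
library(snow)

cl <- makeCluster(2, type = "SOCK")        # 2 local workers, no MPI needed
res <- parSapply(cl, 1:8, function(i) i^2) # squares computed on the workers
stopCluster(cl)
res
```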
[R] studentship till January 2007
Please pass along, and apologies for double posting. We have money to support a student till January 2007 (with a salary of about 1000 euro/month). Most of the work will focus on classification/prediction using microarray data. The work will involve both methodological research (mainly computation- and simulation-based) and implementation of existing approaches using R (and possibly development of web-based applications). In addition to the main focus of the job, the student will be encouraged (and expected) to get involved in the many collaborations we have with wet-lab cancer researchers. The candidate should have a bachelors or MSc degree in stats or related fields. A genuine interest in applied statistics and statistical consulting, and experience with multivariate methods, linear models, logistic regression, and survival analysis are required. Proficiency with R and knowledge of C/C++ or Fortran are required. Familiarity with Python (and Perl and/or Tcl/Tk) and some experience with development of web-based applications (CGIs using Python, for example) are highly valued. Our machines only run GNU/Linux (or other Unixes), and thus enough knowledge of Linux to administer your workstation is needed. The Bioinformatics Unit is one of the leading bioinformatics groups in Spain, part of CNIO, one of the main cancer research institutes in Spain. We have developed a set of widely used web-based microarray data analysis tools, and have extensive computational facilities, including two computing clusters (with x86s and Opterons) that use MPI and OpenMosix. You can check more of what we do at our group webpage (http://bioinfo.cnio.es) and my own web page (http://ligarto.org/rdiaz).
For further details please email Ramón Díaz-Uriarte at [EMAIL PROTECTED] Best, -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
Re: [R] building from source after installing binary package
Dear Uwe, Yes, sure, I understand how to install to another directory. I think I was not very clear: my doubt is whether I should do that, or whether it is OK to install to the very same place where Debian left the previous installation. By doing the latter I save myself having to reinstall packages, etc. R. On Friday 06 May 2005 08:53, Uwe Ligges wrote: Diaz.Ramon wrote: Dear All, I've got into the habit of installing R from the precompiled Debian binaries, including many of the packages from the r-cran-* Debian packages, and later building from source (e.g., to link against Goto's BLAS, or to build patched versions, etc.). I install the newly built R to the very same place (/usr/lib/R). This allows me to build and update R when I wish, AND provides the ease of quickly updating many packages. Things have always worked fine, but after a few funny problems (which could be unrelated to the process itself) I've started wondering if this is a rather silly thing to do, and if I should keep my own build separate from the Debian stuff. Any advice would be much appreciated. Yes, simply install to another directory, e.g. by telling configure: ./configure --prefix=/I/want/to/have/R/installed/here Uwe Ligges Thanks, R. -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido.
**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies. -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc)
Re: [R] building from source after installing binary package
On Friday 06 May 2005 09:48, Prof Brian Ripley wrote: On Fri, 6 May 2005, Uwe Ligges wrote: Diaz.Ramon wrote: Dear All, I've got into the habit of installing R from the precompiled Debian binaries, including many of the packages from the r-cran-* Debian packages, and later building from source (e.g., to link against Goto's BLAS, or to build patched versions, etc). I install the newly built R to the very same place (/usr/lib/R). This allows me to build and update R when I wish, AND provides the ease of quickly updating many packages. Things have always worked fine, but after a few funny problems (which could be unrelated to the process itself) I've started wondering if this is a rather silly thing to do, and if I should keep my own build separate from the Debian stuff. Any advice would be much appreciated. Yes, simply install to another directory, e.g. by telling configure: ./configure --prefix=/I/want/to/have/R/installed/here I don't think that is the point: Ramon must have done that as the default installation place is /usr/local/lib/R. Yes, I did change the --prefix because Debian installs to /usr/lib. I think this is a Debian-specific question (there is a R-debian list) and the point may be to make use of the binary Debian packages. I would Yes, that is correct (I guess I was not being very clear... too late last night). I'll ask in the debian list; I asked here just in case people with other GNU/Linux distributions did (or did not) do similar things. advocate installing R from the sources into /usr/local, and having separate directory trees both for packages you install and for Debian packages. Then you can manipulate which packages are seen via R_LIBS. Thanks. I'll try that. Best, R. 
-- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido. **CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
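Prof. Ripley's suggestion above (separate directory trees for your own packages and the Debian ones, selected via R_LIBS) can be sketched as follows; the paths are purely illustrative:

```r
## Shell side (e.g. in ~/.bashrc); your own tree is searched first:
##   export R_LIBS=~/R/my-library:/usr/lib/R/site-library

## Inside R, inspect and use the library search path:
.libPaths()   # current library trees, in search order

## Install a package into your personal tree (illustrative; runs a real
## install if executed):
## install.packages("abind", lib = "~/R/my-library")
```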
Re: [R] missing values
Dear Giordano, The Hmisc library, by Frank Harrell, contains several functions for imputation which I have found extremely useful. Best, R. On Tuesday 26 April 2005 11:58, Giordano Sanchez wrote: Hello, Thanks for the instructive responses. But two questions arise. First of all, I can't manage to load the library mice. I'm using R 2.0.1 on my Debian. I tried just copying the package into my library /usr/lib/R/library, but when I do library() ... mice ** No title available (pre-2.0.0 install?) ** ... and when I do library(mice) Error in library(mice) : 'mice' is not a valid package --installed 2.0.0? The second question is more statistical: aregImpute() seems to give good results but I would like to compare the different methods, not just graphically. Is it possible? I also have other meteorological stations whose data are correlated with those of the station I'm using. Can I use those data to improve my imputation method? Regards, Giordano -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido.
**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
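A minimal sketch of multiple imputation with Hmisc's `aregImpute()` (the function Giordano mentions above); the data here are made up for illustration:

```r
## Impute missing values in y using additive regression / bootstrap-based
## multiple imputation from the Hmisc package.
library(Hmisc)

set.seed(2)
d <- data.frame(x = rnorm(100), y = rnorm(100))
d$y[sample(100, 10)] <- NA          # punch some holes to impute

imp <- aregImpute(~ x + y, data = d, n.impute = 5)
imp                                  # summary of the imputation model
## fit.mult.impute() can then combine analyses across the 5 imputations
```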
Re: [R] cross validation and parameter determination
On Wednesday 20 April 2005 00:17, array chip wrote: Hi all, In Tibshirani's PNAS paper about nearest shrunken centroid analysis of microarrays (PNAS vol 99:6567), they used cross-validation to choose the amount of shrinkage used in the model, and then tested the performance of the model with the cross-validated shrinkage on a separate, independent testing set. If I don't have the luxury of having an independent testing set, can I just use the cross-validation performance as the performance estimate? In other words, can I use the same single cross-validation to both choose the value of the parameter (amount of shrinkage in this case) and estimate the performance that was based on the value of the parameter chosen by that same cross-validation? I kind of feel awkward getting both from a single cross-validation, because it seems like I used the dataset in a training-set manner. Am I wrong/right? That error rate is probably optimistic because, as you say, you used the dataset in a training-set manner. However, you can easily wrap the whole pam procedure within an outer loop of cross-validation or bootstrap. (This problem is not that different from, say, using knn and selecting k by cross-validation, or selecting the number of genes to use by cross-validation, etc. You should then assess the error rate of the whole procedure.) R. Thanks! __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido. **CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
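The "outer loop of cross-validation" suggested above could be sketched like this with the pamr package. The data are simulated and the fold scheme is illustrative; the point is that the shrinkage threshold is chosen by an inner CV repeated inside each outer fold, so the outer error estimate stays honest.

```r
## Nested CV around the whole pamr (nearest shrunken centroid) procedure.
library(pamr)

set.seed(3)
x <- matrix(rnorm(100 * 40), nrow = 100)    # 100 genes, 40 samples
y <- factor(rep(c("A", "B"), each = 20))

folds <- sample(rep(1:5, length = 40))
nerr <- 0
for (k in 1:5) {
  tr  <- list(x = x[, folds != k], y = y[folds != k])
  fit <- pamr.train(tr)
  cv  <- pamr.cv(fit, tr)                   # inner CV picks the threshold
  th  <- cv$threshold[which.min(cv$error)]
  pred <- pamr.predict(fit, x[, folds == k], threshold = th)
  nerr <- nerr + sum(pred != y[folds == k])
}
nerr / 40   # outer-loop estimate of the error rate of the whole procedure
```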
Re: [R] lme: error message with random=~1
On Wednesday 05 January 2005 16:29, Thomas Petzoldt wrote: Douglas Bates wrote: I'm not sure what model you want to fit here. To specify a random effect in lme you need both a grouping factor and a model matrix. The error message indicates that lme is unable to determine a grouping factor. It would be correct syntax if you added a single-level factor to the data frame and used that, but then the model fit would fail because you would be trying to estimate a variance in a model where there is no variation in the term. O.k. I see and think I understand it. It seems to me that you are trying to estimate parameters in a mixed-effects model without any random effects, and lme can't do that. Yes, what I want is a model without any random effects to be tested against a model with random effects. I want to show that the random effects are negligible, but that we account for pseudo-replicates and have tested this explicitly. Dear Thomas, What about fitting the model without random effects using the gls function (you'll need to change the syntax a bit relative to the lme model with random effects), and using an LR test against the lme fit? R. I'm not sure what is better: to leave the random effects in the model or simply an LR test against a linear model fitted by lm. I've never seen such an example in the books. Or have I missed a global alternative here? Thomas P. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) Este correo electrónico y, en su caso, cualquier fichero anexo al mismo, contiene información exclusivamente dirigida a su destinatario o destinatarios. Si Vd. ha recibido este mensaje por error, se ruega notificar esta circunstancia al remitente. Las ideas y opiniones manifestadas en este mensaje corresponden únicamente a su autor y no representan necesariamente a las del Centro Nacional de Investigaciones Oncológicas (CNIO). The information contained in this message is intended for the addressee only. If you have received this message in error or there are any problems please notify the originator. Please note that the Spanish National Cancer Centre (CNIO) does not accept liability for any statements or opinions made, which are clearly the sender's own and not expressly made on behalf of the Centre. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
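The gls-versus-lme comparison suggested in the reply above can be sketched with the nlme package; the Orthodont data and the random-intercept structure are illustrative. Both fits use the same estimation method so that `anova()` gives a valid likelihood-ratio comparison (note that the LR test of a variance component sits on the boundary of the parameter space, so its p-value is conservative).

```r
## LR test of random effects: gls (no random effects) vs lme.
library(nlme)

m.fixed  <- gls(distance ~ age, data = Orthodont, method = "ML")
m.random <- lme(distance ~ age, random = ~ 1 | Subject,
                data = Orthodont, method = "ML")
anova(m.random, m.fixed)   # LR test of the random intercept
```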
Re: [R] LDA with previous PCA for dimensionality reduction
Dear Christoph, David, Torsten and Bjørn-Helge, I think that Bjørn-Helge has made more explicit what I had in mind (which I think is close also to what David mentioned). Also, at the very least, not placing the PCA inside the cross-validation will underestimate the variance in the predictions. Best, R. On Thursday 25 November 2004 15:05, Bjørn-Helge Mevik wrote: Torsten Hothorn writes: as long as one does not use the information in the response (the class variable, in this case) I don't think that one ends up with an optimistically biased estimate of the error. I would be a little careful, though. The left-out sample in the LDA cross-validation will still have influenced the PCA used to build the LDA on the rest of the samples. The sample will have a tendency to lie closer to the centre of the complete PCA than of a PCA on the remaining samples. Also, if the sample has a high leverage on the PCA, the directions of the two PCAs can be quite different. Thus, the LDA is built on data that fits better to the left-out sample than if the sample were a completely new sample. I have no proofs or numerical studies showing that this gives over-optimistic error rates, but I would not recommend placing the PCA outside the cross-validation. (The same for any resampling-based validation.) -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] LDA with previous PCA for dimensionality reduction
Dear Christoph, I guess you want to assess the error rate of an LDA that has been fitted to a set of currently existing training data, and that in the future you will get some new observation(s) for which you want to make a prediction. Then, I'd say that you want to use the second approach. You might find that the first step turns out to be crucial and, after all, your whole subsequent LDA is contingent on the PC scores you obtain in the previous step. Somewhat similar issues have been discussed in the microarray literature. Two references are: @ARTICLE{ambroise-02, author = {Ambroise, C. and McLachlan, G. J.}, title = {Selection bias in gene extraction on the basis of microarray gene-expression data}, journal = {Proc Natl Acad Sci USA}, year = {2002}, volume = {99}, pages = {6562--6566}, number = {10}, } @ARTICLE{simon-03, author = {Simon, R. and Radmacher, M. D. and Dobbin, K. and McShane, L. M.}, title = {Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification}, journal = {Journal of the National Cancer Institute}, year = {2003}, volume = {95}, pages = {14--18}, number = {1}, } I am not sure, though, why you use PCA followed by LDA. But that's another story. Best, R. On Wednesday 24 November 2004 11:16, Christoph Lehmann wrote: Dear all, not really an R question but: If I want to check the classification accuracy of an LDA with previous PCA for dimensionality reduction by means of the LOOCV method: Is it ok to do the PCA on the WHOLE dataset ONCE and then run the LDA with the CV option set to TRUE (runs LOOCV) -- OR -- do I need - to compute for each 'test-bag' (the n-1 observations) a PCA (-> my.princomp.1), - then run the LDA on the test-bag scores (-> my.lda.1), - then compute the scores of the left-out observation using my.princomp.1 (-> my.scores.2), - and only then use predict.lda(my.lda.1, my.scores.2) on the scores of the left-out observation ? 
I read some articles where they chose procedure 1, but I am not sure if this is really correct. Many thanks for a hint. Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
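[Editor's note: a minimal sketch of the second approach discussed in this thread, with the PCA re-estimated inside each leave-one-out fold. The data matrix X, class factor grp, and number of retained components k are all hypothetical names, not from the original posts.]

```r
## Sketch: LOOCV with the PCA fitted only on the n-1 training rows of
## each fold, so the left-out observation never influences the PCA.
## X: numeric matrix (rows = observations); grp: factor of classes;
## k: hypothetical number of principal components retained.
library(MASS)  # for lda()

loocv.pca.lda <- function(X, grp, k = 2) {
  n <- nrow(X)
  pred <- factor(rep(NA, n), levels = levels(grp))
  for (i in seq_len(n)) {
    pca <- prcomp(X[-i, , drop = FALSE])              # PCA on training rows only
    scores.train <- pca$x[, 1:k, drop = FALSE]
    fit <- lda(scores.train, grouping = grp[-i])      # LDA on training scores
    scores.test <- predict(pca, X[i, , drop = FALSE])[, 1:k, drop = FALSE]
    pred[i] <- predict(fit, scores.test)$class        # predict left-out sample
  }
  mean(pred != grp)                                   # LOOCV error rate
}
```

Procedure 1 (one PCA on the whole data set, then lda(..., CV = TRUE)) differs from this in exactly the way Bjørn-Helge describes: the left-out sample has already influenced the PCA.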
Re: [R] t test problem?
On Wednesday 22 September 2004 13:07, Ted Harding wrote: On 22-Sep-04 kan Liu wrote: Hi, Many thanks for your helpful comments and suggestions. The attached are the data in both log10 scale and original scale. I would be very grateful if you could suggest which version of the test should be used. By the way, how can I check whether the variation is additive (natural scale) or multiplicative (log scale) in R? How can I check whether the distribution of the data is normal? As for additive vs multiplicative, this can only be judged in terms of the process by which the values are created in the real world. Just my 2 cents: I often find it helpful to ask myself (or the client) whether, if there were a difference (something) between the two samples, I/she/he thinks the appropriate model is (please read the = as approx. equal) sample.1 = sample.2 + something [1] OR sample.1 = sample.2 * something [2] (i.e., the ratio of means is a constant: sample.1/sample.2 = something) which, by log transforming, becomes log(sample.1) = log(sample.2) + log(something) I am not including here the issue of error distribution, but often, when the model for the means is like [2], the error terms are multiplicative (i.e., additive in the log scale). At least in many biological and engineering problems it is often evident whether [1] or [2] is appropriate for the data, given what we know about the subject. Best, R. As for normality vs non-normality, an appraisal can often be made simply by looking at a histogram of the data. In your case, the commands hist(x, breaks = 1*(0:100)) hist(y, breaks = 1*(0:100)) indicate that the distributions of x and y do not look at all normal, since they both have considerable positive skewness (i.e. long upper tails relative to the main mass of the distribution). 
This does strongly suggest that a logarithmic transformation would give data which are more nearly normally distributed, as indeed is confirmed by the commands hist(log(x)) hist(log(y)) though in both cases the histograms show some irregularity compared with what you would expect from a sample from a normal distribution: the commands hist(log(x), breaks = 0.2*(40:80)) hist(log(y), breaks = 0.2*(40:80)) show that log(x) has an excessive peak at around 11.7, while log(y) has holes at around 11.1 and 12.1. Nevertheless, this inspection of the data shows that the use of log(x) and log(y) will come much closer to fulfilling the conditions of validity of the t test than using the raw data x and y. However, it is not merely the *normality* of each which is needed: the conditions for the usual t test also require that the two populations sampled for log(x) and log(y) should have the same standard deviations. In your case, this also turns out to be nearly enough true: sd(log(x)) [1] 0.902579 sd(log(y)) [1] 0.9314807 PS, can I confirm that your suggestions mean that, in order to check whether there is a difference between x and y in terms of the mean, I need to check the distribution of x and that of y in both natural and log scales, and see which presents a normal distribution? See above for an approach to this: the answer to your question is, in effect, yes. It could of course have happened that neither the raw nor the log scale would be satisfactory, in which case you would need to consider other possibilities. And, if the SDs had turned out to be very different, you should not use the standard t test but a variant which is adapted to the situation (e.g. the Welch test). You can, of course, also perform formal tests for skewness, for normality, and for equality of variances. Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 [NB: New number!] 
Date: 22-Sep-04 Time: 12:07:07 -- XFMail -- __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
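[Editor's note: a short sketch of the checks discussed in this thread, on simulated data standing in for the poster's x and y, which were attached to the original message and are not reproduced here.]

```r
## Simulated stand-ins for x and y: lognormal, i.e. multiplicative error
## on the natural scale, additive on the log scale.
set.seed(1)
x <- rlnorm(100, meanlog = 11.5, sdlog = 0.9)
y <- rlnorm(100, meanlog = 11.8, sdlog = 0.9)

hist(x)                   # strong positive skew on the natural scale
hist(log(x))              # roughly symmetric after the log transform
sd(log(x)); sd(log(y))    # similar SDs -> standard t test on the logs is defensible

t.test(log(x), log(y))                      # Welch test (the default in R)
t.test(log(x), log(y), var.equal = TRUE)    # classical equal-variance t test
```

Note that R's t.test already defaults to the Welch variant, so the equal-variance test must be requested explicitly.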
Re: [R] can't understand R
Dear Erin, On Tuesday 21 September 2004 06:10, Erin L. Leisz wrote: hi. i really need help using this program. computer language is a foreign language to me, and thus, i cannot make heads nor tails of the user manuals from the website. i need to locate step-by-step examples of simple If you plan to use R more than once, I think you probably want to get used to using the manuals (starting with "An Introduction to R", and maybe some of the other intro material available from the R web site). problems such as graph f(x)+g(x) and f(g(x)) for the domain 0 < x < 2 and graph 2H(x), H(x)+1, H(x+1) i do know how to define the functions, but that's it. is there any help you could provide me? i would appreciate some help asap. thank you very much For this particular case of plotting f(x), you can take a look at the function curve (type ?curve at the R prompt). Hope this helps. R. erin leisz __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
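[Editor's note: an illustration of the curve suggestion, with hypothetical f and g standing in for the poster's functions.]

```r
## curve() evaluates an expression in x over an interval and plots it.
## f and g are hypothetical examples, not the poster's actual functions.
f <- function(x) x^2
g <- function(x) sqrt(x)

curve(f(x) + g(x), from = 0, to = 2)                    # f(x) + g(x) on 0 < x < 2
curve(f(g(x)), from = 0, to = 2, add = TRUE, lty = 2)   # overlay f(g(x)), dashed
```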
Re: [R] degrees of freedom (lme4 and nlme)
Dear Elizabeth, When I looked for this a couple of years ago, I found DFs to be discussed in the book by Pinheiro and Bates, Mixed-Effects Models in S and S-PLUS, as well as the documentation for SAS's PROC MIXED (I believe that the discussion of DFs in the SAS manual was more complete than in the SAS System for Mixed Models book ---and I think html versions of the manuals for v 8 of SAS can be found on the web). I do not remember specifically, though, whether these discussions mentioned explicitly DFs for fixed effects with crossed random effects (I do not have the references here now). Best, R. On Wednesday 08 September 2004 19:54, Elizabeth Lynch wrote: Hi, I'm looking for pointers/references on calculating denominator DFs for fixed effects when using crossed random effects. Also, is there an implementation of simulate.lme that I could use in lme4? Thanks, Elizabeth Lynch Douglas Bates wrote: Alexandre Galvão Patriota wrote: Hi, I'm having some problems regarding the packages lme4 and nlme, more specifically in the denominator degrees of freedom. SNIP The lme4 package is under development and only has a stub for the code that calculates the denominator degrees of freedom. These Wald-type tests using the F and t distributions are approximations at best. In that sense there is no correct degrees of freedom. I think the more accurate tests may end up being the restricted likelihood ratio tests that Greg Reinsel and his student Mr. Ahn were working on at the time of Greg's death. __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] bootstrap: stratified resampling
Dear All, I was writing a small wrapper to bootstrap a classification algorithm, but if we generate the indices in the usual way as: bootindex <- sample(index, N, replace = TRUE) there is a non-zero probability that all the samples belong to only one class, thus leading to problems in the fitting (or that some classes will end up with only one sample, which will be a problem for quadratic discriminant analysis). I thought this situation should be frequent enough to be mentioned in the literature, but I have found almost no mention in the references I have available, except for Hirst (see below). If I've reread correctly, this issue is not mentioned in Efron & Tibshirani (1997; the .632+ paper), or in Efron and Gong (the TAS "leisurely look" paper), or the Efron & Tibshirani 1993 bootstrap book, or Chernick's Bootstrap Methods book. I've only seen some side mentions in Ripley's Pattern Recognition (when talking about stratified cross-validation), and in Davison & Hinkley's bootstrap book when, on p. 304, they refer to some subsets having singular design matrices, and thus requiring stratification on covariates. McLachlan (in his discriminant analysis book), on p. 347, differentiates between mixture sampling and separate sampling, but I cannot find any mention of what to do when, under mixture sampling, we end up with all samples in only one group. Only Hirst (1996, Technometrics, 38 (4): 389--399) says that each bootstrap sample should include at least one observation for each group, and at least enough different observations from each group to allow estimation of the covariance matrix (he is referring to discriminant analysis), and thus he uses essentially stratified bootstrap samples. Interestingly, the boot function (boot library) says "For nonparametric multi-sample problems stratified resampling is used." As well, predab.resample (Design library) says of group: "a grouping variable used to stratify the sample upon bootstrapping. This allows one to handle k-sample problems, (...)". 
That the authors of boot and Design use stratified resampling suggests to me that this might be the obvious, unproblematic way to go, but I understood that stratified resampling was OK only when that was the sampling scheme that generated the data. What am I missing? Thanks, R. -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
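[Editor's note: a minimal sketch of stratified resampling as discussed in this thread: indices are drawn within each class, so no bootstrap sample can miss a class entirely. The class factor grp is a hypothetical example.]

```r
## Stratified bootstrap indices: resample within each class separately.
## grp is a hypothetical factor of class labels (20 "A"s, 9 "B"s).
set.seed(1)
grp <- factor(rep(c("A", "B"), times = c(20, 9)))

boot.index.stratified <- function(grp) {
  unlist(lapply(split(seq_along(grp), grp),
                function(idx) idx[sample.int(length(idx), replace = TRUE)]),
         use.names = FALSE)
}

bootindex <- boot.index.stratified(grp)
table(grp[bootindex])   # each class keeps its original size in the sample
```

The idx[sample.int(length(idx), ...)] form avoids the well-known sample(idx, ...) pitfall when a class happens to contain a single index.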
Re: [R] citing a package?
Dear Martin, I'd suggest you check the DESCRIPTION file and ask the author(s) of the package (e.g., a package might be related to a tech report which might, now, be in press, or whatever). Best, R. On Monday 09 February 2004 15:21, Martin Henry H. Stevens wrote: How do I cite a package (not R itself - I know how to do that)? Any thoughts or links? Many thanks in advance! Hank Stevens Dr. Martin Henry H. Stevens, Assistant Professor 338 Pearson Hall Botany Department Miami University Oxford, OH 45056 Office: (513) 529-4206 Lab: (513) 529-4262 FAX: (513) 529-4243 http://www.cas.muohio.edu/botany/bot/henry.html http://www.muohio.edu/ecology/ http://www.muohio.edu/botany/ __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] SIR
This is strange; the SIR implementation for R I know (in package dr on CRAN, from S. Weisberg), last time I checked (about a year ago?), was able to handle multivariate responses. In fact, p. 6 of the documentation shows an example of SIR with a bivariate response, and I tried it, and it works. Best, R. On Friday 16 January 2004 10:04, hagric wrote: I have found a version of SIR in R and I have tried it. But the problem with this file is the fact that it does not cope with multivariate response variables. Is there any version of SIR available that also works with multivariate responses? Thanks for help! __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] help in lme
Since Spencer Graves already answered the factorial questions, I'll try to answer one of the other two: On Monday 15 December 2003 05:17, [EMAIL PROTECTED] wrote: To anyone who can help, Intelligent question (1) I keep on trying to fit a linear mixed model in R using 'lme(y~fxd.dsgn, data = data.mtrx, ~rnd.dsgn|group)' where fxd.dsgn and rnd.dsgn are the fixed and random design matrices, respectively. The function won't work, though. It keeps telling me that it can't find the object 'rnd.dsgn'. What's the matter here? Is rnd.dsgn a variable in data.mtrx? That is how I always fit lme models, and I have never encountered the problem you describe. R. P.S. Stupid question # 2 has, I think, been asked (and answered) several times in this list in the past. Any help would be greatly appreciated. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] typeIII SS for lme?
Dear Bill, You can obtain marginal tests using anova(your.lme.object, type = "marginal") (If you are going to compare output, note that marginal tests when using non-orthogonal contrasts (SAS and treatment) might give you unexpected results, last time I checked). R. On Thursday 11 December 2003 19:40, Bill Shipley wrote: To avoid angry replies, let me first say that I know that the use of Type III sums of squares is controversial, and that some statisticians recommend instead that significance be judged using the non-marginal terms in the ANOVA. However, given that type III SS is also demanded by some, is there a function (equivalent to drop1 for lm) to obtain type III sums of squares for mixed models using the lme function? Bill Shipley Associate Editor, Ecology North American Editor, Annals of Botany Département de biologie, Université de Sherbrooke, Sherbrooke (Québec) J1K 2R1 CANADA [EMAIL PROTECTED] http://callisto.si.usherb.ca:8080/bshipley/ [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] documentation typo in coxph?
Dear All, I think there is a typo in the documentation for coxph (library survival). The help says: eps: convergence threshold. Iteration will continue until the relative change in the log-likelihood is less than eps. Default is .0001. However, if I do coxph.control() I get: coxph.control() $eps [1] 1e-09 So the actual eps being used is not 1e-04 but 1e-09. Best, Ramón version: platform i386-pc-linux-gnu; arch i386; os linux-gnu; system i386, linux-gnu; major 1; minor 7.1; year 2003; month 06; day 16; language R -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] simplifying randomForest(s)
Dear All, I have been using the randomForest package for a couple of difficult prediction problems (which also share p >> n). The performance is good, but since all the variables in the data set are used, interpretation of what is going on is not easy, even after looking at variable importance as produced by the randomForest run. I have tried a simple variable selection scheme, and it does seem to perform well (as judged by leave-one-out) but I am not sure if it makes any sense. The idea is, in a kind of backwards elimination, to eliminate one by one the variables with smallest importance (or all the ones with negative importance in one go) until the out-of-bag estimate of classification error becomes larger than that of the previous model (or of the initial model). So nothing really new. But I haven't been able to find any comments in the literature about simplification of random forests. Any suggestions/comments? Best, Ramón -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] simplifying randomForest(s)
Dear Andy, Thanks a lot for your message. This is quite a hazardous game. We've been burned by this ourselves. I'll send you a paper we submitted on variable selection for random forest off-line. (Those who are interested, let me know.) Thanks! The basic problem is that when you select important variables by RF and then re-run RF with those variables, the OOB error rate becomes biased downward. As you iterate more times, the overfitting becomes more and more severe (in the sense that the OOB error rate will keep decreasing while the error rate on an independent test set will be flat or increase). I was naïve enough to ask Breiman about this, and his reply was something like "any competent statistician would know that you need something like cross-validation to do that"... Yes, I understand the points you are making. However, I have tried to achieve protection against this problem by assessing the leave-one-out cross-validation error (LOOCVE) of the complete selection process. And the LOOCVE suggests this is working. Within the variable selection routine the OOB error rate is biased, but I guess that does not concern me that much, because I only use it to guide the selection. However, my final estimate of error comes from the LOOCVE. This is the skeleton of the algorithm: n <- length(y) for(i in 1:n) { the.simple.rf <- simplify.the.rf(data = data[-i, ]) prediction[i] <- predict(the.simple.rf, newdata = data[i, ]) } loocve <- sum(y != prediction) / n Thus, the LOOCVE is computed with observations that were never used for the simplification of the tree that is predicting them. [I'll be glad to send my code to anyone interested]. And, the interesting thing with the data set I have tried is that it seems to perform reasonably (actually, the LOOCVE of a tree with the reduced set of variables is smaller than the LOOCVE of the original tree). (This is a first shot. 
I have a small sample size (29) so LOOCV is not that bad in terms of computation, although I am aware it can have high variance. I guess I could try the .632+ bootstrap method). Best, Ramón Best, Andy Any suggestions/comments? Best, Ramón -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
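[Editor's note: a rough sketch of the backward-elimination step described in this thread, not the author's actual code. All names (simplify.the.rf, X, y, min.vars) are hypothetical. As Andy's warning makes clear, the OOB error used inside this routine is biased; any honest error estimate must come from an outer cross-validation loop around the whole procedure, as in the LOOCV skeleton above.]

```r
## Sketch: drop the least important variable while the OOB error does
## not exceed that of the current forest. X: predictors (data frame or
## matrix); y: factor of classes; min.vars: hypothetical floor.
library(randomForest)

simplify.the.rf <- function(X, y, min.vars = 2) {
  rf <- randomForest(X, y, importance = TRUE)
  best.err <- rf$err.rate[nrow(rf$err.rate), "OOB"]
  while (ncol(X) > min.vars) {
    imp <- importance(rf, type = 1)            # mean decrease in accuracy
    drop <- rownames(imp)[which.min(imp)]      # least important variable
    X.new <- X[, setdiff(colnames(X), drop), drop = FALSE]
    rf.new <- randomForest(X.new, y, importance = TRUE)
    err <- rf.new$err.rate[nrow(rf.new$err.rate), "OOB"]
    if (err > best.err) break                  # stop when OOB error worsens
    X <- X.new
    rf <- rf.new
  }
  rf
}
```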
Re: [R] coxph.control
Dear Gareth, ?coxph.control (which we are told to check from ?coxph) contains the argument iter.max, which sets the maximum number of iterations. Best, R. On Tuesday 26 August 2003 13:51, Gareth Hughes wrote: How can I specify the maximum number of iterations in coxph whilst also specifying my model? I can't find any on-line examples. Thanks __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
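[Editor's note: a minimal example of passing coxph.control to coxph, using the lung data shipped with the survival package; the model formula is illustrative only.]

```r
## Set the maximum number of Newton-Raphson iterations via coxph.control.
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung,
             control = coxph.control(iter.max = 50))
fit$iter   # number of iterations actually used
```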