Re: [R] Finding convex hull? [Broadcast]

2007-09-07 Thread Liaw, Andy
From: Dong-hyun Oh
 Dear UseRs,
 
 I would like to know which function is the most efficient for finding
 the convex hull of a set of points in the 3- (or 2-) dimensional case.
 
 The functions for finding a convex hull are the following:
 convex.hull (tripack), chull (grDevices), in.chull (sgeostat),
 convhulln (geometry), convexhull.xy (spatstat), calcConvexHull
 (PBSmapping).
 
 I would also like to know whether there is a function that can be
 used for finding the convex hull in the multi-dimensional case, that
 is, in more than 3 dimensions.

If you had looked a bit more carefully, you would have seen that
convhulln (geometry) will handle more than 3 dimensions.
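
For the planar case, chull() from grDevices ships with R, so a minimal
sketch needs no extra package.  The point set below is made up for
illustration.  The convhulln() call is shown only as a comment because it
assumes the geometry package is installed.

```r
# Four corners of the unit square plus one interior point (made-up data).
pts <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0.5, 0.5))

# chull() returns the row indices of the points lying on the hull.
hull_idx <- chull(pts)
hull_pts <- pts[hull_idx, , drop = FALSE]
print(sort(hull_idx))

# Higher-dimensional case, assuming the geometry package is installed:
# library(geometry)
# facets <- convhulln(matrix(rnorm(500), ncol = 5))  # 100 points in 5-D
```

The interior point is not part of the hull, so only the four corner
indices should be reported.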

Andy

 Thank you in advance.
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 


--
Notice:  This e-mail message, together with any attachments,...{{dropped}}



Re: [R] Q: loess-like function that allows more predictors?

2007-09-07 Thread Liaw, Andy
locfit() in the locfit package can do that.

Andy 

From: D. R. Evans
 
 I have a feeling that this may be a stupid question, but here 
 goes anyway:
 is there a function that I can use to replace loess but which allows a
 larger number of predictors?
 
 (I have a situation in which it would be very convenient to use 5
 predictors, which violates the constraint in loess that the number of
 predictors be in the range from 1 to 4.)
 


Re: [R] Monotonic interpolation

2007-09-06 Thread Liaw, Andy
Not if Mr. excalibur really wants interpolating (as opposed to
smoothing) splines.  Other than linear interpolation, I'm not even
sure it can be done (though I'm no expert on this).

One possibility is to use the cobs package and play with
the amount of smoothing...
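
The linear case mentioned above can be sketched with approxfun(): joining
monotone data by straight lines cannot introduce a decrease.  The data
are made up, and rule = 2 is only there so that grid points at the very
ends of the range never produce NA.

```r
# Made-up, monotonically increasing data (e.g. points on a distribution function).
x <- c(0, 1, 2, 4, 7)
y <- c(0.00, 0.10, 0.45, 0.80, 1.00)

# approxfun() returns a function that interpolates linearly between the points.
# rule = 2 holds the end values constant outside the range of x.
f <- approxfun(x, y, rule = 2)

# Evaluate on a fine grid and check that the values never decrease.
grid <- seq(min(x), max(x), length.out = 200)
is_monotone <- all(diff(f(grid)) >= 0)
print(is_monotone)
```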

Andy

From: Bert Gunter
 
 RSiteSearch("monotone", restr="func") will give you several packages and
 functions for monotone smoothing, including the isoreg() function in the
 standard stats package.  You can determine if any of these does what you
 want.
 
 
 Bert Gunter
 Genentech Nonclinical Statistics
 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of excalibur
 Sent: Thursday, September 06, 2007 8:04 AM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] Monotonic interpolation
 
 
 
 
 On Thu. 6 Sept. at 09:45, excalibur wrote:
 
 
  Hello everybody, has anyone got a function for smooth monotonic
  interpolation (splines ...) of a univariate function (like a
  distribution function, for example)?
 
 approxfun() might be what you're looking for.
 
 Is the result of approxfun() necessarily monotonic?
 -- 
 View this message in context:
 http://www.nabble.com/Monotonic-interpolation-tf4392288.html#a12524568
 Sent from the R help mailing list archive at Nabble.com.
 


Re: [R] Recursive concatenation

2007-09-06 Thread Liaw, Andy
Or something like:

R> do.call(paste, c(expand.grid(LETTERS[1:3], 1:3), sep=""))
[1] "A1" "B1" "C1" "A2" "B2" "C2" "A3" "B3" "C3"

(The ordering is a bit different, but that shouldn't matter.)
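
A small side-by-side sketch of the two approaches in this thread, to show
that they give the same elements in a different order.  The variable
names by_grid and by_rep are made up for the illustration.

```r
# Approach from this reply: expand.grid() varies the first factor fastest.
by_grid <- do.call(paste, c(expand.grid(LETTERS[1:3], 1:3), sep = ""))

# Approach from the quoted message: rep(each = 3) keeps the letters grouped.
by_rep <- paste(rep(LETTERS[1:3], each = 3), 1:3, sep = "")

# Same elements, different order.
print(by_grid)
print(by_rep)
print(setequal(by_grid, by_rep))
```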

Andy 

From: Dimitris Rizopoulos
 try this:
 
 paste(rep(LETTERS[1:3], each = 3), 1:3, sep = "")
 
 
 Best,
 Dimitris
 
 
 Dimitris Rizopoulos
 Ph.D. Student
 Biostatistical Centre
 School of Public Health
 Catholic University of Leuven
 
 Address: Kapucijnenvoer 35, Leuven, Belgium
 Tel: +32/(0)16/336899
 Fax: +32/(0)16/337015
 Web: http://med.kuleuven.be/biostat/
   http://www.student.kuleuven.be/~m0390867/dimitris.htm
 
 
 Quoting Dennis Fisher [EMAIL PROTECTED]:
 
  Colleagues,
 
  I want to create the following array:
  A1, A2, A3, B1, B2, B3, C1, C2, C3
 
  I recall that there is a trick using c or paste permitting me to
  form all combinations of c("A", "B", "C") and 1:3.  But, I can't
  recall the trick.
 
  Dennis
 
 
  Dennis Fisher MD
  P  (The P Less Than Company)
  Phone: 1-866-PLessThan (1-866-753-7784)
  Fax: 1-415-564-2220
  www.PLessThan.com
 
 


Re: [R] Synchronzing workspaces

2007-09-06 Thread Liaw, Andy
See the example in ?save on how to set defaults via options().
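
A minimal sketch of that idea, assuming the save.defaults option
described in ?save (the help page also mentions a save.image.defaults
option for save.image()).  The object and the temporary file are made up
for the illustration.

```r
# Remember the current options so they can be restored afterwards.
old <- options(save.defaults = list(ascii = TRUE))

x <- 1:5                              # made-up object to save
f <- tempfile(fileext = ".RData")     # temporary file, for illustration only
save(x, file = f)                     # picks up ascii = TRUE from the option

# Load into a fresh environment to confirm the file round-trips.
e <- new.env()
load(f, envir = e)
print(identical(e$x, x))

options(old)                          # restore the previous settings
```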

Andy

From: Gabor Grothendieck
 
 You could try saving prior to quitting in the future if you want to try
 those arguments.
 
 On 9/3/07, Paul August [EMAIL PROTECTED] wrote:
  Thanks for sharing your experience. In my case, the machines involved
  run Windows Vista, XP and 2000. I am not sure whether that contributes
  to my problem or not. I will look into this further.
 
  I just noticed the two arguments ascii and compress for save().
  However, my .RData file was created by q() with "yes". The manual says
  that this is equivalent to save(list = ls(all=TRUE), file = ".RData").
  There seems to be no way to set ascii or compress for save() through
  q(), unless q() is replaced explicitly with
  save(list = ls(all=TRUE), file = ".RData", ascii = TRUE).
 
  Paul.
 
 
  - Original Message 
  From: Gabor Grothendieck [EMAIL PROTECTED]
  To: Paul August [EMAIL PROTECTED]
  Cc: r-help@stat.math.ethz.ch
  Sent: Thursday, August 30, 2007 11:24:31 PM
  Subject: Re: [R] Synchronzing workspaces
 
  I haven't had similar experience but note that save has ascii=
  and compress= arguments.  You could check if varying those
  parameter values makes a difference.
 
  On 8/30/07, Paul August [EMAIL PROTECTED] wrote:
   I used to work on several computers and to use a flash drive to
   synchronize the workspace on each machine before starting to work on
   it. I found that .RData always caused some trouble: often it was
   corrupted even though there was no error in the copying process. Has
   anybody had a similar experience?
  
   Paul.
  
   - Original Message 
   From: Barry Rowlingson [EMAIL PROTECTED]
   To: Eric Turkheimer [EMAIL PROTECTED]
   Cc: r-help@stat.math.ethz.ch
   Sent: Wednesday, August 22, 2007 9:43:57 AM
   Subject: Re: [R] Synchronzing workspaces
  
   Eric Turkheimer wrote:
    How do people go about synchronizing multiple workspaces on different
    workstations?  I tend to wind up with projects spread around the
    various machines I work on.  I find that placing the directories on a
    server and reading them remotely tends to slow things down.
  
   If R were to store all its workspace data objects in individual files
   instead of one big .RData file, then you could use a revision control
   system like SVN.  Check out the data, work on it, check it in, then on
   another machine just update to get the changes.
  
   However SVN doesn't work too well for binary files - conflicts being
   hard to resolve without someone backing down - so maybe it's not such a
   good idea anyway...
  
   On unix boxes and derivatives, you can keep things in sync efficiently
   with the 'rsync' command.  I think there are GUI addons for it, and
   Windows ports.
  
   Barry
  


Re: [R] Variable Importance - Random Forest

2007-09-06 Thread Liaw, Andy
I'm slowly clearing my back-log of r-help messages...

Please see reply inline below.

Andy

 From: Mathe, Ewy (NIH/NCI) [F]
 Hello,
 
  
 
 I am trying to explore the use of random forests for classification and
 am uncertain about the interpretation of the importance measurements.
 
  
 
 When having the option importance = T in the randomForest call, the
 resulting 'importance' element matrix has four columns with the
 following headings:
 
 0 - mean raw importance score of variable x for class 0 (where
 importance is the difference between the permuted data error and the
 original test set error)
 
 1 - mean raw importance score of variable x for class 1
 
 MeanDecreaseAccuracy : average lowering of the margin across all cases
 (where margin is the proportion of votes for the true class - the
 maximum proportion of votes for the other classes)
 
 MeanDecreaseGini : summation of the gini decreases over all trees in the
 forest
 
  
 
 Are these definitions correct?  Why is the raw importance score
 calculated for each class?  Could one just average the raw importance
 scores for class 0 and 1 to get a composite importance score?

The permutation-based importance measures are based on OOB data.  For
each tree in the forest, the difference in error rates on the OOB data
with and without permuting the variable of interest is computed.  Call
this d[i] for the i-th tree.  The overall importance measure is
mean(d[i]) / se(d[i]), where se(d[i]) is sd(d[i])/sqrt(ntree) (the
standard error).  The numbers in the 0 and 1 columns are the
analogs computed separately for the 0 class and the 1 class.
These are useful, e.g., when balanced sampling is used.
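
The scaling described above can be sketched in plain R.  The vector d
below is made up; in a real forest it would hold one OOB error difference
per tree.  This only illustrates the arithmetic, it is not the package's
internal code.

```r
# Made-up per-tree differences in OOB error (permuted minus unpermuted).
d <- c(0.04, 0.02, 0.05, 0.01, 0.03, 0.06, 0.02, 0.03)
ntree <- length(d)

# Standard error of the mean difference across trees.
se_d <- sd(d) / sqrt(ntree)

# Scaled importance: mean difference divided by its standard error.
scaled_importance <- mean(d) / se_d
print(scaled_importance)
```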
  
  
 
 Now, when having the option importance = F in the randomForest call,
 the 'importance' element is now a vector.  What values are those?

That's the MeanDecreaseGini, because they come at nearly zero additional
computation, so we might as well keep them.
 
  
 
 Thank you in advance for any input you may have.
 
  
 
 Best,
 
 Ewy
 
 
 Ewy Mathe, Ph. D.
 
 Laboratory of Human Carcinogenesis
 
 National Cancer Institute, NIH
 
 37 Convent Drive
 
 Building 37, Room 3068
 
 Bethesda, MD  20892-4255
 
 Tel: 301-496-5835
 
 Fax: 301-496-0497
 
  
 
 


Re: [R] categorical variable coefficients in QSAR [Broadcast]

2007-09-06 Thread Liaw, Andy
No one seemed to have picked up on this, so I'll take a stab:

You need to read para and meta into R as factors, and if you want the 
coefficients to match the way you showed, you also need to take care that the 
factor levels are in the same order as you showed in the coefficient table.

I cut-and-pasted the three columns of data into R separately, like so:

[copy para data to the clipboard]
R> para <- factor(scan("clipboard", what=""))
Read 22 items
[copy meta data to the clipboard]
R> meta <- factor(scan("clipboard", what=""))
Read 22 items
[copy biological activity to the clipboard]
R> y <- scan("clipboard")
Read 22 items
[copy the column heading of the coefficient table to the clipboard]
R> lvl <- scan("clipboard", what="")
Read 6 items
R> para <- factor(as.character(para), levels=lvl)
R> meta <- factor(as.character(meta), levels=lvl)
R> qsar <- lm(y ~ para + meta)
R> qsar

Call:
lm(formula = y ~ para + meta)

Coefficients:
(Intercept)        paraF       paraCl       paraBr        paraI       paraMe  
     7.8213       0.3400       0.7675       1.0200       1.4287       1.2560  
      metaF       metaCl       metaBr        metaI       metaMe  
    -0.3013       0.2068       0.4340       0.5787       0.4540  

These coefficients match the ones you showed quite closely.

If you don't reorder the levels of the factors, then by default R orders them 
alphabetically, so that Br becomes the reference and all coefficients are 
differences from Br.
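
For readers without a clipboard at hand, here is a self-contained sketch
of the same steps with the data from the quoted message typed in
directly.  The names para_raw, meta_raw and default_ref are made up for
the illustration.

```r
lvl <- c("H", "F", "Cl", "Br", "I", "Me")
para_raw <- c("H", "F", "Cl", "Br", "I", "Me", "H", "H", "H", "H", "H",
              "F", "F", "F", "Cl", "Cl", "Cl", "Br", "Br", "Br", "Me", "Me")
meta_raw <- c("H", "H", "H", "H", "H", "H", "F", "Cl", "Br", "I", "Me",
              "Cl", "Br", "Me", "Cl", "Br", "Me", "Cl", "Br", "Me", "Me", "Br")
y <- c(7.46, 8.16, 8.68, 8.89, 9.25, 9.30, 7.52, 8.16, 8.30, 8.40, 8.46,
       8.19, 8.57, 8.82, 8.89, 8.92, 8.96, 9.00, 9.35, 9.22, 9.30, 9.52)

# Default factor(): levels sorted alphabetically, so "Br" is the reference.
default_ref <- levels(factor(para_raw))[1]

# Explicit levels: "H" comes first and becomes the reference level.
para <- factor(para_raw, levels = lvl)
meta <- factor(meta_raw, levels = lvl)

qsar <- lm(y ~ para + meta)
print(default_ref)
print(coef(qsar))
```

With "H" as the reference level, each coefficient is the difference from
the unsubstituted compound, which is the layout of the table in the
question.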

HTH,
Andy


From: [EMAIL PROTECTED]
 Dear list:
 I am interested in the following sort of problem, as is found frequently
 in the field of QSAR. I have biological activity as a function of chemical
 structure, with structure defined in a categorical manner in that the
 SUBSTITUENT is the level of the POSITION factor. For example, data from
 Kubinyi (http://www.kubinyi.de/dd-12.pdf) for this type of analysis are
 presented as follows:
 factor para:
 H
 F
 Cl
 Br
 I
 Me
 H
 H
 H
 H
 H
 F
 F
 F
 Cl
 Cl
 Cl
 Br
 Br
 Br
 Me
 Me
 factor meta:
 H
 H
 H
 H
 H
 H
 F
 Cl
 Br
 I
 Me
 Cl
 Br
 Me
 Cl
 Br
 Me
 Cl
 Br
 Me
 Me
 Br
 observed biological activity:
 7.46
 8.16
 8.68
 8.89
 9.25
 9.30
 7.52
 8.16
 8.30
 8.40
 8.46
 8.19
 8.57
 8.82
 8.89
 8.92
 8.96
 9.00
 9.35
 9.22
 9.30
 9.52
 
 I then think the following analysis should be appropriate
 
 
 meta <- factor(scan(file="meta", what="character"))
 para <- factor(scan(file="para", what="character"))
 ba <- scan(file="ba")
 
 rslt <- lm(ba ~ meta + para - 1)
 
 What I wish to obtain is a coefficient for each substituent at each
 position, as does Kubinyi:
 
          H      F     Cl     Br      I     Me
 meta  0.00  -0.30   0.21   0.43   0.58   0.45
 para  0.00   0.34   0.77   1.02   1.43   1.26
 
 
 However, I do not get a coefficient for the Br substituent at the para
 position. I would like to know if there is an error in this formulation.
 The technique is quite well established in the field of medicinal
 chemistry and it is traditional that the binary incidence matrix is formed
 by hand as an intermediate step in the analysis, instead of the much
 simpler formulation that I am considering here.
 
 Thank you for whatever insight you may give.
 
 Prof. Roy Little
 Dept. Chem.
 Universidad de los Andes
 Mérida, Venezuela
 


Re: [R] randomForest help

2007-09-06 Thread Liaw, Andy
What software are you using, exactly?  I'm the maintainer of the
randomForest package, yet I do not know which manual you are quoting.

If you are using the randomForest package, the model object can be saved
to a file by save(Rfobject, file="myRFobject.rda").  If you need that to
be in ascii, use ascii=TRUE in save().  You can get it back into R by
using load() or attach().

To run data down the model, use predict(Rfobject, datatopredict) (see
?predict.randomForest).

What exactly do you want to print to a csv file, the prediction?  See
?write or ?write.table.
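
The three steps above can be sketched end to end.  To keep the example
runnable without extra packages, an lm() fit on the built-in cars data
stands in for the forest; with a randomForest object the same save(),
load(), predict() and write.csv() calls would apply.  File names are
temporary and made up.

```r
# Stand-in model: lm() on the built-in cars data (not a random forest).
fit <- lm(dist ~ speed, data = cars)

# 1. Save the fitted object, then load it back into a fresh environment.
model_file <- tempfile(fileext = ".rda")
save(fit, file = model_file)
restored <- new.env()
load(model_file, envir = restored)

# 2. Run new data down the restored model.
new_data <- data.frame(speed = c(5, 10, 15))
new_data$predicted <- predict(restored$fit, newdata = new_data)

# 3. Write the predictions to a CSV file and read them back as a check.
csv_file <- tempfile(fileext = ".csv")
write.csv(new_data, file = csv_file, row.names = FALSE)
print(read.csv(csv_file))
```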

Andy

From: Jennifer Dawn Watts
 
 
 Hello!  As a new R user, I'm sure this will be a silly question for the
 rest of you.  I've been able to successfully run a forest but have yet to
 figure out the proper command lines for the following:
 1. Saving the forest.  The guide just says isavef=1.  I'm unsure how to
 expand on this to create the command.
 2. Running new data down the model.  Again, the guide just states irunf.
 3. Print to file.  I need to be able to export this data to a csv
 file, to then incorporate into an Arc shapefile.  The manual just says
 ntestout.
 
 Again, I feel like these should be easy steps that I just can't relate
 to as a beginner.  Any advice would be greatly appreciated.
 
 Thanks,
 
 Jenny
 


Re: [R] ecological meaning of randomForest vegetation classification? [Broadcast]

2007-09-05 Thread Liaw, Andy
Hi Christoph,

I'm not exactly sure what you're looking for, but I'll take a stab
anyway.

The trees in a random forest are not designed to be interpreted as one
would interpret an ordinary tree.  There are several things you may try to
see if they help you any.  One is the distribution of votes.  It looks like
you are classifying each data point into one of many possible classes.  RF
will give you the fraction of trees in the forest that classified the
observation as a particular class (and the class with the highest fraction
of votes is the predicted class).  Another is the partial dependence plot:
You can use plot(importance(rf.object)) to see which variables are the most
important, and then use partialPlot() to examine their marginal effects.
These offer some clue of what the RF black box is doing, and hopefully will
make some sense to you.
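
The vote fractions described above can be illustrated without a fitted
forest.  The per-tree predictions below are made up (three observations,
five trees, three hypothetical classes); a real forest would supply them.

```r
# Made-up predictions: rows are observations, columns are individual trees.
classes <- c("forest", "grass", "barren")
tree_pred <- rbind(
  c("forest", "forest", "grass",  "forest", "forest"),
  c("grass",  "barren", "grass",  "grass",  "barren"),
  c("barren", "barren", "barren", "forest", "barren")
)

# Fraction of trees voting for each class, one row per observation.
votes <- t(apply(tree_pred, 1, function(p) {
  table(factor(p, levels = classes)) / length(p)
}))
colnames(votes) <- classes

# The predicted class is the one with the largest vote fraction.
predicted <- classes[max.col(votes, ties.method = "first")]
print(votes)
print(predicted)
```

Looking at the full vote distribution, rather than only the winning
class, shows which classes the forest tends to confuse.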

Best,
Andy 

From: Christoph Muller
 
 Hi, everyone,
 
 I haven't found anything similar in the forum, so here's my problem (I'm
 no expert in R nor statistics):
 
 I have a data set of 59.000 cases with 9 variables each (fractional
 coverage of 9 different plant types, such as deciduous broad-leaved
 temperate trees or evergreen tropical trees etc.), which was generated by
 a vegetation model.
 In order to evaluate the quality of the vegetation model's output, I want
 to compare it to a land-cover data set which has 23 different land-cover
 types (such as needle leaved evergreen forest, dense broad-leaved forest,
 barren, etc.).
 A statistician advised me to use the randomForest package in R and, using
 a sub-set to generate the random forest, I get a very good prediction for
 the rest.
 However, I need to evaluate how meaningful this classification is in an
 ecological sense (boreal trees should not play a role in the definition of
 tropical land-cover types, for example), otherwise I cannot judge the
 quality of the vegetation model's output.
 
 Unfortunately, randomForest gives me about 15.000 splits of which about
 5000 are end branches (rough guess), so it's very hard and time-consuming
 to check each single branch of one of the final trees for its ecological
 meaning.
 Is there any utility to summarize the characteristics of each of the 23
 prediction classes? Such as land-cover class 1 has less than 5% of plant
 types 1-5, 20-50% of plant type 7 and at least 30% of plant type 8.
 Or is there a more suitable method to classify my data?
 
 Thanks a lot in advance!
 
 Christoph


Re: [R] (Most efficient) way to make random sequences of random sequences

2007-08-21 Thread Liaw, Andy
Similarly:

s <- c(replicate(N, sample(3)))
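
A short check of what that one-liner produces, using a small N and a
fixed seed purely so the illustration is repeatable.  The names blocks
and each_is_permutation are made up.

```r
set.seed(1)   # only so the illustration is repeatable
N <- 5
s <- c(replicate(N, sample(3)))

# View the result as a 3 x N matrix: each column is one block of three.
blocks <- matrix(s, nrow = 3)
each_is_permutation <- all(apply(blocks, 2, function(b) identical(sort(b), 1:3)))
print(s)
print(each_is_permutation)
```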

Andy 

From: roger koenker
 One way:
 
  N <- 10
  s <- c(apply(matrix(rep(1:3,N),3,N),2,sample))
 
 
 url:    www.econ.uiuc.edu/~roger        Roger Koenker
 email   [EMAIL PROTECTED]               Department of Economics
 vox:    217-333-4558                    University of Illinois
 fax:    217-244-6678                    Champaign, IL 61820
 
 
 On Aug 21, 2007, at 3:49 PM, Emmanuel Levy wrote:
 
  Hi,
 
  I was wondering what would be the (most efficient) way to generate
  a sequence of sequences, I mean:
 
  if I have 1, 2 and 3.
 
  I'd like to generate a sequence of length N*3 (N ~ 1,000,000 or more)
 
  where random permutations of the sequence 1,2,3 follow each other.
 
  i.e.  1,2,3,1,3,2,3,2,1
 
  /!\ The thing is that the same number should never appear twice in
  the same sub-sequence, meaning that this is different from
  generating a vector with the numbers 1, 2 and 3 randomly distributed.
 
  Any suggestion very welcome! Thanks,
 
  Emmanuel
 


Re: [R] RFclustering - is it available in R?

2007-08-15 Thread Liaw, Andy
Basically the random forest algorithm can generate a proximity 
matrix of the data, and it's up to you how you would want to 
proceed from there.  You can feed that into clustering 
algorithms that accept a similarity matrix, or turn it into a 
distance matrix for clustering algorithms that need a distance
matrix (e.g., hclust()).  You may or may not want to do 
ordination as the UCLA folks suggest.
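
The distance-matrix route can be sketched as follows.  The proximity
matrix here is made up by hand; with the randomForest package it would be
taken from a forest grown with proximity = TRUE.  Using 1 - proximity as
the dissimilarity is one simple choice, not the only one.

```r
# Made-up proximity matrix for six cases forming two obvious groups.
prox <- matrix(0.1, nrow = 6, ncol = 6)
prox[1:3, 1:3] <- 0.9
prox[4:6, 4:6] <- 0.9
diag(prox) <- 1

# Proximities are similarities; 1 - proximity is one simple dissimilarity.
d <- as.dist(1 - prox)

# Feed the distances to hclust() and cut the tree into two clusters.
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```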

I think this is one of the great things about working in R:
you have the freedom to choose how you want to proceed from
some intermediate result, and are not locked into something
someone decided to hardwire into the software.

Andy

From: Gavin Simpson
 
 On Wed, 2007-08-15 at 09:44 -0700, David Katz wrote:
  Several searches turned up nothing. Perhaps I will try to implement it
  if nobody else has. Thanks.
 
 You can do this with Andy Liaw's randomForest package, and
 the first hit on a Google search (on term RFclustering) was this:
 
 http://www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering.htm
 
 which shows how one might go about this with some helper functions.
 
 G
 
 -- 
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
  Gavin Simpson [t] +44 (0)20 7679 0522
  ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
  Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
  Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
  UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 


Re: [R] Loading JMP Files

2007-08-15 Thread Liaw, Andy
JMP can write CSV, and that's probably a safer choice than XPT.
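
A minimal sketch of the CSV route.  Since no JMP export is at hand, a
small CSV is written to a temporary file as a stand-in; the column names
and paths are made up.  One point that may be relevant to the quoted
error: inside an R string a single backslash followed by n is read as a
newline, so Windows paths need forward slashes or doubled backslashes.

```r
# Stand-in for a file exported from JMP: write a small CSV to a temporary path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,long_variable_name,score",
             "1,a,3.5",
             "2,b,4.0"), csv_path)

# read.csv() keeps column names longer than 8 characters.
dat <- read.csv(csv_path)
str(dat)

# On Windows, write paths with forward slashes or doubled backslashes,
# e.g. "D:/Databases/file.csv" or "D:\\Databases\\file.csv" (hypothetical paths).
```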

Andy 

From: Diana C. Dolan
 
 Hi,
 I know how to use SPSS and JMP, and have quite a few
 JMP files I would like to use in R.  I converted them
 to .xpt files, downloaded the 'foreign' library then
 tried this command:
 
 read.xport("D:\\Databases\nameoffile.xpt")
 
 to which I get:
 
 Error in lookup.xport(file) : unable to open file
 
 I have read FAQ lists and Google searched and cannot
 figure out what I'm doing wrong that my files won't
 open.  I tried saving to the C drive, but no luck
 there.  I also have no luck getting R to read my SPSS
 files with read.spss
 
 My file names do have spaces and dashes, and I do have
 variables/variable names longer than 8 characters.
 
 Please help!  I am very new to R and do not understand
 all the package reference manuals...I can not seem to
 find a simple, basic guide to how to command R and use
 basic functions without a bunch of jargon (eg 'usage'
 and 'arguments').  It would help to at least be able
 to load my files to practice on.
 
 Any help would be appreciated!
 Thanks,
 Diana
 
 



Re: [R] rfImpute

2007-08-13 Thread Liaw, Andy
I seem to recall that rfImpute() can sometimes come up with NAs at some
point in the iterations.  Could you please send me a (small) set of
data/code that reproduces the problem?

Andy

From: Eric Turkheimer
 
 I am having trouble with the rfImpute function in the 
 randomForest package.
 Here is a sample...
 
 clunk.roughfix <- na.roughfix(clunk)
 
 clunk.impute <- rfImpute(CONVERT ~ ., data=clunk)
 ntree      OOB      1      2
   300:  26.80%  3.83% 85.37%
 ntree      OOB      1      2
   300:  18.56%  5.74% 51.22%
 Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree,  :
   NA not permitted in predictors
 
 So na.roughfix works, but rfImpute doesn't.
 
 Thanks,
 Eric
  ent3c *at* virginia.edu
 
 -- 
 Eric Turkheimer, PhD
 Department of Psychology
 University of Virginia
 PO Box 400400
 Charlottesville, VA  22904-4400
 
 434-982-4732
 434-982-4766 (FAX)
 


Re: [R] Sourcing commands but delaying their execution

2007-08-03 Thread Liaw, Andy
Here's one possibility:

The file garbage.R has

  x <- rnorm(100)
  print(summary(x))

You can do:

  cmds <- parse(file="garbage.R", n=NA)

and when you want to execute those commands, do

  eval(cmds)
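
The same idea as a self-contained sketch.  A temporary file stands in for
garbage.R, and the commands are evaluated in a separate environment (env,
a made-up name) only so that the effect is easy to inspect.

```r
# Write a small script to a temporary file (stand-in for garbage.R).
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "total <- sum(x)"), script)

# parse() reads the commands but does not run them.
cmds <- parse(file = script, n = NA)
env <- new.env()
ran_before <- exists("total", envir = env, inherits = FALSE)

# ... other commands could run here ...

# eval() executes the stored commands, here inside a separate environment.
eval(cmds, envir = env)
print(ran_before)
print(env$total)
```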

Andy 

From: Dennis Fisher
 
 Colleagues:
 
 I have encountered the following situation:
   SERIES OF COMMANDS
   source(File1)
   MORE COMMANDS
   source(File2)
 
 Optimally, I would like File1 and File2 to be merged into a single  
 file (FileMerged).  However, if I wrote the following:
   SERIES OF COMMANDS
   source(FileMerged)
   MORE COMMANDS
 
 I encounter an error: the File2 portion of FileMerged contains  
 commands that cannot be executed properly until MORE COMMANDS are  
 executed.  Similarly, sourcing FileMerged after MORE COMMANDS does  
 not work because MORE COMMANDS requires the information from the  
 File1 portion of FileMerged.
 
 I am looking for a means to source FileMerged but not execute some of
 the commands immediately.  Functionally this would look like:
   SERIES OF COMMANDS
   source(FileMerged)   # but withhold execution of some of the commands
   MORE COMMANDS
   COMMAND TO EXECUTE THE WITHHELD COMMANDS
 
 Does R offer some option to accomplish this?
 
 Dennis
 
 Dennis Fisher MD
 P  (The P Less Than Company)
 Phone: 1-866-PLessThan (1-866-753-7784)
 Fax: 1-415-564-2220
 www.PLessThan.com
 


Re: [R] getting the name of variables passed to a function

2007-07-30 Thread Liaw, Andy
Here's one possibility:

R> f <- function(...) { call <- match.call(); sapply(as.list(call[-1]),
     deparse) }
R> f(x, y)
[1] "x" "y"
R> f(x=x, y=y)
  x   y 
"x" "y" 

You basically need to know how to manipulate call objects.  The relevant
section in the R Language Definition should help.
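
As a variation on the above, here is a small sketch that uses substitute() instead of match.call(). The function name `arg_names` is made up for illustration; the paste() is there only because deparse() can return several strings for a long expression:

```r
# Return the text of each argument as it was typed in the call.
arg_names <- function(...) {
  args <- as.list(substitute(list(...)))[-1]   # drop the leading `list`
  vapply(args, function(e) paste(deparse(e), collapse = " "), character(1))
}

x <- 1:3
y <- 4:6
arg_names(x, y)
arg_names(x, total = x + y)   # expressions are returned as text, too
```

Note that this works on the unevaluated call, so an argument such as `x + y` comes back as text rather than as a variable name.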

Andy

 
From: Horace Tso
 
 Folks,
 
 I've entered into an R programming territory I'm not very 
 familiar with, thus this probably very elementary question 
 concerning the mechanics of a function call.
 
 I want to know from within a function the name of the 
 variables I pass down. The function makes use of the ... to 
 allow for multiple unknown arguments,
 
 myfun = function(...) { do something }
 
 In the body I put,
 
 {
 nm <- names(list(...))
 nm
 }
 
 When the function is called with two vectors x, and y
 
 myfun(x, y)
 
 It returns NULL. However, when the call made is,
 
 myfun(x=x, y=y)
 
 The result is
 [1] "x" "y"
 
 Question : how do i get the names of the unknown variables 
 without explicitly saying x=x...
 
 Thanks in advance.
 
 Horace
 


Re: [R] randomForest importance problem with combine [Broadcast]

2007-07-24 Thread Liaw, Andy
I've been fixing some problems in the combine() function, but that's
only for regression data.  Looks like you are doing classification, and
I don't see the problem:

R> library(randomForest)
randomForest 4.5-19 
Type rfNews() to see new features/changes/bug fixes.
R> set.seed(1)
R> rflist <- replicate(50, randomForest(iris[-5], iris[[5]], ntree=50,
     importance=TRUE), simplify=FALSE)
R> rfall <- do.call(combine, rflist)
R> importance(rfall)
                setosa versicolor virginica MeanDecreaseAccuracy
Sepal.Length 0.4457861 0.53883425 0.5580657            0.4120840
Sepal.Width  0.3266790 0.07652383 0.3620240            0.2128450
Petal.Length 1.1950989 1.42014628 1.3220471            0.7989841
Petal.Width  1.1986973 1.40855969 1.3640620            0.7951053
             MeanDecreaseGini
Sepal.Length         9.578580
Sepal.Width          2.301172
Petal.Length        42.935832
Petal.Width         44.409058
R> importance(rflist[[1]])
               setosa  versicolor virginica MeanDecreaseAccuracy
Sepal.Length 0.401714  0.71583422 0.4946420            0.4166555
Sepal.Width  0.000000 -0.03155946 0.6829287            0.2317111
Petal.Length 1.290430  1.47915219 1.3456770            0.8219003
Petal.Width  1.110142  1.44996777 1.3584799            0.7881210
             MeanDecreaseGini
Sepal.Length         6.168439
Sepal.Width          2.240723
Petal.Length        48.821726
Petal.Width         42.059112

Please provide a reproducible example.

Andy
 

From: Joseph Retzer
 
 My apologies, subject corrected.
 
 
 I'm building a RF 50 trees at a time due to memory limitations (I have
  roughly .5 million observations and around 20 variables). I thought I
  could combine some or all of my forests later and look at global
  importance. 
 
 If I have say 2 forests : tree1 and tree2, they have similar Gini and
  Raw importances and, additionally, are similar to one another. After
  combining (using the combine command) the trees into one however, the
  combined tree Raw importances have changed in rank order rather
  dramatically (e.g. the top most important becomes least important. It
  is not however a completely reversed ordering). In addition, the scale
  of both the Raw and Gini importances is orders of magnitude smaller
  for the combined tree.
 
 Note that the combined tree Gini importance looks roughly similar to
  the individual tree Gini (and Raw) importance, at least in 
 terms of rank
  ordering.
 
 I'm using the non-formula randomForest specification  along  with
   norm.votes=FALSE to facilitate  large sample  estimation  and  tree
  combining.
 
 I'm using R 2.5.0 on a windows XP machine with 2 gig RAM. I'm also
  using randomForest 4.5-18.
 
 Any advice is appreciated,
 Many thanks,
 Joe
 
 


Re: [R] Viewing a data object

2007-06-13 Thread Liaw, Andy
I believe JGR has an object browser.  See the screenshots at the bottom
of http://rosuda.org/JGR/.
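
Short of a GUI browser, a small helper can save some typing at the console. This is only a sketch: `peek` is a made-up name, not a function from any package, and the data frame below is a stand-in for the real one.

```r
# Hypothetical helper: show n rows of a data frame starting at row `from`.
peek <- function(df, from = 1, n = 10) {
  rows <- seq(from, length.out = n)
  df[rows[rows <= nrow(df)], , drop = FALSE]
}

# Stand-in for a data frame with a long name
AuroraStochasticRunsJune1.df <- data.frame(run = 1:1000, value = sqrt(1:1000))

d <- AuroraStochasticRunsJune1.df  # short alias for interactive use
mid <- peek(d, 400, 5)             # rows 400 to 404
tail_rows <- peek(d, 998)          # stops at the last row
```

head(), tail() and str() on the short alias cover most of the remaining cases.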

Andy 

From: Stephen Tucker
 
 Hi Horace,
 
 I have also thought that it may be useful but I don't know of 
 any Object
 Explorer available for R.
 
 However, (you may already know this but) 
 (1) you can view your list of objects in R with objects(), 
 (2) view objects in a spreadsheet-like table (if they are 
 matrices or data
 frames) with invisible(edit(objectName)) [which isn't easy on 
 the fingers].
 fix(objectName) is also a shorter option but it has the side effect of
 possibly changing your object when you close the viewing 
 data. For instance,
 this can happen if you mistakenly type something into a cell; 
 it can also
 change your column classes when you don't - for example:
 
  options(stringsAsFactors=TRUE)
  x <- data.frame(letters[1:5],1:5)
  sapply(x,class)
 letters.1.5.         X1.5 
     "factor"    "integer" 
  fix(x) # no user-changes made
  sapply(x,class)
 letters.1.5.         X1.5 
     "factor"    "numeric" 
 
 (3) I believe Deepayan Sarkar contributed the tab-completion 
 capability at
 the command line. So unless you have a lot of objects beginning with
 'AuroraStoch...' you should be able to type a few letters and let the
 auto-completion handle the rest.
 
 Best regards,
 
 ST
 
 
 --- Horace Tso [EMAIL PROTECTED] wrote:
 
  Dear list,
  
  First apologize that this is trivial and just betrays my 
 slothfulness at
  the keyboard. I'm sick of having to type a long name just 
 to get a glimpse
  of something. For example, if my data frame is named
  'AuroraStochasticRunsJune1.df and I want to see what the 
 middle looks
  like, I have to type
  
  AuroraStochasticRunsJune1.df[ 400:500, ]
  
  And often I'm not even sure rows 400 to 500 are what I want 
 to see.  I
  might have to type the same line many times.
  
  Is there sort of a R-equivalence of the Object Explorer, 
 like in Splus,
  where I could mouse-click an object in a list and a window 
 pops up?  Short
  of that, is there any trick of saving a couple of 
 keystrokes here and
  there?
  
  Thanks for tolerating this kind of annoying questions.
  
  H.
  


Re: [R] R for bioinformatics

2007-06-04 Thread Liaw, Andy
Just to complete this thread:  A colleague sent me the following
regarding the book.

Following up on this post from a few months back...
The author has recently posted a public-domain version of this
book on CRAN under Documentation -> Contributed ->
"Statistics Using R with Biological Examples" by Kim Seefeld and Ernst
Linder (PDF). 

Unfortunately not all the mirror sites have it yet.

Andy 

From: Benoit Ballester
 
 Marc Schwartz wrote:
  On Thu, 2007-02-01 at 21:32 +0100, Peter Dalgaard wrote:
  Marc Schwartz wrote:
  On Thu, 2007-02-01 at 10:45 -0800, Seth Falcon wrote:

  Benoit Ballester [EMAIL PROTECTED] writes:
 
  
  Hi,
 
  I was wondering if someone could tell me more about 
 this book, (if it's 
  a good or bad one).
  I can't find it, as it seems that O'Reilly doesn't 
 publish any more.

  I've never seen a copy so I can't comment about its quality (has
  anyone seen a copy?).
 
  You might want to take a look at _Bioinformatics and 
 Computational
  Biology Solutions Using R and Bioconductor_.
 
  http://www.bioconductor.org/pub/docs/mogr/
  
  I'll stand (or sit) to be corrected on this as I cannot 
 find the source,
  but I have a recollection from seeing something quite 
 some time ago that
  the book may have never been published.

  It's been a while since the status was something along the 
 lines that 
  the authors may or may not complete it. Subject matter 
 moving faster 
  than pen, I suspect
  
  Peter, that wording does seem familiar, just cannot recall 
 where I saw
  it. Perhaps on the O'Reilly web site, where it is no longer listed.
  
  For confirmation, I called O'Reilly's customer service in 
 Cambridge, MA.
  They confirm that the book was indeed cancelled and never published.
  
  No reasons were given.
 
 Thanks for those replies.
 I also contacted the O'Reilly offices in the UK, and they told me the
 same thing.  The book was never published. I just wanted to compare
 "R for Bioinformatics" with "Bioinformatics and Computational
 Biology Solutions Using R and Bioconductor", and see which one suits
 me more - but I guess I don't have the choice now :-)
 
 Ben
 
 -- 
 Benoit Ballester
 Ensembl Team
 


Re: [R] normality tests [Broadcast]

2007-05-25 Thread Liaw, Andy
From: [EMAIL PROTECTED]
 
 On 25/05/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
  [EMAIL PROTECTED] wrote:
   Hi all,
  
   apologies for seeking advice on a general stats question. I ve run
   normality tests using 8 different methods:
   - Lilliefors
   - Shapiro-Wilk
   - Robust Jarque Bera
   - Jarque Bera
   - Anderson-Darling
   - Pearson chi-square
   - Cramer-von Mises
   - Shapiro-Francia
  
   All show that the null hypothesis that the data come from a normal
   distro cannot be rejected. Great. However, I don't think 
 it looks nice
   to report the values of 8 different tests on a report. One note is
   that my sample size is really tiny (less than 20 
 independent cases).
   Without wanting to start a flame war, are there any 
 advices of which
   one/ones would be more appropriate and should be reported 
 (along with
   a Q-Q plot). Thank you.
  
   Regards,
  
 
  Wow - I have so many concerns with that approach that it's 
 hard to know
  where to begin.  But first of all, why care about 
 normality?  Why not
  use distribution-free methods?
 
  You should examine the power of the tests for n=20.  You'll probably
  find it's not good enough to reach a reliable conclusion.
 
 And wouldn't it be even worse if I used non-parametric tests?

I believe what Frank meant was that it's probably better to use a
distribution-free procedure to do the real test of interest (if there is
one), instead of first testing for normality and then using a test that
assumes normality.

I guess the question is, what exactly do you want to do with the outcome
of the normality tests?  If those are going to be used as basis for
deciding which test(s) to do next, then I concur with Frank's
reservation.

Generally speaking, I do not find goodness-of-fit tests for
distributions very useful, mostly because failure to reject the null is
no evidence in favor of the null.  It's difficult for me to imagine why
a finding that there's insufficient evidence to show that the data did
not come from a normal distribution would be interesting.
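
To make the point about power concrete, here is a rough simulation sketch. The choice of a t distribution with 5 degrees of freedom as the "true" non-normal distribution, the seed, and the number of replicates are all arbitrary assumptions for illustration:

```r
# How often does Shapiro-Wilk reject normality at n = 20 when the data
# are in fact heavy-tailed (t with 5 df)?
set.seed(123)
n <- 20
nsim <- 2000

pvals <- replicate(nsim, shapiro.test(rt(n, df = 5))$p.value)
power <- mean(pvals < 0.05)   # estimated rejection rate at the 5% level
power
```

If the estimated rejection rate is low, then "cannot reject normality" at this sample size says very little about whether the data are actually normal.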

Andy

 
 
  Frank
 
 
  --
  Frank E Harrell Jr   Professor and Chair   School 
 of Medicine
Department of Biostatistics   
 Vanderbilt University
 
 
 
 -- 
 yianni
 


Re: [R] in unix opening data object created under win

2007-05-24 Thread Liaw, Andy
What are the versions of R on the two platforms?  Is the version on Unix
at least as new as the one on Windows?

Andy 

From: [EMAIL PROTECTED]
 
 Hi All
 
 I am saving a dataframe in my MS-Win R with save().
 Then I copy it onto my personal AFS space.
 Then I start R and run it with emacs and load() the data.
 It loads only 2 lines: head() shows only two lines, nrow() also says
 it has only 2 lines, and I get an error message, when trying to use
 this data object, saying that some row numbers are missing.
 If anyone had similar situation, I appreciate letting me know.
 
 Best Toby
 


Re: [R] using lm() with variable formula [Broadcast]

2007-05-18 Thread Liaw, Andy
One way to do it is by giving a data frame with the right variables to
lm() as the first argument each time.  If lm() is given a data frame as
the first argument, it will treat the first variable as the LHS and the
rest as the RHS of the formula.

As examples, you can do:

lm(myData[c("height", "weight", "BP", "Cals")])

(The drawback to this is that the formula in the fitted model object
looks a bit strange...)
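
If the odd-looking formula is a concern, another route is to build the formula from column names with reformulate(). The sketch below assumes made-up random data with the column names from the question; `fit_one` is a hypothetical helper, and the lag() handling mentioned in the question is left out:

```r
set.seed(1)
myData <- data.frame(height = rnorm(30), weight = rnorm(30),
                     BP = rnorm(30), Cals = rnorm(30))

# Fit one model given the response name and a character vector of predictors.
fit_one <- function(response, predictors, data) {
  lm(reformulate(predictors, response = response), data = data)
}

fit <- fit_one("height", c("weight", "BP"), myData)
formula(fit)

# Every response against every pair of the remaining columns.
vars <- names(myData)
fits <- list()
for (resp in vars) {
  for (preds in combn(setdiff(vars, resp), 2, simplify = FALSE)) {
    label <- paste(resp, "~", paste(preds, collapse = " + "))
    fits[[label]] <- fit_one(resp, preds, myData)
  }
}
length(fits)   # 4 responses x choose(3, 2) predictor pairs
```

Since the column names are read from names(myData), this does not depend on how many columns the data frame has.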

Andy


From: Chris Elsaesser
 
 New to R; please excuse me if this is a dumb question.  I 
 tried to RTFM;
 didn't help.
 
 I want to do a series of regressions over the columns in a data.frame,
 systematically varying the response variable and the terms; and not
 necessarily including all the non-response columns.  In my case, the
 columns are time series. I don't know if that makes a difference; it
 does mean I have to call lag() to offset non-response terms. I can not
 assume a specific number of columns in the data.frame; might 
 be 3, might
 be 20. 
 
 My central problem is that the formula given to lm() is different each
 time.  For example, say a data.frame had columns with the following
 headings:  height, weight, BP (blood pressure), and Cals 
 (calorie intake
 per time frame).  In that case, I'd need something like the following:
 
   lm(height ~ weight + BP + Cals)
   lm(height ~ weight + BP)
   lm(height ~ weight + Cals)
   lm(height ~ BP + Cals)
   lm(weight ~ height + BP)
   lm(weight ~ height + Cals)
   etc.
 
 In general, I'll have to read the header to get the argument labels.
 
 Do I have to write several functions, each taking a different number
 of arguments?  I'd like to construct a string or list representing the
 variables in the formula and apply lm(), so to say  [I'm mainly a Lisp
 programmer where that part would be very simple. Anyone have a Lisp
 API for R? :-}]
 
 Thanks,
 chris
 
 Chris Elsaesser, PhD
 Principal Scientist, Machine Learning
 SPADAC Inc.
 7921 Jones Branch Dr. Suite 600  
 McLean, VA 22102  
 
 703.371.7301 (m)
 703.637.9421 (o)
 


Re: [R] more woes trying to convert a data.frame to a numerical matrix

2007-05-16 Thread Liaw, Andy
I think this might be a bit more straightforward:

R> mat <- do.call(cbind, scan("clipboard", what=list(NULL, 0, 0, 0),
     sep=",", skip=2))
Read 3 records
R> mat
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
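
For anyone without the data on the clipboard, here is a self-contained sketch of the same approach that reads the lines from the original post via the text argument. It also shows the read.csv() route suggested earlier in the thread; the object names are made up for illustration:

```r
csv_lines <- c("name,x,y,z",
               "category,delta,gamma,epsilon",
               "a,1,2,3",
               "b,4,5,6",
               "c,7,8,9")

# Skip both header rows; NULL in `what` drops the first (character) field.
mat <- do.call(cbind, scan(text = csv_lines, what = list(NULL, 0, 0, 0),
                           sep = ",", skip = 2, quiet = TRUE))
mat

# Alternative: read the data rows only, then coerce to a matrix.
dat <- read.csv(text = csv_lines, header = FALSE, skip = 2, row.names = 1,
                col.names = c("name", "x", "y", "z"))
mat2 <- as.matrix(dat)
```

The second version keeps the row and column names, which the scan() version discards.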

Andy


From: Andrew Yee
 
 Thanks again to everyone for all your help.
 
 I think I've figured out the solution to my dilemma.  Instead of using
 data.matrix or sapply, this works for me:
 
 sample.data <- read.csv("sample.csv")
 sample.matrix.raw <- as.matrix(sample.data[-1,-1])
 sample.matrix <- matrix(as.numeric(sample.matrix.raw),
 nrow=attributes(sample.matrix.raw)$dim[1], ncol=attributes(
 sample.matrix.raw)$dim[2])
 
 With the above code, I get the desired matrix of:
 
 1 2 3
 4 5 6
 7 8 9
 
 (I'd like to be able to import the whole csv and then subset 
 the relevant
 header and data sections (rather than creating a separate csv 
 for the header
 and csv for the data)
 
 Of course, the above code seems kind of clunky, and welcome 
 any suggestions
 for improvement.
 
 Thanks,
 Andrew
 
 
 On 5/16/07, Andrew Yee [EMAIL PROTECTED] wrote:
 
  Thanks for the suggestion.
 
  However, I've tried sapply and data.matrix.
 
  The problem is that it while it returns a numeric matrix, 
 it gives back:
 
  1 1 1
  2 2 2
  3 3 3
 
  instead of
 
  1 2 3
  4 5 6
  7 8 9
 
  The latter matrix is the desired result
 
  Thanks,
  Andrew
 
  On 5/16/07, Marc Schwartz  [EMAIL PROTECTED] wrote:
  
   On Wed, 2007-05-16 at 08:40 -0400, Andrew Yee wrote:
Thanks for the suggestion and the explanation for why I 
 was running
into these troubles.
   
I've tried:
   
as.numeric(as.matrix(sample.data[-1, -1]))
   
However, this creates another vector rather than a matrix.
  
   Right. That's because I'm an idiot and need more caffeine... :-)
  
 Is there a straight forward way to convert this directly into a
numeric matrix rather than a vector?
  
   Yeah, Dimitris' approach below of using data.matrix().
  
   You could also use:
  
   mat <- sapply(sample.data[-1, -1], as.numeric)
   rownames(mat) <- rownames(sample.data[-1, -1])
  
mat
 x y z
   2 1 1 1
   3 2 2 2
   4 3 3 3
  
   Though, this is essentially what data.matrix() does internally.
  
Additionally, I've also considered:
   
data.matrix(sample.data[-1,-1])
   
but bizarrely, it returns:
   
  x y z
2 1 1 1
3 2 2 2
4 3 3 3
  
   That is a numeric matrix:
  
str(data.matrix(sample.data[-1, -1]))
   int [1:3, 1:3] 1 2 3 1 2 3 1 2 3
   - attr(*, "dimnames")=List of 2
 ..$ : chr [1:3] "2" "3" "4"
 ..$ : chr [1:3] "x" "y" "z"
  
   HTH,
  
   Marc
  
   
Thanks,
Andrew
   
   
On 5/16/07, Marc Schwartz  [EMAIL PROTECTED] wrote:
On Wed, 2007-05-16 at 08:10 -0400, Andrew Yee wrote:
 I have the following csv file:

 name,x,y,z
 category,delta,gamma,epsilon
 a,1,2,3
 b,4,5,6
 c,7,8,9

 I'd like to create a numeric matrix of just 
 the numbers in
this csv dataset.

 I've tried the following program:

 sample.data <- read.csv("sample.csv")
 numerical.data <- as.matrix(sample.data[-1,-1])

 However, print(numerical.data ) returns what 
 appears to be a
matrix of
 characters:

   x   y   z
 2 "1" "2" "3"
 3 "4" "5" "6"
 4 "7" "8" "9"

 How do I force it to be numbers rather than 
 characters?

 Thanks,
 Andrew
   
The problem is that you have two rows which 
 contain alpha
entries.
   
The first row is treated as the header, but the 
 second row is
treated as
actual data, thus overriding the numeric values in the
subsequent rows.
   
You could use:
   
  as.numeric(as.matrix(sample.data [-1, -1]))
   
to coerce the matrix to numeric, or if you 
 don't need the
alpha entries,
you could modify the read.csv() call to something like:
   
  read.csv("sample.csv", header = FALSE, skip = 2, row.names = 1,
           col.names = c("name", "x", "y", "z"))
   
This will skip the first two rows, set the 
 first column to the
  
row names
and give you a data frame with numeric columns, 
 which in most
cases can
be treated as a numeric matrix and/or you could 
 explicitly
coerce it to
one.
   
HTH,
   
Marc Schwartz
   
   
   
  
  
 
 

Re: [R] Testing for existence inside a function [Broadcast]

2007-05-15 Thread Liaw, Andy
Not sure which one you want, but the following should cover it:

R> f <- function(x) c(x=missing(x), y=exists("y"))
R> f(1)
    x     y 
FALSE FALSE 
R> f()
    x     y 
 TRUE FALSE 
R> y <- 1
R> f()
   x    y 
TRUE TRUE 
R> f(1)
    x     y 
FALSE  TRUE 

Andy 

From: Talbot Katz
 
 Hi.
 
 I'm having trouble testing for existence of an object inside 
 a function.
 
 Suppose I have a function:
 
 f <- function(x){
 ...
 }
 
 and I call it with argument y:
 
 f(y)
 
 I'd like to check inside the function whether argument y 
 exists.  Is this 
 possible, or do I have to either check outside the function 
 or pass the name 
 of the argument as a separate argument?
 
 If I do exists(x) or exists(eval(x)) inside the function and y does
 not exist, it generates an error message.  If I do exists("x") it says
 that x exists even if y does not.  If I had a separate argument to
 hold the text string "y" then I could check that.  But is it possible
 to check the existence of the argument inside the function without
 passing its name as a separate argument?
 
 Thanks!
 
 --  TMK  --
 212-460-5430  home
 917-656-5351  cell
 


Re: [R] Testing for existence inside a function

2007-05-15 Thread Liaw, Andy
Just need a bit more work:

R> f <- function(x) exists(deparse(substitute(x)))
R> f(y)
[1] FALSE
R> y <- 1
R> f(y)
[1] TRUE
R> f(z)
[1] FALSE

Andy 

From: Talbot Katz
 
 Hi, Andy.
 
 Thank you for the quick response!  Unfortunately, none of 
 these are exactly 
 what I'm looking for.  I'm looking for the following:  
 Suppose object y 
 exists and object z does not exist.  If I pass y as the value of the 
 argument to my function, I want to be able to verify, inside 
 my function, 
 the existence of y; similarly, if I pass z as the value of 
 the argument, I 
 want to be able to see, inside the function, that z doesn't exist.
 
 The missing function just checks whether the argument is 
 missing; in my 
 case, the argument is not missing, but the object may not 
 exist.  And the 
 way you use the exists function inside the user-defined 
 function doesn't 
 test the argument to the user-defined function, it's just 
 hard-coded for the 
 object y.  So I'm sorry if I wasn't clear before, and I hope 
 this is clear 
 now.  Perhaps what I'm attempting to do is unavailable 
 because it's a bad 
 programming paradigm.  But even an explanation if that's the 
 case would be 
 appreciated.
 
 --  TMK  --
 212-460-5430  home
 917-656-5351  cell
 
 
 


Re: [R] Testing for existence inside a function

2007-05-15 Thread Liaw, Andy
Another thing to watch out for is that an argument to a function can be
an expression (or even literal constants), instead of just the name of
an object.  exists() wouldn't really do the right thing.  I'm not sure
how to properly do the exhaustive check.
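
One partial way around it, as a sketch only: inspect the unevaluated argument first and call exists() only when it is a plain name, looking in the caller's environment rather than inside the function. The name `arg_exists` and the choice to return NA for non-names are assumptions made for illustration:

```r
arg_exists <- function(x) {
  expr <- substitute(x)
  if (!is.name(expr)) {
    return(NA)   # an expression or a constant, not the name of an object
  }
  # Look from the caller's environment, so the formal argument `x`
  # of this function is not mistaken for an object named x.
  exists(as.character(expr), envir = parent.frame())
}

some_object <- 1
arg_exists(some_object)        # a name that is defined
arg_exists(no_such_object_zz)  # a name that is not defined
arg_exists(1 + 2)              # not a name at all
```

This still does not check whether an expression such as `a + b` could be evaluated; for that, wrapping the evaluation in try() or tryCatch() is probably the more honest test.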

Andy

From: Gabor Grothendieck
 
 Try this modification:
 
  chk <- function(x) exists(deparse(substitute(x)),
                            parent.env(environment()))
  ab <- 1
  chk(ab)
 [1] TRUE
  exists("x")
 [1] FALSE
  chk(x)
 [1] FALSE
 
 
 
 On 5/15/07, Talbot Katz [EMAIL PROTECTED] wrote:
  Hi.
 
  Thanks once more for the swift response.  This solution 
 works pretty well.
  The only small glitch is if I pass the function an argument 
 with the same
  name as the function argument.  That is, suppose x is the 
 argument name in
  my user-defined function, and that object x does not 
 exist.  If I call the
  function f(x), i.e., using the non-existent object x as the 
 argument value,
  then the function says that x exists.
 
  Here is my example log:
 
  chkex5 <- function(objn){
  + c(exob=exists(deparse(substitute(objn))))
  + }
  exists("objn")
  [1] FALSE
  chkex5(objn)
  exob
  TRUE
  
 
  But I suppose I can live with this.  Thanks again!
 
 
  --  TMK  --
  212-460-5430home
  917-656-5351cell
 
 
 

Re: [R] Optimized File Reading with R

2007-05-15 Thread Liaw, Andy
If it's a matrix, use scan().  If the columns are not all the same type,
use the colClasses argument to read.table() to specify their types,
instead of leaving it to R to guess.  That will speed things up quite a
lot.
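
A minimal sketch of both suggestions, assuming a whitespace-delimited file with no header. The temporary file and the random 6000 by 40 matrix are stand-ins for the poster's data, not the real file:

```r
# Stand-in data: a 6000 x 40 numeric table written to a temporary file.
tmp <- tempfile()
m <- matrix(round(runif(6000 * 40), 4), nrow = 6000, ncol = 40)
write.table(m, tmp, row.names = FALSE, col.names = FALSE)

# All-numeric data: scan() reads the values row by row, so fill by row.
x <- matrix(scan(tmp, what = numeric(), quiet = TRUE),
            ncol = 40, byrow = TRUE)

# Data frame route: declare the column types up front so read.table()
# does not have to guess them.
df <- read.table(tmp, header = FALSE,
                 colClasses = rep("numeric", 40), nrows = 6000)
```

For columns of mixed type, colClasses would instead be a vector such as c("character", "numeric", ...), one entry per column.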

Andy 

From: Lorenzo Isella
 
 Dear All,
 Hope I am not bumping into a FAQ, but so far my online search 
 has been fruitless
 I need to read some data file using R. I am using the (I think)
 standard command:
 
 data_150<-read.table("y_complete06000", header=FALSE)
 
 where y_complete06000 is a 6000 by 40 table of numbers.
 I am puzzled at the fact that R is taking several minutes to 
 read this file.
 First I thought it may have been due to its shape, but even
 re-expressing and saving the matrix as a 1D array does not help.
 It is not a small file, but not even huge (it amounts to about 5Mb of
 text file).
 Is there anything I can do to speed up the file reading?
 Many thanks
 
 Lorenzo
 


Re: [R] geeting name of an object to which a variable refers?

2007-05-11 Thread Liaw, Andy
 Something like this?

R> f <- function(x) deparse(substitute(x))
R> a <- 1:3
R> f(a)
[1] "a"
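
One caveat worth spelling out with a small sketch: deparse(substitute(x)) reports the expression typed in the call, not where the value originally came from. The function name name_of is made up for this illustration.

```r
# Hypothetical name for the same one-liner as above.
name_of <- function(x) deparse(substitute(x))

a <- c(1, 2, 3)
x <- a          # x now holds a copy of the values; no link back to 'a' is kept

name_of(a)      # "a"
name_of(x)      # "x", not "a"
```

So for the original question (recovering "a" from x alone), the poster's suspicion looks right: after x = a, R keeps no record that the values came from a.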

Andy

From: new ruser
 
 #Sorry for the convoluted subject line.
 
 #I have:
 
 a=c(1,2,3)
 x=a #example of user supplied input
 
 
 #Is there any function that will tell me the name of the 
 #object x refers to, referring only to x itself? 
 #i.e. the answer I want is "a"
 
 #I want: 
 #fun(x) == 'a'
 
 #(I don't think this is possible, but figured I'd ask.)
 
 
 



Re: [R] Allocating shelf space

2007-05-09 Thread Liaw, Andy
I don't know if there's an R solution, but this sounds to me like some
variation of the knapsack problem...

 http://en.wikipedia.org/wiki/Knapsack_problem
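
To make the flavour of the problem concrete, here is a rough sketch of a first-fit-decreasing heuristic for the closely related bin-packing problem. It is not a solution to the full question: it assumes a single dimension (book thickness), one fixed shelf width, and ignores height, depth and subject grouping. The function name and the numbers are made up.

```r
# First-fit decreasing: take books thickest first and put each one on the
# first shelf that still has room, opening a new shelf when none does.
first_fit_decreasing <- function(thickness, shelf_width) {
  stopifnot(all(thickness <= shelf_width))
  free  <- numeric(0)                    # remaining width on each open shelf
  shelf <- integer(length(thickness))    # shelf assigned to each book
  for (i in order(thickness, decreasing = TRUE)) {
    fits <- which(free >= thickness[i])
    if (length(fits) == 0) {
      free <- c(free, shelf_width)       # open a new shelf
      j <- length(free)
    } else {
      j <- fits[1]
    }
    free[j]  <- free[j] - thickness[i]
    shelf[i] <- j
  }
  shelf
}

books <- c(4, 8, 1, 4, 2, 1)             # thicknesses, arbitrary units
shelves <- first_fit_decreasing(books, shelf_width = 10)
tapply(books, shelves, sum)              # width used on each shelf
```

This is a heuristic, so it gives a reasonable packing rather than a guaranteed optimum.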

Andy

From: [EMAIL PROTECTED]
 
 Hi Folks,
 
 This is not an R question as such, though it may well have
 an R answer. (And, in any case, this community probably
 knows more about most things than most others ... indeed,
 has probably pondered this very question).
 
 I: Given a catalogue of hundreds of books, where each
 entry has author and title (or equivalent ID), and also
 
 Ia) The dimensions (thickness, height, depth) of the book
 Ib) A sort of classification of its subject/type/genre
 
 II: Given also a specification of available and possibly
 potential bookshelf space (numbers of book-cases, the width,
 height and shelf-spacing of each, and the dimensions of any
 free wall-space where further book-cases may be placed),
 where some book-cases have fixed shelves and some have shelves
 with (discretely) adjustable position, and additional book-cases
 can be designed to measure (probably with adjustable shelves).
 
 Question: Is there a resource to approach the solution of the
 problem of optimising the placement of adjustable shelves,
 the design of additional bookcases, and the placement of the
 books in the resulting shelf-space so as to
 
 A: Make efficient use of space
 B: Minimise the spatial dislocation of related books
    (it is acceptable to separate large books from small books
    on the same subject, for the sake of efficient packing).
 
 Awaiting comments and suggestions with interest!
 With thanks,
 Ted.
 
 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861
 Date: 09-May-07   Time: 18:23:53
 -- XFMail --
 


Re: [R] summing values according to a factor

2007-05-07 Thread Liaw, Andy
Howdy!

I guess what you want to do is compare Q1/T1 among the sections?  If you
want to compute the sum of Q1/T1 by Section, you can do something like:

sum.by.section <- with(mydata, tapply(Q1/T1, Section, sum))

Substitute sum with anything you want to compute.
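
As a quick illustration, here is the same call run on the five rows quoted below. The data frame is typed in by hand, and the name mean.by.section is just an example of swapping in another summary:

```r
# The five example rows from the question, entered by hand.
mydata <- data.frame(
  StuNum  = 111:115,
  Section = c(502, 502, 503, 504, 504),
  Q1      = c(45, 23, 58, 63, 83),
  T1      = 123
)

# Sum and mean of the proportion Q1/T1 within each section.
sum.by.section  <- with(mydata, tapply(Q1 / T1, Section, sum))
mean.by.section <- with(mydata, tapply(Q1 / T1, Section, mean))
sum.by.section
```

The result is a vector named by section ("502", "503", "504"), which can be passed on to other functions.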

Cheers,
Andy

From: Salvatore Enrico Indiogine
 
 Greetings!
 
 I have exam scores of students in several sections.  The data 
 looks like:
 
 StuNum  Section  Q1  T1
 111     502      45  123
 112     502      23  123
 113     503      58  123
 114     504      63  123
 115     504      83  123
 ..
 
 where Q1 is the score for question 1 and T1 is  the maximum possible
 score for question 1
 
 I need to check whether the section has an effect on the scores.  I
 thought about using chisq.test and calculate the sums of scores per
 section.
 
 I think that I have to use apply() but I am lost here.
 
 Thanks in advance,
 Enrico
 
 -- 
 Enrico Indiogine
 
 Mathematics Education
 Texas A&M University
 [EMAIL PROTECTED]
 


Re: [R] R package development in windows

2007-05-04 Thread Liaw, Andy
I guess it depends on what you want to be able to do with such a private
package; e.g., does it not need to have any documentation (i.e., the Rd
files)?  If all you want is to be able to access the objects, you can
just save() all those objects (mostly functions, I presume) in a .rda
file, and whenever you need them, just attach() the .rda file.
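
A small sketch of that save()/attach() workflow. The two functions and the file location are placeholders invented for the example:

```r
# Placeholder functions standing in for a personal collection.
my_mean_sd     <- function(x) c(mean = mean(x), sd = sd(x))
my_range_width <- function(x) diff(range(x))

# Save them once into an .rda file (illustrative path in a temp directory).
rda_file <- file.path(tempdir(), "MyStuff.rda")
save(my_mean_sd, my_range_width, file = rda_file)

# Later, e.g. in a fresh session: put the file's contents on the search path.
rm(my_mean_sd, my_range_width)
attach(rda_file)
my_range_width(c(2, 5, 11))
```

The attached objects sit on the search path rather than in the workspace, so they do not clutter ls() and are not written out again when the workspace is saved.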

Andy

From: Lucke, Joseph F
 
 Might there be a (semi-)automated procedure to create a minimal,
 personal package, for my eyes only, that I can load with a
 library(MyStuff) command?  This would be preferable to having to
 source() the files.  Is there already such a procedure?
 Joe
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Doran, Harold
 Sent: Thursday, May 03, 2007 2:33 PM
 To: Duncan Murdoch
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] [SPAM] - Re: R package development in windows -
 BayesianFilter detected spam
 
 Thanks, Duncan. I'll look into that. Is there an authoritative document
 that codifies the new package development procedures for 2.5.0
 (windows-specific), or is that "Writing R Extensions"? In this thread
 alone I've received multiple emails pointing to multiple web sites with
 instructions for windows. Inasmuch as it's appreciated, I'm a bit
 confused as to which I should consider authoritative.
 
 I do hope I can resolve this and appreciate the help I've received.
 However, I feel a bit compelled to note how very difficult 
 this process
 is. 
 
 Harold
 
 
  -Original Message-
  From: Duncan Murdoch [mailto:[EMAIL PROTECTED]
  Sent: Thursday, May 03, 2007 3:24 PM
  To: Doran, Harold
  Cc: Gabor Grothendieck; r-help@stat.math.ethz.ch
  Subject: [SPAM] - Re: [R] R package development in windows 
 - Bayesian 
  Filter detected spam
  
  On 5/3/2007 3:04 PM, Doran, Harold wrote:
   Thanks Gabor, Sundar, and Tony. Indeed, Rtools was 
 missing from the 
   path. With that resolved, and another 10 minute windows
  restart, I get
   the following below. The log suggests that hhc is not 
 installed. It 
   is, and, according to the directions I am following, I have
  placed it
   in the c:\cygwin directory.
  
  I think the problem is that you are following a real mix of 
  instructions, and they don't make sense.
  
  It would be nice if folks would submit patches to the R 
 Admin manual 
  (or to the Rtools web site) rather than putting together web sites 
  with advice that is bad from day one, and quickly gets 
 worse when it 
  is not updated.
  
   BTW, package.skeleton() doesn't seem to create the correct
  DESCRIPTION
   template. I had to add the DEPENDS line. Without this, I
  get another
   error.
   
   
   C:\Program Files\R\R-2.4.1\bin>Rcmd build --force --binary g:\foo
  
  R 2.4.1 is no longer current; the package building 
 instructions in R 
  2.5.0 have been simplified a bit.  You might want to try those.
  
  Duncan Murdoch
  
   * checking for file 'g:\foo/DESCRIPTION' ... OK
   * preparing 'g:\foo':
   * checking DESCRIPTION meta-information ... OK
   * removing junk files
   * checking for LF line-endings in source files
   * checking for empty or unneeded directories
   * building binary distribution
WARNING
   * some HTML links may not be found
   installing R.css in c:/TEMP/Rinst40061099
   
   Using auto-selected zip options ''
   latex: not found
   latex: not found
   latex: not found
   
   -- Making package foo 
   latex: not found
 adding build stamp to DESCRIPTION
   latex: not found
   latex: not found
   latex: not found
 installing R files
   latex: not found
 installing data files
   latex: not found
 installing man source files
 installing indices
   latex: not found
 not zipping data
 installing help
   Warning: \alias{foo} already in foo-package.Rd -- skipping the one in foo.Rd
     Building/Updating help pages for package 'foo'
        Formats: text html latex example chm
     foo-package   text    html    latex   example chm
     foo           text    html    latex   example chm
     mydata        text    html    latex   example chm
   hhc: not found
   cp: cannot stat `c:/TEMP/Rbuild40048815/foo/chm/foo.chm': No such file or directory
   make[1]: *** [chm-foo] Error 1
   make: *** [pkg-foo] Error 2
   *** Installation of foo failed ***
   
   Removing 'c:/TEMP/Rinst40061099/foo'
ERROR
   * installation failed
   
   
   C:\Program Files\R\R-2.4.1\bin
   
   -Original Message-
   From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
   Sent: Thursday, May 03, 2007 2:50 PM
   To: Doran, Harold
   Cc: r-help@stat.math.ethz.ch
   Subject: Re: [R] R package development in windows
   
   It can't find sh.exe so you haven't installed Rtools.
   
   There are several HowTo's listed in the links section here that 
   include pointers to R manuals and other step by step
   instructions:
   
   

Re: [R] R question [Broadcast]

2007-05-04 Thread Liaw, Andy
Bill,

A couple more points:

1. Please use an informative subject line.  I'd have deleted the original 
   post w/o reading it if I hadn't caught Marc's reply.

2. Are you sure you have bivariate response?  To me bivariate means
   two variables, and randomForest surely does not handle that (at least
   for now).

Andy   

From: Marc Schwartz
 
 On Fri, 2007-05-04 at 12:05 -0500, Bill Vorias wrote:
  I had a question about Random Forests.  I have a text file with 10
  dichotomous variables and a bivariate response vector.  I read this file
  into R as a data frame, and then used the command
  "randomForest(Response ~., dataset, etc..", where "Response" is the column
  header of the response variable and "dataset" is the name of the data
  frame.  I get an error that says "Response not found."  I was looking at
  the Iris data example in the R help files, and it seems like this is
  exactly what they did.  Do you have any suggestions? Thanks.
 
 
 R you sure that you have correctly specified the column and data frame
 names in the call to randomForest()?
 
 Be sure to check for typos, including capitalization.
 
 You can use:
 
   ls()
 
 to check for the current objects in your working environment 
 and you can
 then use:
 
   str(YourDataFrame)
 
 or 
 
   names(YourDataFrame)
 
 to display information about the detailed structure and/or 
 column names,
 respectively, in the data frame that you created from the 
 imported data.
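
 For instance, a sketch of those checks using the built-in iris data as a
 stand-in for the imported data frame (the name "dataset" and the column
 "Response" are taken from the question; iris has no such column, which is
 exactly the situation being diagnosed):

```r
dataset <- iris                      # stand-in for the imported data frame

str(dataset)                         # column names, types and first values
names(dataset)                       # column names only

# Is the response column really there, spelled exactly as in the formula?
has_response <- "Response" %in% names(dataset)
has_species  <- "Species"  %in% names(dataset)

# A case-insensitive search can reveal a capitalization mismatch.
near_miss <- grep("^species$", names(dataset), ignore.case = TRUE, value = TRUE)
```

 If has_response is FALSE, the formula refers to a column that does not
 exist under that exact name.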
 
 HTH,
 
 Marc Schwartz
 


Re: [R] how to concatinate the elements of some text vectors cat() or print() ?

2007-05-02 Thread Liaw, Andy
Is paste() what you're looking for?
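
A short sketch using the two vectors from the example below. paste() is vectorized, so it joins the vectors element by element and returns a character vector instead of printing:

```r
aa <- LETTERS[1:5]
bb <- letters[1:5]

comments <- paste(aa, bb)            # one string per pair, space-separated
joined   <- paste(aa, bb, sep = "")  # same, with no separator
comments
```

The result can be stored, indexed or written out with writeLines(), which cat() inside a loop does not allow.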

Andy 

From: John Kane
 
 I have some comment text taken from a SAS data file. 
 It is stored in two vectors and is difficult to read.
 I would like to simply concatenate the individual
 entries and end up with a character vector that gives
 me one line of text per comment.
 
 I cannot see how to do this, yet it must be very easy.
 I have played around with cat() and print() with no
 success.  Would someone kindly point out where I
 am going wrong?
 
 Thanks
 
 Simple Example:
 
  aa <- LETTERS[1:5]
  bb <- letters[1:5]
  cat(aa[1], bb[1])   # works for individuals
  cat(aa, bb)         # (concatenates entire vectors)

 # Using sink I might get it to work if I could figure out how to escape a
 # new line command. encodeString does not seem appropriate here.
  harry <- c(rep(NA,5))
  for (i in 1:5 ) {
  cat (aa[i],bb[i])
  }
 


[R] thousand separator (was RE: weight)

2007-04-30 Thread Liaw, Andy
I've run into this occasionally.  My current solution is simply to read
it into Excel, re-format the offending column(s) by unchecking the
thousand separator box, and write it back out.  Not exactly ideal to
say the least.  If anyone can provide a better solution in R, I'm all
ears...

Andy 

From: Natalie O'Toole
 
 Hi,
 
 These are the variables in my file. I think the variable I'm having 
 problems with is WTPP, which is of the Factor type. Does anyone know how to 
 fix this, please?
 
 Thanks,
 
 Nat
 
 'data.frame':   290 obs. of  5 variables:
  $ PROV  : num  48 48 48 48 48 48 48 48 48 48 ...
  $ REGION: num  4 4 4 4 4 4 4 4 4 4 ...
  $ GRADE : num  7 7 7 7 7 7 7 7 7 7 ...
  $ Y_Q10A: num  1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
  $ WTPP  : Factor w/ 1884 levels "1,106.8250","1,336.5138",..: 1544 67 1568 40 221 1702 1702 1434 310 310 ...
 
 
 __
 
 
 
 --- Douglas Bates [EMAIL PROTECTED] wrote:
 
  On 4/28/07, John Kane [EMAIL PROTECTED] wrote:
   IIRC you have a yes/no smoking variable scored 1/2
  ?
  
   It is possibly being read in as a factor not as an
   integer.
  
   try
class(df$smoking.variable)
   to see .
  
  Good point.  In general I would recommend using
  
  str(df)
  
  to check on the class or storage type of all
  variables in a data frame
  if you are getting unexpected results when
  manipulating it.  That
  function is carefully written to provide a maximum
  of information in a
  minimum of space.
 
 Yes, but I'm a relative newbie at R and didn't realise
 that str() would do that.  I always thought it was
 some kind of string function. 
 
 Thanks, it makes life much easier.
 
  
   --- Natalie O'Toole [EMAIL PROTECTED] wrote:
  
Hi,
   
    I'm getting an error message:

    Error in df[, 1:4] * df[, 5] : non-numeric argument to binary operator
    In addition: Warning message:
    Incompatible methods ("Ops.data.frame", "Ops.factor") for "*"

    here is my code:

    ##reading in the file
    happyguys<-read.table("c:/test4.dat", header=TRUE, row.names=1)

    ##subset the file based on Select If
    test<-subset(happyguys, PROV==48 & GRADE == 7 & Y_Q10A < 9)

    ##sorting the file
    mydata<-test
    mydataSorted<-mydata[ order(mydata$Y_Q10A), ]
    print(mydataSorted)

    ##assigning  a different name to file
    happyguys<-mydataSorted

    ##trying to weight my data
    data.frame<-happyguys
    df<-data.frame
    df1<-df[, 1:4] * df[, 5]

    ##getting error message here??
    Error in df[, 1:4] * df[, 5] : non-numeric argument to binary operator
    In addition: Warning message:
    Incompatible methods ("Ops.data.frame", "Ops.factor") for "*"

    Does anyone know what this error message means?
   
I've been reviewing R code all day  getting
  more
familiar with it
   
Thanks,
   
Nat
   
  
   
  
 

Re: [R] thousand separator (was RE: weight)

2007-04-30 Thread Liaw, Andy
Still, though, it would be nice to have the data read in correctly in
the first place, instead of having to do this kind of post-processing
afterwards...

Andy 

From: Bert Gunter
 
 Nothing! My mistake! gsub -- not sub -- is what you want to 
 get 'em all.
 
 -- Bert 
 
 
 Bert Gunter
 Genentech Nonclinical Statistics
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
 Sent: Monday, April 30, 2007 10:18 AM
 To: Bert Gunter
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] thousand separator (was RE: weight)
 
 Bert,
 
 What am I missing?
 
 > print(as.numeric(gsub(",", "", "1,123,456.789")), 10)
 [1] 1123456.789
 
 
 FWIW, this is using:
 
 R version 2.5.0 Patched (2007-04-27 r41355)
 
 Marc
 
 On Mon, 2007-04-30 at 10:13 -0700, Bert Gunter wrote:
  Except this doesn't work for "1,123,456.789" Marc.
  
  I hesitate to suggest it, but gregexpr() will do it, as it captures the
  position of **every** match to ",". This could be then used to process the
  vector via some sort of loop/apply statement.
  
  But I think there **must** be a more elegant way using 
 regular expressions
  alone, so I, too, await a clever reply.
  
  -- Bert 
  
  
  Bert Gunter
  Genentech Nonclinical Statistics
  
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
  Sent: Monday, April 30, 2007 10:02 AM
  To: Liaw, Andy
  Cc: r-help@stat.math.ethz.ch
  Subject: Re: [R] thousand separator (was RE: weight)
  
  One possibility would be to use something like the following
  post-import:
  
  > WTPP
  [1] 1,106.8250 1,336.5138
  
  > str(WTPP)
   Factor w/ 2 levels "1,106.8250","1,336.5138": 1 2
  
  > as.numeric(gsub(",", "", WTPP))
  [1] 1106.825 1336.514
  
  
  Essentially strip the ',' characters from the factors and 
 then coerce
  the resultant character vector to numeric. 
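
  A self-contained sketch of that post-import conversion, which also shows
  the difference between sub() and gsub() that comes up elsewhere in this
  thread. The three values are made up to include one with two separators:

```r
# A factor as read.table() would produce it for a column with separators.
WTPP <- factor(c("1,106.8250", "1,336.5138", "1,123,456.789"))

# gsub() removes every comma; fixed = TRUE treats "," as a literal string.
weights <- as.numeric(gsub(",", "", as.character(WTPP), fixed = TRUE))

# sub() removes only the first comma, so larger numbers stay non-numeric.
first_only <- sub(",", "", "1,123,456.789", fixed = TRUE)
```

  Converting through as.character() first makes it explicit that the labels
  are being converted, not the factor's internal integer codes.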
  
  HTH,
  
  Marc Schwartz
  
  
Re: [R] thousand separator (was RE: weight)

2007-04-30 Thread Liaw, Andy
Looks very neat, Gabor!  

I just cannot fathom why anyone would want to write numerics with those
separators in a flat file.   That's usually not for human consumption,
and computers don't need those separators!  

Andy

From: Gabor Grothendieck
 
 That could be accomplished using a custom class like this:
 
 library(methods)
 setClass("num.with.junk")
 setAs("character", "num.with.junk",
    function(from) as.numeric(gsub(",", "", from)))
 
 
 ### test ###
 
 Input <- "A B
 1,000 1
 2,000 2
 3,000 3
 "
 
 DF <- read.table(textConnection(Input), header = TRUE,
    colClasses = c("num.with.junk", "numeric"))
 str(DF)
 
 
 
Re: [R] NA and NaN randomForest

2007-04-25 Thread Liaw, Andy
Hi Clayton,

If you use the formula interface, then it should do what you want:

R> library(randomForest)
randomForest 4.5-18 
Type rfNews() to see new features/changes/bug fixes.
R> iris1 <- iris[-(1:5),]
R> iris2 <- iris[1:5,]
R> iris2[1, 3] <- NA
R> iris2[3, 1] <- NA
R> iris.rf <- randomForest(Species ~ ., iris1)
R> predict(iris.rf, iris2[-5])
[1] <NA>   setosa <NA>   setosa setosa
Levels: setosa versicolor virginica

The problem, of course, is that the formula interface is not suitable
for data with a large number of variables.  I'll look into doing the same
thing in the default method.
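
In the meantime, the wrapper asked about below could look roughly like this. It is a sketch, not part of randomForest: the name predict_keep_na is made up, it assumes predict() returns one value per row (a vector or factor, not a matrix), and lm() stands in for the fitted model so the example runs without extra packages.

```r
# Hypothetical wrapper: predict on complete rows only, then put NA back
# in the positions of the rows that had missing values.
predict_keep_na <- function(model, newdata, ...) {
  ok   <- complete.cases(newdata)           # checks every column of newdata
  pred <- predict(model, newdata[ok, , drop = FALSE], ...)
  idx  <- rep(NA_integer_, nrow(newdata))   # NA index gives an NA result
  idx[ok] <- seq_len(sum(ok))
  out  <- pred[idx]                         # keeps type and factor levels
  names(out) <- rownames(newdata)
  out
}

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris[-(1:5), ])
new <- iris[1:5, -1]                        # test rows without the response
new[1, "Petal.Length"] <- NA
new[3, "Sepal.Width"]  <- NA
res <- predict_keep_na(fit, new)
res
```

Note that complete.cases() looks at all columns of newdata, so it is safest to pass only the predictor columns.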

Andy


From: [EMAIL PROTECTED]
 
 Dear R-help,
 
 This is about randomForest's handling of NA and NaNs in test set data.
 Currently, if the test set data contains an NA or NaN then 
 predict.randomForest will skip that row in the output.
 
 I would like to change that behavior to outputting an NA.
 
 Can this be done with flags to randomForest?
 If not can some sort of wrapper be built to put the NAs back in?
 
 thanks,
 
 Clayton


Re: [R] Getting Confused [Broadcast]

2007-04-25 Thread Liaw, Andy
If you are serious in getting useful help, please do try to follow
suggestions in the Posting Guide.  You have not told us anything about
your OS, the versions of R you tried to install, and exactly what you
typed to build/install them.

Many Linux distros by default do not install the Fortran part of GCC, so
don't be surprised if that's the case for you (if you are trying to do
this on some version of Linux).
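
One quick way to check, from any R session that already works on the same machine, is to ask which Fortran compilers are on the PATH. The list of compiler names below is a guess at common candidates, not an exhaustive or authoritative list:

```r
# Candidate compiler names (an assumption; adjust for your system).
compilers <- c("gfortran", "g77", "f77", "f95")

found <- Sys.which(compilers)   # "" for any program not found on the PATH
found[nzchar(found)]            # the compilers that are actually available
```

An empty result suggests that configure has no Fortran compiler to work with, which would match the error quoted below.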

Andy

From: Steiner, Julien
 
 Hello,
 
 I'm getting confused by my experience of installing R.
 
 I had R installed in January without any trouble. (I just had to install
 gcc4.1.1.)
 
 Now I'd like to install a package which requires tcl/tk. So basically I
 need to reconfigure and reinstall R right after having installed tcl/tk.
 
 So I installed tcl/tk and ran the process to install R, but I receive this
 error:
 
 checking for dummy main to link with Fortran libraries... none
 checking for Fortran name-mangling scheme... configure: error: cannot
 compile a simple Fortran program
 See `config.log' for more details.
 
 I checked config.log, and the fact is that there's no Fortran compiler
 installed. But doesn't gcc already have a Fortran compiler in it?
 
 If somebody could help I would be thankful, especially if somebody has a
 clue why it worked without any error before and now fails.
 
 Thanks a lot
 
 Julien Steiner
 
 
 
 
 




Re: [R] how to convert the lower triangle of a matrix to a symmetricmatrix [Broadcast]

2007-04-20 Thread Liaw, Andy
Ranjan and Prof. Fox,

A similar approach can be found in stats:::as.matrix.dist().

Andy 

From: John Fox
 
 Dear Ranjan,
 
 If the elements are ordered by rows, then the following 
 should do the trick:
 
 X <- diag(p)
 X[upper.tri(X, diag=TRUE)] <- elements
 X <- X + t(X) - diag(diag(X))
 
 If they are ordered by columns, substitute lower.tri() for 
 upper.tri().
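
A small worked case of the row-ordered recipe, with made-up values for p and elements (both are illustrative assumptions, not data from the thread):

```r
# Hypothetical input: lower triangle of a 3 x 3 symmetric matrix, stored row by row,
# i.e. in the order (1,1), (2,1), (2,2), (3,1), (3,2), (3,3).
p <- 3
elements <- c(1, 2, 3, 4, 5, 6)

X <- matrix(0, p, p)
# R fills column by column, so the row-ordered lower triangle lands
# correctly in the upper triangle.
X[upper.tri(X, diag = TRUE)] <- elements
# Mirror the upper triangle and remove the diagonal that was counted twice.
X <- X + t(X) - diag(diag(X))
print(X)
```

For repeated use, the three lines can be wrapped in a function taking the element vector and p as arguments.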
 
 I hope this helps,
  John
 
 
 John Fox, Professor
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 http://socserv.mcmaster.ca/jfox 
  
 
  -Original Message-
  From: [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On Behalf Of Ranjan Maitra
  Sent: Thursday, April 19, 2007 9:28 PM
  To: r-help@stat.math.ethz.ch
  Subject: [R] how to convert the lower triangle of a matrix to 
  a symmetricmatrix
  
  Hi,
  
  I have a vector of p*(p+1)/2 elements, essentially the lower 
  triangle of a symmetric matrix. I was wondering if there is 
  an easy way to make it fill a symmetric matrix. I have to do 
  it several times, hence some efficient approach would be 
 very useful.
  
  Many thanks and best wishes,
  Ranjan
  
 
 
 
 
 




Re: [R] Suggestions for statistical computing course

2007-04-20 Thread Liaw, Andy
I really like John Monahan's Numerical Methods of Statistics (Cambridge
University Press).  

As to running/editing R scripts, you may want to look into JGR.  The
built-in editor is not as smart as ESS in some respects, but smarter
than ESS in others.  The only thing that keeps me from using it regularly
is the fact that it won't take arguments to R itself (at least on
Windows):  I need the --internet2 argument to be able to access the net
from R.

Andy

From: Giovanni Petris
 
 Dear R-helpers,
 
 I am planning a course on Statistical Computing and Computational
 Statistics for the Fall semester, aimed at first year Masters students
 in Statistics. Among the topics that I would like to cover are linear
 algebra related to least squares calculations, optimization and
 root-finding, numerical integration, Monte Carlo methods (possibly
 including MCMC), bootstrap, smoothing and nonparametric density
 estimation. Needless to say, the software I will be using is R.
 
 1. Does anybody have a suggestion about a book to follow that covers
    (most of) the topics above at a reasonable level for my audience?
    Are there any on-line publicly-available manuals, lecture notes,
    instructional documents that may be useful?
 
 2. I do most of my work in R using Emacs and ESS. That means that I
    keep a file in an emacs window and I submit it to R one line at a
    time or one region at a time, making corrections and iterating as
    needed. When I am done, I just save the file with the last,
    working, correct (hopefully!) version of my code. Is there a way of
    doing something like that, or in the same spirit, without using
    Emacs/ESS? What approach would you use to polish and save your code
    in this case? For my course I will be working in a Windows
    environment.
 
    While I am looking for simple and effective solutions that do not
    require installing emacs in our computer lab, the answer "you
    should teach your students emacs/ess on top of R" is perfectly
    acceptable.

 
 Thank you for your consideration, and thank you in advance for the
 useful replies.
 
 Have a good day,
 Giovanni
 
 -- 
 
 Giovanni Petris  [EMAIL PROTECTED]
 Department of Mathematical Sciences
 University of Arkansas - Fayetteville, AR 72701
 Ph: (479) 575-6324, 575-8630 (fax)
 http://definetti.uark.edu/~gpetris/
 
 
 
 




Re: [R] Help on averaging sets of rows defined by row name

2007-04-20 Thread Liaw, Andy
You might want to check which of the following scales better for the
size of data you have.

## Make up some data to try.
R> dat <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9), s2=runif(9))
R> dat
  gene        s1        s2
1    a 0.9959172 0.9531052
2    a 0.2064497 0.4257022
3    a 0.4791100 0.5977923
4    b 0.1307096 0.8256453
5    b 0.7887983 0.8904983
6    b 0.7841745 0.6901540
7    c 0.3356583 0.7125086
8    c 0.5859311 0.0509323
9    c 0.7681325 0.8677725

## Use aggregate():
R> aggregate(dat[-1], dat[1], mean)
  gene        s1        s2
1    a 0.5604923 0.6588666
2    b 0.5678941 0.8020992
3    c 0.5632407 0.5437378

## Do it by hand: need a bit more work if there are NAs.
R> rowsum(dat[-1], dat[[1]]) / table(dat[[1]])
         s1        s2
a 0.5604923 0.6588666
b 0.5678941 0.8020992
c 0.5632407 0.5437378
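
One possible way to do the "bit more work" for NAs in the by-hand version is sketched below. The data are made up, and dividing by per-group counts of non-missing values is an assumption about what is wanted (it mirrors mean(..., na.rm=TRUE)):

```r
# Made-up data in the same shape as above, with one missing value.
set.seed(1)
dat <- data.frame(gene = rep(letters[1:3], each = 3),
                  s1 = runif(9), s2 = runif(9))
dat$s1[2] <- NA

x <- as.matrix(dat[-1])
# Per-group sums, skipping NAs.
sums <- rowsum(x, dat$gene, na.rm = TRUE)
# Per-group counts of non-missing values (logical matrix turned into 0/1).
counts <- rowsum((!is.na(x)) * 1, dat$gene)
means <- sums / counts
print(means)
```

Because rowsum() works on the whole matrix at once, this should scale to many genes much better than a loop over gene names.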

Andy
 

From: Booman, M
 
 Dear all,
 
 This is my problem: I have a table of gene expression data, 
 where the 1st column is the gene name, and the 2nd-39th columns each are 
 expression data for 38 samples. There are multiple 
 measurements per sample for each gene, so there are multiple 
 rows for each gene name. I want to average these measurements 
 so I end up with one value per sample for each gene name. The 
 output data frame (table.averaged) is further used in another R 
 script. The code I use now (see below) takes 20 secs for each 
 loop, so it takes 45 minutes to average my files of 13500 
 unique genes. Can anyone help me do this faster?
 
 Cheers, marije
 
 Code I use: 
 
 
 table.imputed[,1] <- as.character(table.imputed[,1])
 # table.imputed is a data.frame; 1st column = gene name (class
 # factor), rest of columns = expression data (class numeric)
 
 genesunique <- unique(table.imputed[,1])
 # To make list of unique genes in the set
 
 table.averaged <- NULL
 for (j in 1:length(genesunique)) {
   if (j %% 100 == 0) {
     # To report progress
     cat(j, "genes finished", sep=" ", fill=TRUE)
   }
   # collects all rows of average values and binds them back into
   # one data frame
   table.averaged <- rbind(table.averaged,
                           givemean(genesunique[j], table.imputed))
 }
 
 givemean <- function (gene, table.imputed) {
   # make a subset containing only the rows for one gene name
   thisgene <- table.imputed[table.imputed[,1]==gene,]
   # calculates average for each sample (column) and outputs one row
   # of average values and the gene name
   data.frame(gene, t(sapply(thisgene[,2:ncol(thisgene)], mean,
                             na.rm=TRUE)))
 }
 
 
 




Re: [R] Help on averaging sets of rows defined by row name

2007-04-20 Thread Liaw, Andy
Do note that I used dat[1] instead of dat[,1] or dat[[1]] as the second
argument to aggregate():  If dat is a data frame, then dat[1] is also a
data frame with only the first column.  Since a data frame is also a list,
dat[1] is a one-component list.
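
The difference can be seen on a small made-up data frame (the values are arbitrary; only the indexing matters):

```r
dat <- data.frame(gene = rep(letters[1:3], each = 3), s1 = 1:9, s2 = 9:1)

class(dat[1])    # "data.frame": still a list with one component
class(dat[[1]])  # the bare column, not a list

# A one-column data frame is accepted as the 'by' argument ...
res <- aggregate(dat[-1], dat[1], mean)
print(res)

# ... whereas the bare column is rejected because it is not a list.
bad <- try(aggregate(dat[-1], dat[, 1], mean), silent = TRUE)
inherits(bad, "try-error")
```

Wrapping the column explicitly, as in list(gene = dat[[1]]), is an equivalent way of supplying the grouping.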
 
My guess is that Tierry didn't try his suggestion, or else he would have
noticed the error.
 
Andy




From: Booman, M [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 20, 2007 10:26 AM
To: Liaw, Andy; r-help@stat.math.ethz.ch
Subject: RE: [R] Help on averaging sets of rows defined by row
name



Thanks for your help everyone!
I had some trouble with the 'aggregate' function because the
'table.impute[,1]' was not a list (which the 'by' argument should be),
and it took a very very long time to coerce it into one. But the
rowmeans method works almost instantly! And I have no problems with NA's
because I used a knn imputer first.



Re: [R] How to return more than one variable from function

2007-04-20 Thread Liaw, Andy
From: Vincent Goulet
 
 Le Vendredi 20 Avril 2007 07:46, Julien Barnier a écrit :
  Hi,
 
   I have written a function which computes variance, sd,
   r^2, R^2adj etc. But i am not able to return all of
   them in return statement.
 
  You can return a vector, or a list.
 
  For example :
 
  func <- function() {
    ...
    result <- list(variance=3, sd=sqrt(3))
    return(result)  # you can omit this
  }
 
 Nitpicking and for the record: if you omit the 
 return(result) line, the 
 function will return nothing since it ends with an 
 assignment.

Have you actually checked?  Counterexample:

R> f <- function(x) y <- 2 * x
R> f(3)
R> (z <- f(3))
[1] 6
R> f2 <- function(x) { y <- 2 * x; y }
R> f2(3)
[1] 6
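
What happens in the counterexample is that a function ending in an assignment still returns the assigned value, only invisibly, so nothing is printed at the prompt. A short sketch using withVisible() makes this explicit:

```r
f  <- function(x) y <- 2 * x          # ends with an assignment
f2 <- function(x) { y <- 2 * x; y }   # ends with the value itself

res  <- withVisible(f(3))
res2 <- withVisible(f2(3))

res$value     # 6: the value is returned
res$visible   # FALSE: but it is not auto-printed
res2$visible  # TRUE
```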


 Furthermore, 
 explicit use of return() is never needed at the end of a 
 function. The above 
 snippet is correct, but this is enough:
 
 func <- function() {
   ...
   result <- list(variance=3, sd=sqrt(3))
   result
 }
 
 But then, why assign to a variable just to return its value? 
 Better still:
 
 func <- function() {
   ...
   list(variance=3, sd=sqrt(3))
 }

Or, as has been suggested, if all values to be returned are of the same type, 
just use a (named) vector:

func <- function(...) {
    ...
    c(Variance=3, "R-squared"=0.999)
}
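
A brief sketch of how the two kinds of return value are accessed; the function names and numbers are placeholders for illustration only:

```r
# Named numeric vector: fine when every value has the same type.
func_vec <- function() c(Variance = 3, "R-squared" = 0.999)
out <- func_vec()
out["R-squared"]     # index by name; `$` does not work on atomic vectors

# List: needed when the pieces differ in type or length.
func_list <- function() list(variance = 3, label = "fit A")
out2 <- func_list()
out2$variance
out2$label
```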

Andy
 
 
 
  a <- func()
  a$variance
  a$sd
 
  HTH,
 
  Julien
 
 -- 
   Vincent Goulet, Professeur agrégé
   École d'actuariat
   Université Laval, Québec 
   [EMAIL PROTECTED]   http://vgoulet.act.ulaval.ca
 
 
 
 




Re: [R] Udpate R under a proxy

2007-04-20 Thread Liaw, Andy
This is what I just tried (thanks, Dirk!):

Starting R and then calling Sys.putenv() to set http_proxy, followed by
options(download.file.method="wget"), doesn't work.

Opening a command prompt, defining http_proxy there, then running Rgui and
setting options(download.file.method="wget") works.

Perhaps you can define http_proxy in Renviron.  I have not tried that.
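
Whichever place the variable is defined in, a simple diagnostic from inside the R session shows whether R actually inherited it. This is only a check, not a fix, and it makes no assumption about the proxy address:

```r
# Sys.getenv() returns "" when the variable is not set in R's environment.
proxy <- Sys.getenv("http_proxy")
if (nzchar(proxy)) {
  message("http_proxy is set to: ", proxy)
} else {
  message("http_proxy is not set in this R session")
}
```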

Andy 

From: justin bem
 
 dear all,
 
 I get internet access via a proxy server. When I try to
 download a package it always fails, even when I add an
 environment variable for the http proxy server. I use
 Windows XP SP2.
 
 Sincerely
 
 Justin BEM
 Elève Ingénieur Statisticien Economiste
 BP 294 Yaoundé.
 Tél (00237)9597295.
 
 
 
 




Re: [R] general question about plotting multiple regression results [Broadcast]

2007-04-19 Thread Liaw, Andy
I suspect you'll greatly benefit from a read of Prof. Fox's book(s) on
regression models, as well as making use of his car package.  You may
want to read up on partial residual plots and partial regression plots.
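
As a rough illustration of a partial residual plot using only base R, here is a sketch on the built-in mtcars data (the model and variables are stand-ins, not the poster's data):

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Partial residuals: one column per model term, each equal to the
# ordinary residual plus that term's (centred) fitted contribution.
pr <- residuals(fit, type = "partial")

plot(mtcars$wt, pr[, "wt"], xlab = "wt", ylab = "partial residual for wt")
# The least-squares line through these points has the same slope as the
# coefficient of wt in the full model.
abline(lm(pr[, "wt"] ~ mtcars$wt), lwd = 2)
```

Unlike a plot of fitted values against one predictor, this shows the effect of that predictor after adjusting for the other terms in the model.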

Andy

From: Simon Pickett
 
 Hi all,
 
 I have been bumbling around with R for years now and still 
 haven't come up with a solution for plotting reliable graphs 
 of relationships from a linear regression.
 
 Here is an example illustrating my problem
 
 1.I do a linear regression as follows
 
 summary(lm(n.day13~n.day1+ffemale.yell+fmale.yell+fmale.chroma,data=surv))
 
 which gives some nice sig. results
 
 Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
 (Intercept)  -0.73917    0.43742  -1.690 0.093069 .  
 n.day1        1.00460    0.05369  18.711  < 2e-16 ***
 ffemale.yell  0.22419    0.06251   3.586 0.000449 ***
 fmale.yell    0.25874    0.06925   3.736 0.000262 ***
 fmale.chroma  0.23525    0.11633   2.022 0.044868 *  
 
 2. I want to plot the effect of ffemale.yell, fmale.yell 
 and fmale.chroma on my response variable.
 
 So, I either plot the raw values (which is fine when there is 
 a very strong relationship) but what if I want to plot the 
 effects from the model?
 
 In this case I would usually plot the fitted values 
 against the raw values of x... Is this the right approach?
 
 fit <- fitted(lm(n.day13~n.day1+ffemale.yell+fmale.yell+fmale.chroma,
                  data=fsurv1))
 
 plot(fit~ffemale.yell)
 
 #make a dummy variable across the range of x 
 x <- seq(from=min(fsurv1$ffemale.yell), to=max(fsurv1$ffemale.yell),
          length=100)
 
 #get the coefficients and draw the line
 co <- coef(lm(fit~ffemale.yell,data=fsurv1))
 y <- (co[2]*x)+co[1]
 lines(x,y, lwd=2)
 
 This often does the trick but for some reason, especially 
 when my model has many terms in it or when one of the 
 independent variables is only significant when the other 
 independent variables are in the equation, it gives me strange lines.
 
 Please can someone show me the light?
 
 Thanks in advance,
 
 Simon.
 
 
 
 
 
 
 Simon Pickett
 PhD student
 Centre For Ecology and Conservation
 Tremough Campus
 University of Exeter in Cornwall
 TR109EZ
 Tel 01326371852
 
 
 
 




Re: [R] R in cron job: X problems

2007-04-19 Thread Liaw, Andy
This is in the FAQ, if I remember correctly...  However, alternatively:

As Jeff Horner recently pointed out on the list, the Cairo package is a
good way of generating png without needing an X display.  You may want
to look into that.  I've just installed cairo on our CentOS boxes and
the Cairo package from CRAN.
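
A minimal sketch of what such a script might contain, assuming the Cairo package is installed (the output path and plot are placeholders):

```r
# Hypothetical output location; a cron job would use a fixed path instead.
out <- file.path(tempdir(), "hourly-plot.png")

if (requireNamespace("Cairo", quietly = TRUE)) {
  # CairoPNG() renders through the cairo library, so no X server is needed.
  Cairo::CairoPNG(out, width = 640, height = 480)
  plot(1:10, main = "Rendered without an X display")
  dev.off()
} else {
  message("Cairo package not installed; no plot written")
}
```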

Andy 

From: Mark Liberman
 
 I'd like to use an R CMD BATCH script as part of a cron job 
 that is set up to run every hour.
 
 The trouble is that the script creates a graphical output in 
 a file via png(), and apparently this in turn works through X.
 
 When cron invokes the job, no X server is available -- I 
 suppose that the DISPLAY variable is not set -- and so R 
 exits with an error message in the output file. (If I run the 
 same script in an environment where an X server is properly 
 available, it works as I want it to.)
 
 I tried setting DISPLAY to "thecomputername:0.0" (where 
 "thecomputername" is the X.Y.Z form of the computer's name, as used for
 ssh etc.), but that didn't work.
 
 Any advice about how to persuade the graphics subsystem to 
 bypass X, or how to set DISPLAY in a safe way to run in a cron job?
 
 This is a linux system (a recent RedHat server system) with R 2.2.1.
 
 Thanks,
 
 Mark Liberman
 
 
 
 




Re: [R] installing new packages

2007-04-13 Thread Liaw, Andy
See if section 2.19, "The Internet download functions fail", in the R for
Windows FAQ helps.

Andy 

From: Bill Shipley
 
 Hello,
 
  
 
 I have just installed the newest version of R (2.4.1) for 
 Windows XP.  I can no longer install new packages.  When 
 trying to connect to a server (I have tried several) I get 
 the following message:
 
  
 
  chooseCRANmirror()
 
 Error in open.connection(file, "r") : unable to open connection
 
 In addition: Warning message:
 
 unable to connect to 'cran.r-project.org' on port 80.
 
  
 
 Have other people had the same problem with this version, or 
 is it unique to my computer?  Can someone suggest a solution?
 
 Thanks.
 
  
 
 Bill Shipley
 
  
 
 
 
 
 




Re: [R] Subsetting list of vectors with list of (boolean) vectors?

2007-04-12 Thread Liaw, Andy
From: Marc Schwartz
 
 On Thu, 2007-04-12 at 18:12 +0200, Johannes Graumann wrote:
  Dear Rologists,
  
  I'm stuck with this. How would you do this efficiently:
  
   aPGI
  [[1]]
  [1] 864  5576
  
  
   aPGItest
  [[1]]
  [1]  TRUE FALSE
  
   result <- [magic box involving subset]
  
   result
  [[1]]
  [1] 864
  
  Thanks for any hints,
  
  Joh
 
 
  lapply(seq(along = aPGI), function(x) 
  aPGI[[x]][aPGItest[[x]]])
 [[1]]
 [1] 864

Alternatively:

R> mapply("[", aPGI, aPGItest, SIMPLIFY=FALSE)
[[1]]
[1] 864
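
The same call on a list with more than one component, using made-up values for the second component, shows that each vector is subset by its own logical vector:

```r
aPGI     <- list(c(864, 5576), c(10, 20, 30))
aPGItest <- list(c(TRUE, FALSE), c(FALSE, TRUE, TRUE))

# "[" is applied pairwise: aPGI[[i]][aPGItest[[i]]] for each i.
result <- mapply("[", aPGI, aPGItest, SIMPLIFY = FALSE)
str(result)
```

SIMPLIFY=FALSE keeps the result a list even when every component happens to have the same length.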

Cheers,
Andy

 
 
 I think that this should be a generic solution for multiple 
 (but common) levels in each list.
 
 HTH,
 
 Marc Schwartz
 
 
 
 




Re: [R] Random Forest Imputations [Broadcast]

2007-04-11 Thread Liaw, Andy
Please provide the information the posting guide asks for (version of R,
packages used, version of package used, etc).  There are no yaImpute() or
yai() functions in the randomForest package.
 
Andy



From: [EMAIL PROTECTED] on behalf of Ricky Jacob
Sent: Wed 4/11/2007 5:55 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Random Forest Imputations [Broadcast]



Dear All,
I am not able to run the random forest method with my dataset.

X: 280 records of satellite data (28 columns): B1min, b1max, b1std, etc.

y: 280 records with 3 columns: total basal area, stem density and volume

yref <- y[1:230,]  # keeping the first 230 records as reference records

I want to set the y values to 0 for records 231 to 280.

yimp <- y[231:280,]  # records for which we want to impute the basal area,
                     # stem density and volume

mal1 <- yai(x=x, y=yref, method="mahalanobis", k=1, noRefs = TRUE)
# This works fine for mahalanobis, msn, gnn, raw and euclidean

I want to do a similar thing with random forest, where only the first 230
records are used for calculating nearest neighbours for records 231 to 280.
What needs to be done? I went through the yaImpute document, but all I
could do without any error message was to have the nearest neighbours
generated using yai() with all 280 records used for finding them.

Regards
Ricky








Re: [R] Random Forest Imputations [Broadcast] [Broadcast]

2007-04-11 Thread Liaw, Andy
The package has a doc/ subdirectory (in the pre-compiled package, or
inst/doc in the source package), which contains yaImputePaper.pdf.  Page
9 of that document may be of some help to you.  This is the first time
I've seen this package, so can't help you much there.  It looks like the
package authors would like me to add some feature to the randomForest
package (which I maintain).  I'll look into that.
 
Andy




From: Ricky Jacob [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 11, 2007 7:11 AM
To: Liaw, Andy
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Random Forest Imputations [Broadcast]
[Broadcast]


I am currently using R version 2.4.1.
I am using the yaImpute package for k-NN imputation:
http://forest.moscowfsl.wsu.edu/gems/yaImpute.pdf
 
In yaImpute, I am using the yai() function, which can use randomForest
as a method for finding the k nearest neighbours:
http://cran.r-project.org/doc/packages/yaImpute.pdf 
 
With the help of the example given in the document, I was able to use
the other methods available. The MoscowMtStJoe example is similar to
the work I am trying to do.
 
But the y variable needs to be entered in the form of a factor
for random forest.
 
What can be done here?


 

Re: [R] Reasons to Use R [Broadcast]

2007-04-11 Thread Liaw, Andy
From: Douglas Bates
 
 On 4/10/07, Wensui Liu [EMAIL PROTECTED] wrote:
  Greg,
  As far as I understand, SAS is more efficient handling large data 
  probably than S+/R. Do you have any idea why?
 
 SAS originated at a time when large data sets were stored on 
 magnetic tape and the only reasonable way to process them was 
 sequentially.
 Thus most statistics procedures in SAS act as filters, 
 processing one record at a time and accumulating summary 
 information.  In the past SAS performed a least squares fit 
 by accumulating the crossproduct of [X:y] and then using the 
 sweep operator to reduce that matrix. For such an 
 amount of storage required.  Adding observations just 
 requires more time.
 
 This works fine (although there are numerical disadvantages 
 to this approach - try mentioning the sweep operator to an 
 expert in numerical linear algebra - you get a blank stare) 

For those who stared blankly at the above:  The sweep operator is 
just a fancier version of the good old Gaussian elimination...
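
To make that concrete, here is a small sketch of least squares via sweeping the cross-product of [X:y]. The helper name sweep_pivot and the simulated data are made up for illustration, and the sign convention is one of several in use; this is not a description of what SAS actually runs:

```r
# Sweep a symmetric matrix A on its k-th diagonal element.
sweep_pivot <- function(A, k) {
  d <- A[k, k]
  B <- A - outer(A[, k], A[k, ]) / d   # eliminate pivot row/column elsewhere
  B[k, ] <- A[k, ] / d
  B[, k] <- A[, k] / d
  B[k, k] <- -1 / d
  B
}

set.seed(42)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))
y <- drop(X %*% c(2, -1, 0.5) + rnorm(n))

# The 4 x 4 cross-product is all that needs to be stored; its size does
# not depend on the number of observations.
A <- crossprod(cbind(X, y))
for (k in 1:3) A <- sweep_pivot(A, k)

beta <- A[1:3, 4]   # least squares coefficients
rss  <- A[4, 4]     # residual sum of squares
```

After sweeping the three predictor pivots, the last column holds the coefficients and the corner holds the residual sum of squares, which is why the number of observations only affects the time spent accumulating the cross-product.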

Andy

 as long as the operations that you wish to perform fit into 
 this model.  Making the desired operations fit into the model 
 is the primary reason for the awkwardness in many SAS analyses.
 
 The emphasis in R is on flexibility and the use of good 
 numerical techniques - not on processing large data sets 
 sequentially.  The algorithms used in R for most least 
 squares fits generate and analyze the complete model matrix 
 instead of summary quantities.  (The algorithms in the biglm 
 package are a compromise that work on horizontal sections of 
 the model matrix.)
 
 If your only criterion for comparison is the ability to work 
 with very large data sets performing operations that can fit 
 into the filter model used by SAS then SAS will be a better 
 choice.  However you do lock yourself into a certain set of 
 operations and you are doing it to save memory, which is a 
 commodity that decreases in price very rapidly.
 
 As mentioned in other replies, for many years the majority of 
 SAS uses are for data manipulation rather than for 
 statistical analysis so the filter model has been modified in 
 later versions.
 
 
 
 
 
  On 4/10/07, Greg Snow [EMAIL PROTECTED] wrote:
-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Bi-Info 
(http://members.home.nl/bi-info)
Sent: Monday, April 09, 2007 4:23 PM
To: Gabor Grothendieck
Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
Subject: Re: [R] Reasons to Use R
  
   [snip]
  
So what's the big deal about S using files instead of 
 memory like 
R. I don't get the point. Isn't there enough swap space for S? 
(Who cares
anyway: it works, isn't it?) Or are there any problems 
 with S and 
large datasets? I don't get it. You use them, Greg. So 
 you might 
discuss that issue.
   
Wilfred
   
   
  
   This is my understanding of the issue (not anything official).
  
   If you use up all the memory while in R, then the OS will start 
   swapping memory to disk, but the OS does not know what parts of 
   memory correspond to which objects, so it is entirely 
 possible that 
   the chunk swapped to disk contains parts of different 
 data objects, 
   so when you need one of those objects again, everything 
 needs to be 
   swapped back in.  This is very inefficient.
  
   S-PLUS occasionally runs into the same problem, but since it does 
   some of its own swapping to disk it can be more efficient by 
   swapping single data objects (data frames, etc.).  Also, since 
   S-PLUS is already saving everything to disk, it does not actually 
   need to do a full swap, it can just look and see that a 
 particular 
   data frame has not been used for a while, know that it is already 
   saved on the disk, and unload it from memory without 
 having to write it to disk first.
  
   The g.data package for R has some of this functionality 
 of keeping 
   data on the disk until needed.
  
   The better approach for large data sets is to only have 
 some of the 
   data in memory at a time and to automatically read just the parts 
   that you need.  So for big datasets it is recommended to have the 
   actual data stored in a database and use one of the database 
   connection packages to only read in the subset that you 
 need.  The 
   SQLiteDF package for R is working on automating this 
 process for R.  
   Also, the bigdata module for S-PLUS and the biglm package 
   for R have ways of doing some of the common analyses using chunks of 
   data at a time.  This idea is not new.  There was a 
 program in the 
   late 1970s and 80s called Rummage by Del Scott (I guess 
 technically it still exists, I have a copy on a 5.25
   floppy somewhere) that used the approach of specify the model you 
   wanted to fit first, then specify the data file.  Rummage 
 would then 
   figure out which 

Re: [R] Reasons to Use R [Broadcast]

2007-04-09 Thread Liaw, Andy
I've probably been away from SAS for too long... we've recently tried to
get SAS on our 64-bit Linux boxes (because SAS on PC is not sufficient
for some of my colleagues who need it).  I was shocked by the quote for
our 28-core Scyld cluster--- the annual fee was a few times the total
cost of our hardware.  We ended up buying a new quad 3GHz Opteron box
with 32GB RAM just so that the fee for SAS on such a box would be more
tolerable.  It just boggles my mind that the right to use SAS for a year
is about the price of a nice four-bedroom house (near SAS Institute!).
I don't understand people who would rather pay that kind of price for the
software, instead of spending the money on state-of-the-art hardware and
saving more than a bundle.

Just my $0.02...
Andy

From: Jorge Cornejo-Donoso
 
 I have a Dell with 2 Intel XEON 3.0 processors and 2GB of RAM.
 The problem is the DB size. 
 
 -Original Message-
 From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
 Sent: Monday, April 9, 2007 11:28
 To: Jorge Cornejo-Donoso
 CC: r-help@stat.math.ethz.ch
 Subject: Re: [R] Reasons to Use R
 
 Have you tried 64 bit machines with larger memory or do you 
 mean that you can't use R on your current machines?
 
 Also have you tried S-Plus?  Will that work for you? The 
 transition from that to R would be less than from SAS to R.
 
 On 4/9/07, Jorge Cornejo-Donoso [EMAIL PROTECTED] wrote:
  The size of the db is an issue with R. We are still using SAS because R 
  can't handle our db, and of course we don't want to sacrifice resolution, 
  because the data collection is expensive (at least in fisheries and 
  oceanography), so.. I think that R needs to improve its handling of big DBs.
  Now I can only use R for graph preparation and some data analysis, but 
  we can't do the main work in R, and that is really sad.
 
 


Re: [R] subset [Broadcast]

2007-03-26 Thread Liaw, Andy
From: Thomas Lumley
 
 On Mon, 26 Mar 2007, Marc Schwartz wrote:
 
  Sergio,
 
  Please be sure to cc: the list (ie. Reply to All) with follow up 
  questions.
 
  In this case, you would use %in% with a negation:
 
  NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (!var3 %in% 2:3))
 
 
 Probably a typo: should be !(var3 %in% 2:3) rather than (!var 
 %in% 2:3)

I used to think so, but found I didn't need the parens:

R> a <- 1:3; b <- c(1, 3, 5)
R> ! a %in% b
[1] FALSE  TRUE FALSE
R> ! (a %in% b)
[1] FALSE  TRUE FALSE
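
The reason the parentheses are optional is operator precedence: the %any% operators bind more tightly than unary `!` (see ?Syntax). A small sketch that inspects the parse tree to confirm this; the names e, outer.fun and inner.fun are just illustrative.

```r
a <- 1:3; b <- c(1, 3, 5)

## Look at how R parses the unparenthesized expression.
e <- quote(!a %in% b)
outer.fun <- as.character(e[[1]])        # outermost call
inner.fun <- as.character(e[[2]][[1]])   # call nested inside it

## Both spellings therefore evaluate to the same thing.
same <- identical(!a %in% b, !(a %in% b))
```

Here outer.fun is "!" and inner.fun is "%in%", so `!` is applied to the result of `%in%`. The explicit parentheses do no harm, though, and may be easier on readers who do not have the precedence table memorized.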

Andy

   -thomas
 
  See ?%in% for more information.
 
  HTH,
 
  Marc
 
  On Mon, 2007-03-26 at 17:30 +0200, Sergio Della Franca wrote:
  OK, this runs correctly.
 
  Another question for you:
 
  I want to put more than one condition on var3, i.e.
  I would like to create a subset when:
   - var1=0
   - var2=0
   - var3 is different from 2 and from 3.
 
  As you suggested, I ran this code:
  NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 2) & (var3 != 3))
 
  Is there a way to combine (var3 != 2) & (var3 != 3) into one 
  condition?
 
  Thank you.
 
  Sergio
 
 
 
  2007/3/26, Marc Schwartz [EMAIL PROTECTED]:
  On Mon, 2007-03-26 at 17:02 +0200, Sergio Della 
 Franca wrote:
  Dear R-Helpers,
 
  I want to make a subset from my data set.
 
  I'd like to perform different condition for subset.
 
  I.e.:
 
  I like to create a subset when:
  - var1=0
  - var2=0
  - var3 is different from 2.
 
  How can i develop a subset under this condition?
 
  Thank you in advance.
 
  Sergio Della Franca.
 
  See ?subset
 
  Something along the lines of the following should work:
 
  NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 2))
 
  HTH,
 
  Marc Schwartz
 
 
 
 Thomas Lumley Assoc. Professor, Biostatistics
 [EMAIL PROTECTED] University of Washington, Seattle
 


Re: [R] frequency tables and sorting by rowSum

2007-03-24 Thread Liaw, Andy
1. This is probably overkill, but works:
 
> as.data.frame(table(as.data.frame(m)))
  V1 V2 V3 Freq
1  0  0  0    0
2  1  0  0    2
3  0  1  0    3
4  1  1  0    0
5  0  0  1    1
6  1  0  1    0
7  0  1  1    0
8  1  1  1    0

You can easily get rid of 0-frequency rows afterward.
 
2. Not sure what you want, but guessing something like:
 
m.sorted <- m[order(rowSums(m)), order(colSums(m))]
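
A self-contained sketch of both steps, using the 6 x 3 matrix from the question below as m. Dropping the zero-frequency rows and ordering by Freq are additions for illustration, as is the name freq.

```r
## The matrix from the original question, entered row by row.
m <- matrix(c(0, 1, 0,
              0, 0, 1,
              0, 1, 0,
              1, 0, 0,
              0, 1, 0,
              1, 0, 0), ncol = 3, byrow = TRUE)

## 1. Frequency of each distinct row, without the empty combinations.
freq <- as.data.frame(table(as.data.frame(m)))
freq <- freq[freq$Freq > 0, ]
freq <- freq[order(freq$Freq), ]

## 2. Rows ordered by row sums, columns ordered by column sums.
m.sorted <- m[order(rowSums(m)), order(colSums(m))]
```

Note that the V1 to V3 columns of freq are factors, so they would need converting (e.g. via as.character() and then as.numeric()) before doing arithmetic on them.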
 
Andy



From: [EMAIL PROTECTED] on behalf of Stefan Nachtnebel
Sent: Sat 3/24/2007 8:41 AM
To: r-help@stat.math.ethz.ch
Subject: [R] frequency tables and sorting by rowSum [Broadcast]



Dear list,

I have some trouble generating a frequency table over a number of vectors.
Creating these tables over simple numbers is no problem with table()

 table(c(1,1,1,3,4,5))

1 3 4 5
3 1 1 1


, but how can i for example turn:
0 1 0
0 0 1
0 1 0
1 0 0
0 1 0
1 0 0

into

0 0 1 1
1 0 0 2
0 1 0 3

My second problem is, sorting rows and columns of a matrix by the 
rowSums/colSums.
I did it this way, but i think there should be a more efficient way:

sortRowCol <- function(taus) {
  swaprow <- function(rsum) {
    taus[(rowSums(taus)==rsum),]
  }
  for( i in 1:2 )
    taus <- sapply(sort(rowSums(taus)),swaprow)
}

Thanks in advance, Stefan Nachtnebel



Re: [R] Get home directory and simple I/O

2007-03-23 Thread Liaw, Andy
From: Gabor Grothendieck
 
 See:
 
 ?R.home

That's not what Alberto wanted:  It gives the location of the R
installation, not where user's home directory is.  AFAIK Windows does
not set the HOME environment variable by default.

 ?dput
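
A minimal sketch covering both questions. It assumes that path.expand("~") is an acceptable notion of "home directory": on Unix-alikes this is normally the HOME directory, while on Windows R expands "~" to its own idea of the user directory (often the Documents folder), so check the result on your system. The temporary file is only there to keep the example self-contained.

```r
## Home directory as R sees it.
home <- path.expand("~")

## Write a vector in R-readable, human-readable form and read it back.
x <- c(1, 2, 3)
f <- tempfile(fileext = ".R")
dput(x, file = f)          # the file contains the text c(1, 2, 3)
txt <- readLines(f)
y <- dget(f)
identical(x, y)
```

dput()/dget() handle one object per file; dump() and source() are the analogous pair when several named objects should go into one file.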
 
 On 3/23/07, Alberto Monteiro [EMAIL PROTECTED] wrote:
  Is there any generic function that gets the home directory? This 
  should return /home/user in Linux and x:/Documents and 
  Settings/user (or whatever) in Windows XP.
 
  Another (unrelated) question: what is the _simplest_ way to 
 read and 
  write R variables to/from files such that they are stored in a 
  human-readable but R-like form? For example, if (say), x is 
 a vector 
  defined as x <- c(1, 2, 3), can I write (and read) x as a file with 
  just one line, namely: c(1, 2, 3) ?
 
  Alberto Monteiro
 
 
 


Re: [R] objects of class matrix and mode list? [Broadcast]

2007-03-23 Thread Liaw, Andy
It may help to (re-)read ?sapply in a bit more detail.  Simplification
is done only if it's possible, and what "possible" means is defined
there.

A list is a vector whose elements can be different objects, but a vector
nonetheless.  Thus a list can have dimensions.  E.g.,

R> a <- list(1, 1:2, 3, c("abc", "def"))
R> dim(a) <- c(2, 2)
R> a
     [,1]      [,2]       
[1,] 1         3          
[2,] Integer,2 Character,2

That sometimes can be extremely useful (not like the example above!).
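
A short sketch of how such a list-matrix can be indexed; the variable names elt, cell and lens are illustrative only.

```r
a <- list(1, 1:2, 3, c("abc", "def"))
dim(a) <- c(2, 2)

is.list(a)      # still a list ...
is.matrix(a)    # ... and also a matrix

elt  <- a[[2, 2]]   # double brackets extract the element itself
cell <- a[2, 2]     # single brackets return a list of length one
lens <- sapply(a, length)   # element lengths, in column-major order
```

So matrix-style subscripts work as usual; the only thing to remember is the usual list distinction between `[` (sub-list) and `[[` (element).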

Andy 

From: Stephen Tucker
 
 Hello everyone,
 
 I cannot seem to find information about objects of class 
 matrix and mode
 list, and how to handle them (apart from flattening the 
 list). I get this
 type of object from using sapply(). Sorry for the long 
 example, but the code
 below illustrates how I get this type of object. Is anyone aware of
 documentation regarding this object?
 
 Thanks very much,
 
 Stephen
 
 = begin example 
 
 # I am just making up a fake data set
 df <- data.frame(Day=rep(1:3,each=24),Hour=rep(1:24,times=3),
                  Name1=rnorm(24*3),Name2=rnorm(24*3))
 
 # define a function to get a set of descriptive statistics
 tmp <- function(x) {
   # this function will accept a data frame
   # and return a 1-row data frame of
   # max value, colname of max, min value, and colname of min
   return(data.frame(maxval=max(apply(x,2,max)),
 maxloc=names(x)[which.max(apply(x,2,max))],
 minval=min(apply(x,2,min)),
 minloc=names(x)[which.min(apply(x,2,min))]))
 }
 
 # Now applying function to data:
 # (1) split the data table by Day with split()
 # (2) apply the tmp function defined above to each data frame from (1)
 # using lapply()
 # (3) transpose the final matrix and convert it to a data frame
 # with mixed characters and numbers
 # using as.data.frame(), lapply(), and type.convert()
 
 > final <- as.data.frame(lapply(as.data.frame(t(sapply(split(df[,-c(1:2)],
 +                                                         f=df$Day),tmp))),
 +                               type.convert,as.is=TRUE))
 Error in type.convert(x, na.strings, as.is, dec) : 
   the first argument must be of mode character
 
 I thought sapply() would give me a data frame or matrix, which I would
 transpose into a character matrix, to which I can apply type.convert()
 and get the same matrix as what I would get from these two lines (Fold
 function taken from Gabor's post on R-help a few years ago):
 
 Fold <- function(f, x, L) { for(e in L) x <- f(x, e); x }
 final2 <- Fold(rbind,vector(),lapply(split(df[,-c(1:2)],f=df$Day),tmp))
 
 > print(c(class(final2),mode(final2)))
 [1] "data.frame" "list"      
 
 
 However, by my original method, sapply() gives me a matrix 
 with mode list:
 
 intermediate1 <- sapply(split(df[,-c(1:2)],f=df$Day),tmp)
 > print(c(class(intermediate1),mode(intermediate1)))
 [1] "matrix" "list"  
 
 Transposing, still a matrix with mode list, not character:
 
 intermediate2 <- t(sapply(split(df[,-c(1:2)],f=df$Day),tmp))
 > print(c(class(intermediate2),mode(intermediate2)))
 [1] "matrix" "list"  
 
 Unclassing gives me the same thing...
 
 > print(c(class(unclass(intermediate2)),mode(unclass(intermediate2))))
 [1] "matrix" "list"  
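
One way to sidestep the list-matrix altogether is to keep the per-group results as a list of one-row data frames and bind them with do.call(rbind, ...). This is a sketch under the assumption that a data frame with one row per Day is the desired result; tmp is lightly rewritten to use sapply() over columns, and set.seed() is added only for reproducibility.

```r
set.seed(1)
df <- data.frame(Day = rep(1:3, each = 24), Hour = rep(1:24, times = 3),
                 Name1 = rnorm(24 * 3), Name2 = rnorm(24 * 3))

## One-row summary: overall max/min and the columns where they occur.
tmp <- function(x) {
  colmax <- sapply(x, max)
  colmin <- sapply(x, min)
  data.frame(maxval = max(colmax), maxloc = names(x)[which.max(colmax)],
             minval = min(colmin), minloc = names(x)[which.min(colmin)],
             stringsAsFactors = FALSE)
}

## lapply() keeps a plain list; rbind() then stacks the rows.
pieces <- lapply(split(df[, c("Name1", "Name2")], df$Day), tmp)
final  <- do.call(rbind, pieces)
str(final)
```

Because each piece is already a data frame, the numeric and character columns keep their types and no type.convert() step is needed.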
 
 
 
 
  


Re: [R] how to get lsmeans?

2007-03-22 Thread Liaw, Andy
 
   numbers of observations in the different levels of the 
 factors that 
   are held constant).
  
   The obstacle to computing either least-squares means or effect
  displays
   in R
   via predict() is that predict() wants factors in the new 
 data to 
   be set to particular levels. The effect() function in the effects 
   package bypasses
   predict() and works directly with the model matrix, 
 averaging over 
   the columns that pertain to a factor (and reconstructing 
   interactions as necessary). As mentioned, this has the effect of 
   setting the factor to its proportional distribution in the data. 
   This approach also has the advantage of being invariant 
 with respect 
   to the choice of contrasts for a factor.
  
   The only convenient way that I can think of to implement 
   least-squares means in R would be to use deviation-coded 
 regressors 
   for a factor (that is,
   contr.sum) and then to set the columns of the model matrix for the
   factor(s)
   to be averaged over to 0. It may just be that I'm having 
 a failure 
   of imagination and that there's a better way to proceed. I've not 
   implemented this solution because it is dependent upon 
 the choice of 
   contrasts and because I don't see a general advantage to it, but 
   since the issue has come up several times now, maybe I 
 should take a 
   crack at it. Remember that I want this to work more 
 generally, not 
   just for levels of factors, and not just for linear models.
  
   Brian is quite right in mentioning that he suggested some time ago
  that
   I
   use critical values of t rather than of the standard normal 
   distribution for producing confidence intervals, and I 
 agree that it 
   makes sense to do so in models in which the dispersion is 
 estimated. 
   My only excuse for not
  yet
   doing this is that I want to undertake a more general revision of 
   the effects package, and haven't had time to do it. There are 
   several changes that I'd like to make to the package. For 
 example, I 
   have results for multinomial and proportional odds logit models 
   (described in a paper
  by
   me
   and Bob Andersen in the 2006 issue of Sociological 
 Methodology) that 
   I want to incorporate, and I'd like to improve the 
 appearance of the 
   default graphs. But Brian's suggestion is very 
 straightforward, and 
   I guess that I shouldn't wait to implement it; I'll do so 
 very soon.
  
   Regards,
John
  
   
   John Fox
   Department of Sociology
   McMaster University
   Hamilton, Ontario
   Canada L8S 4M4
   905-525-9140x23604
   http://socserv.mcmaster.ca/jfox
   
  
-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of 
 Prof Brian 
Ripley
Sent: Wednesday, March 21, 2007 12:03 PM
To: Chuck Cleland
Cc: r-help
Subject: Re: [R] how to get lsmeans?
   
On Wed, 21 Mar 2007, Chuck Cleland wrote:
   
 Liaw, Andy wrote:
 I verified the result from the following with output 
 from JMP 6
  on
 the same data (don't have SAS: don't need it):

 set.seed(631)
 n <- 100
 dat <- data.frame(y=rnorm(n), A=factor(sample(1:2, n, replace=TRUE)),
                   B=factor(sample(1:2, n, replace=TRUE)),
                   C=factor(sample(1:2, n, replace=TRUE)),
                   d=rnorm(n))
 fm <- lm(y ~ A + B + C + d, dat)
 ## Form a data frame of points to predict: all combinations of the
 ## three factors and the mean of the covariate.
 p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
 p[] <- lapply(p, factor)
 p <- cbind(p, d=mean(dat$d))
 p <- cbind(yhat=predict(fm, p), p)
 ## lsmeans for the three factors:
 with(p, tapply(yhat, A, mean))
 with(p, tapply(yhat, B, mean))
 with(p, tapply(yhat, C, mean))

  Using Andy's example data, these are the LSMEANS and intervals I get
 from SAS:

 A    y LSMEAN    95% Confidence Limits
 1   -0.071847   -0.387507    0.243813
 2   -0.029621   -0.342358    0.283117

 B    y LSMEAN    95% Confidence Limits
 1   -0.104859   -0.397935    0.188216
 2    0.003391   -0.333476    0.340258

 C    y LSMEAN    95% Confidence Limits
 1   -0.084679   -0.392343    0.222986
 2   -0.016789   -0.336374    0.302795

  One way of reproducing the LSMEANS and intervals from SAS using
 predict() seems to be the following:

 dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
 newdat <- expand.grid(A=factor(c(1,2)),B=1.5,C=1.5,d=mean(dat$d))
 cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
   A   B   C          d         fit        lwr       upr
 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165

Re: [R] how to get lsmeans?

2007-03-21 Thread Liaw, Andy
I verified the result from the following with output from JMP 6 on the
same data (don't have SAS: don't need it):

set.seed(631)
n <- 100
dat <- data.frame(y=rnorm(n), A=factor(sample(1:2, n, replace=TRUE)),
                  B=factor(sample(1:2, n, replace=TRUE)),
                  C=factor(sample(1:2, n, replace=TRUE)),
                  d=rnorm(n))
fm <- lm(y ~ A + B + C + d, dat)
## Form a data frame of points to predict: all combinations of the
## three factors and the mean of the covariate.
p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
p[] <- lapply(p, factor)
p <- cbind(p, d=mean(dat$d))
p <- cbind(yhat=predict(fm, p), p)
## lsmeans for the three factors:
with(p, tapply(yhat, A, mean))
with(p, tapply(yhat, B, mean))
with(p, tapply(yhat, C, mean))
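
A quick sanity check on the approach, offered as a sketch rather than a proof: in a purely additive model, averaging the predictions over a balanced grid of the other factors means the difference between the two lsmeans for A should equal the fitted coefficient for A. The names grid, lsm.A and diff.A are illustrative.

```r
set.seed(631)
n <- 100
dat <- data.frame(y = rnorm(n),
                  A = factor(sample(1:2, n, replace = TRUE)),
                  B = factor(sample(1:2, n, replace = TRUE)),
                  C = factor(sample(1:2, n, replace = TRUE)),
                  d = rnorm(n))
fm <- lm(y ~ A + B + C + d, data = dat)

## All factor combinations, with the covariate held at its mean.
grid <- expand.grid(A = levels(dat$A), B = levels(dat$B), C = levels(dat$C))
grid$d <- mean(dat$d)
grid$yhat <- predict(fm, grid)

## lsmeans for A, and the check against the model coefficient.
lsm.A  <- tapply(grid$yhat, grid$A, mean)
diff.A <- unname(lsm.A["2"] - lsm.A["1"])
all.equal(diff.A, unname(coef(fm)["A2"]))
```

With interaction terms in the model this simple identity no longer holds, which is part of why the definition of "least-squares means" needs care.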

Andy 

From: Xingwang Ye
 
 Dear all, 
   
 I search the mail list about this topic and learn that no 
 simple way is available to get lsmeans in R as in SAS.
 Dr.John Fox and Dr.Frank E Harrell have given very useful 
 information about lsmeans topic.
 Dr. Frank E Harrell suggests not to think about lsmeans, 
 just to think about what predicted values wanted
 and to use the predict function. However, after reading 
 the R help file for a whole day, I am still unclear how to do it.
 Could some one give me a hand? 
  
 for example:
   
 A,B and C are binomial variables(factors); d is a continuous 
 variable ; The response variable Y is  a continuous variable too.  
 
 To get lsmeans of Y according to A,B and C, respectively, in 
 SAS, I tried proc glm data=a;  class A B C;  model Y=A B C d; 
  lsmeans A B C/cl; run;  
 
 In R, I tried this:  
  library(Design)
  ddist <- datadist(a)
  options(datadist="ddist")
  f <- ols(Y~A+B+C+d,data=a,x=TRUE,y=TRUE,se.fit=TRUE)  
 
 then how to get the lsmeans for A, B, and C, respectively 
 with predict function?
 
  
 
 Best wishes
 yours, sincerely 
 Xingwang Ye
 PhD candidate 
 Research Group of Nutrition Related Cancers and Other Chronic 
 Diseases  
 Institute for Nutritional Sciences,  
 Shanghai Institutes of Biological Sciences, 
 Chinese Academy of Sciences 
 P.O.Box 32 
 294 Taiyuan Road 
 Shanghai 200031 
 P.R.CHINA
 
 




Re: [R] package:AlgDesign and .Random.seed [Broadcast]

2007-03-21 Thread Liaw, Andy
From: Michael Kubovy
 
 On Mar 21, 2007, at 4:16 AM, Uwe Ligges wrote:
 
  Michael Kubovy wrote:
  Dear r-helpers,
  Could you please help me solve the following problem: When I run
  require(AlgDesign)
  trt <- LETTERS[1:5]
  blk <- 10
  trtblk <- 3
  BIB <- optBlock(~., withinData = trt, blocksizes = rep(trtblk, blk)) 
  In response to the last command, R complains:
  Error in optBlock(~., withinData = trt, blocksizes = rep(trtblk,
  blk)) :
 object .Random.seed not found
  The documentation of optBlock() in AlgDesign doesn't say that I   
  needed to set .Random.seed. I thought it was initiated 
 automatically  
  at the beginning of a session. What am I missing?
 
 
  The first line in that function is
  seed <- .Random.seed
  but .Random.seed is generated at the first use of R's RNG, hence  
  maybe later. This means the function contains a bug which you  
  should report to the package maintainer, please.
 
  Best,
  Uwe Ligges
 
 Bob Wheeler's response:
 
  From: Bob Wheeler [EMAIL PROTECTED]
  Date: March 21, 2007 9:19:29 AM EDT
  To: Michael Kubovy [EMAIL PROTECTED]
  Subject: Re:
 
  Each workspace in R requires you to set a random seed to 
 start. You  
  have not done this. It is an R artifact, and has nothing to 
 do with  
  AlgDesign.

I do not agree with that assessment (well, it's just my $0.02 anyway).
I don't need a random seed unless I'm doing computations that require
pseudo-random numbers.  There are plenty of times I use R without
needing a random seed.

None of the built-in RNGs in R requires explicit seed setting, nor does
any of the ones in the contributed packages that I know of.  Thus I
would claim that's a flaw in AlgDesign.
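
A sketch of the behaviour Uwe describes, plus one defensive pattern a function could use instead of reading .Random.seed unconditionally. The helper names seed.exists and get.seed are hypothetical, not part of AlgDesign or base R.

```r
## Does .Random.seed exist in the global environment right now?
seed.exists <- function()
  exists(".Random.seed", envir = globalenv(), inherits = FALSE)

## Simulate a fresh session by removing any existing seed.
if (seed.exists()) rm(".Random.seed", envir = globalenv())
before <- seed.exists()    # no seed until the RNG is first used

invisible(runif(1))        # first use of the RNG creates the seed
after <- seed.exists()

## Defensive accessor: initialize the RNG if needed, then return the seed.
get.seed <- function() {
  if (!seed.exists()) runif(1)
  get(".Random.seed", envir = globalenv(), inherits = FALSE)
}
```

Until such a guard is in the package, calling runif(1) or set.seed() once before optBlock() should be a workable way around the error.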

Andy

  -- 
  Bob Wheeler --- http://www.bobwheeler.com/
 ECHIP, Inc. --- Randomness comes in bunches.
 


Re: [R] Select the last two rows by id group

2007-03-20 Thread Liaw, Andy
Something like the following should work:

last.n <- function(x, n) {
    last <- nrow(x)
    x[(last - n + 1):last, , drop=FALSE]
}
## Example: get the last two rows.
do.call(rbind, lapply(split(score, score$id), last.n, 2)) 

You might want to add a check in last.n() to make sure that there are at
least n rows to extract.
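
A sketch of one way to build in that check: select rows with a logical index, so that groups with fewer than n rows are returned whole. The score data frame here is made up for illustration, since the original data were not posted.

```r
last.n <- function(x, n) {
  ## TRUE for the last n rows; all TRUE when nrow(x) <= n.
  keep <- seq_len(nrow(x)) > nrow(x) - n
  x[keep, , drop = FALSE]
}

## Hypothetical data: three groups with 3, 2 and 1 rows.
score <- data.frame(id    = c(1, 1, 1, 2, 2, 3),
                    score = c(10, 20, 30, 40, 50, 60))

res <- do.call(rbind, lapply(split(score, score$id), last.n, 2))
res
```

Base R's tail(x, n) does much the same thing for data frames and could be passed to lapply() directly in place of last.n.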

Andy

From: Lauri Nikkinen
 
 Hi R-users,
 
 Following this post 
 http://tolstoy.newcastle.edu.au/R/help/06/06/28965.html , how 
 do I get last two rows (or six or ten) by id group out of the 
 data frame? Here the example gives just the last row.
 
 Sincere thanks,
 Lauri
 


Re: [R] Bad points in regression [Broadcast]

2007-03-16 Thread Liaw, Andy
(My turn on the soapbox ...)

I'd like to add a bit of a caveat to Bert's view.  I'd argue (perhaps even
plead) that robust/resistant procedures be used with care.  They should
not be used as a shortcut to avoid careful analysis of data.  I recall
that in my first course on regression, the professor made it clear that
we're fitting models to data, not the other way around.  When the model
fits badly to (some of) the data, do examine and think carefully
about what happened.  Verify that bad data are indeed bad, instead of
using statistical criteria to make that judgment.  A scientific
colleague reminded me of this point when I tried to sell him the idea of
robust/resistant methods:  Don't use these methods as a crutch to stand
on badly run experiments (or poorly fitted models).

Cheers,
Andy

From: Bert Gunter
 
 (mount soapbox...)
 
 While I know the prior discussion represents common practice, 
 I would argue
 -- perhaps even plead -- that the modern(?? 30 years old 
 now) alternative of robust/resistant estimation be used, 
 especially in the readily available situation of 
 least-squares regression. RSiteSearch("Robust") will bring up 
 numerous possibilities. rrcov and robustbase are at least two 
 packages devoted to this, but the functionality is available 
 in many others (e.g. rlm() in MASS).
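
As an illustration of the rlm() route, here is a sketch on data simulated along the lines of the example quoted further down (three contaminated errors at positions 15, 25 and 50). The seed and the 0.5 weight cutoff are arbitrary choices for the example, not recommendations.

```r
library(MASS)   # provides rlm()

set.seed(1)
x <- 1:100
err <- rnorm(100)
err[15] <- 5; err[25] <- -4; err[50] <- 10   # planted outliers
y <- 0.3 + 0.4 * x + 0.5 * err

fit.ls  <- lm(y ~ x)    # ordinary least squares
fit.rob <- rlm(y ~ x)   # M-estimation with Huber weights by default

## Compare coefficients, then look at which points were down-weighted.
rbind(ls = coef(fit.ls), robust = coef(fit.rob))
low.wt <- which(fit.rob$w < 0.5)
low.wt
```

The final IWLS weights in fit.rob$w give a list of points worth a closer look; whether those points are actually "bad" remains a question for the analyst, not the algorithm.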
 
 Bert Gunter
 Genentech Nonclinical Statistics
 South San Francisco, CA 94404
 650-467-7374
 
 
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Ted Harding
 Sent: Friday, March 16, 2007 6:44 AM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] Bad points in regression
 
 On 16-Mar-07 12:41:50, Alberto Monteiro wrote:
  Ted Harding wrote:
  
  alpha <- 0.3
  beta <- 0.4
  sigma <- 0.5
  err <- rnorm(100)
  err[15] <- 5; err[25] <- -4; err[50] <- 10
  x <- 1:100
  y <- alpha + beta * x + sigma * err
  ll <- lm(y ~ x)
  plot(ll)
  
  ll is the output of a linear model fitted by lm(), and so has several 
  components (see ?lm in the section "Value"), one of which is 
  "residuals" (which can be abbreviated to "res").
  
  So, in the case of your example,
  
    which(abs(ll$res)>2)
    15 25 50
  
  extracts the information you want (and the 2 was inspired by 
  looking at the residuals plot from your plot(ll)).
 
  Ok, but how can I grab those points _in general_? What is the 
  criterion that plot used to mark those points as bad points?
 
 Ahh ... ! I see what you're after. OK, look at the plot 
 method for lm():
 
 ?plot.lm
   ## S3 method for class 'lm':
   plot(x, which = 1:4,
        caption = c("Residuals vs Fitted", "Normal Q-Q plot",
          "Scale-Location plot", "Cook's distance plot"),
        panel = points,
        sub.caption = deparse(x$call), main = "",
        ask = prod(par("mfcol")) < length(which) && dev.interactive(),
        ...,
        id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75)
 
 
 where (see further down):
 
   id.n: number of points to be labelled in each plot, starting with
 the most extreme.
 
 and note, in the default parameter-values listing above:
 
   id.n = 3
 
 Hence, the 3 most extreme points (according to the criterion 
 being plotted in each plot) are marked in each plot.
 
 So, for instance, try
 
   plot(ll,id.n=5)
 
 and you will get points 10,15,25,28,50. And so on. But that 
 pre-supposes that you know how many points are exceptional.
 
 
 What is meant by "extreme" is not stated in the help page 
 ?plot.lm, but can be identified by inspecting the code for 
 plot.lm(), which you can see by entering
 
   plot.lm
 
 In your example, if you omit the line which assigns anomalous 
 values to err[15], err[25] and err[50], then you are likely 
 to observe that different points get identified on different 
 plots. For instance, I just got the following results for the 
 default id.n=3:
 
 [1] Residuals vs Fitted:   41,53,59
 [2] Standardised Residuals:41,53,59
 [3] sqrt(Stand Res) vs Fitted: 41,53,59
 [4] Cook's Distance:   59,96,97
 
 
 There are several approaches (with somewhat different 
 outcomes) to identifying outliers. If you apply one of 
 these, you will probably get the identities of the points anyway.
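
Two of those approaches sketched in code, on data simulated as in the example above (with a seed added for reproducibility). The first mimics what plot.lm() appears to do for the residual plot, namely take the id.n largest absolute residuals; the second flags standardized residuals beyond a cutoff. The cutoff of 3 is an arbitrary choice for illustration.

```r
set.seed(1)
x <- 1:100
err <- rnorm(100)
err[15] <- 5; err[25] <- -4; err[50] <- 10
y <- 0.3 + 0.4 * x + 0.5 * err
ll <- lm(y ~ x)

## (a) The id.n = 3 most extreme raw residuals, regardless of how
##     extreme they actually are.
top3 <- order(abs(residuals(ll)), decreasing = TRUE)[1:3]

## (b) Points whose standardized residual exceeds a chosen cutoff.
rs <- rstandard(ll)
flagged <- which(abs(rs) > 3)

## Cook's distance is the measure behind the fourth default plot.
cd <- cooks.distance(ll)
```

Approach (a) always returns exactly three points, whereas (b) returns however many exceed the cutoff, which can be none. Note also that gross outliers inflate the residual standard error, so a moderate outlier can fall below the cutoff in (b) when a larger one is present.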
 
 Again in the context of your example (where in fact you 
 deliberately set 3 points to have exceptional errors, thus 
 coincidentally the same as the default value 3 of id.n), you 
 could try different values for id.n and inspect the graphs to 
 see whether a given value of id.n marks some points that do 
 not look exceptional relative to the mass of the other points.
 
 So, the above plot(ll,id.n=5) gave me one point, 10 on the 
 residuals plot, which apparently belonged to the general 
 distribution of residuals.
 
 Hoping this helps,
 Ted.
 
 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861
 Date: 16-Mar-07   Time: 13:43:54
 -- XFMail 

Re: [R] how to assign fixed factor in lm

2007-03-08 Thread Liaw, Andy
Either you did not read the docs sufficiently carefully, or the source
you learned this from is questionable.  The lm() function has no
argument called "fixed", and the warning should have made that clear to
you.  It was sheer luck on your part that you happened to put Value as
the first variable in Food, in which case lm() will treat it as the
response and the rest as predictors in the absence of a model formula.

You should try:

lm(Value ~ Gender, Food)

lm() itself has no concept of fixed or random effects.  lme() in the
nlme package does, and it has the fixed argument.
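
For completeness, a runnable sketch with the data from the post below, passing the formula as the first (formula) argument. The names fit1 and fit2 are illustrative.

```r
Value  <- c(709, 679, 699, 657, 594, 677, 592, 538, 476, 508, 505, 539)
Lard   <- rep(c("Fresh", "Rancid"), each = 6)
Gender <- rep(c("Male", "Male", "Male", "Female", "Female", "Female"), 2)
Food   <- data.frame(Value, Lard, Gender)

## One factor only: the formula goes in the first argument.
fit1 <- lm(Value ~ Gender, data = Food)
coef(fit1)
anova(fit1)

## Both factors, for comparison with the output quoted below.
fit2 <- lm(Value ~ Lard + Gender, data = Food)
coef(fit2)
```

With the formula supplied properly, fit1 estimates only an intercept and a Gender effect, and fit2 reproduces the three coefficients that the fixed= calls returned by accident.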

Andy

From: [EMAIL PROTECTED]
 
 Hi there,
 
 > Value=c(709,679,699,657,594,677,592,538,476,508,505,539)
 > Lard=rep(c("Fresh","Rancid"),each=6)
 > Gender=rep(c("Male","Male","Male","Female","Female","Female"),2)
 > Food=data.frame(Value,Lard,Gender)
 > Food
    Value   Lard Gender
 1    709  Fresh   Male
 2    679  Fresh   Male
 3    699  Fresh   Male
 4    657  Fresh Female
 5    594  Fresh Female
 6    677  Fresh Female
 7    592 Rancid   Male
 8    538 Rancid   Male
 9    476 Rancid   Male
 10   508 Rancid Female
 11   505 Rancid Female
 12   539 Rancid Female
 > lm(fixed=Value~Gender,data=Food)
 Call:
 lm(data = Food, fixed = Value ~ Gender)
 
 Coefficients:
 (Intercept)   LardRancid   GenderMale  
       651.4       -142.8         35.5  
 
 Warning message:
 extra arguments fixed are just disregarded. in: lm.fit(x, y, 
 offset = offset, singular.ok = singular.ok, ...) 
 
 > lm(fixed=Value~Lard+Gender,data=Food)
 
 Call:
 lm(data = Food, fixed = Value ~ Lard + Gender)
 
 Coefficients:
 (Intercept)   LardRancid   GenderMale  
       651.4       -142.8         35.5  
 
 Warning message:
 extra arguments fixed are just disregarded. in: lm.fit(x, y, 
 offset = offset, singular.ok = singular.ok, ...) 
 
 I wanted to consider only one factor. But why does 
 lm(fixed=Value~Gender,data=Food) return estimates for both 
 Gender and Lard? I also found that the results are the 
 same as for lm(fixed=Value~Lard+Gender,data=Food). Why can't lm 
 do the analysis of variance according to the assigned formula?
 
 Thank you very much.
 
 Fan
 


Re: [R] 0 * NA = NA

2007-03-05 Thread Liaw, Andy
From: Alberto Monteiro
 
 Is there any way to force 0 * NA to be 0 instead of NA?
 
 For example, suppose I have a vector with some valid values, 
 while other values are NA. If I matrix-pre-multiply this by a 
 weight row vector, whose weights that correspond to the NAs 
 are zero, the outcome will still be NA:
 
 x <- c(1, NA, 1)
 wt <- c(2, 0, 1)
 wt %*% x # NA

I don't think it's prudent to bend the arithmetic rules of a system,
especially when there are good reasons for them.  Here's one:

R> 0 * Inf
[1] NaN

If you are absolutely sure that the NAs in x cannot be Inf (or -Inf),
you might try to force the result to 0, but the only way I can think of
is to do something like:

R> wt %*% ifelse(wt, x, 0)
     [,1]
[1,]    3

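The same idea can be written as a small helper that sums only over the
positions with a nonzero weight.  This is a sketch; weighted.sum is a
hypothetical name, not a base R function, and it still returns NA when a
missing value carries a nonzero weight.

```r
x  <- c(1, NA, 1)
wt <- c(2, 0, 1)

0 * NA     # NA: the missing value might have been Inf
0 * Inf    # NaN

# Hypothetical helper: drop the zero-weight positions before multiplying.
weighted.sum <- function(wt, x) {
  use <- wt != 0
  sum(wt[use] * x[use])
}

weighted.sum(wt, x)            # the NA has weight 0, so it is ignored
weighted.sum(c(2, 1, 1), x)    # the NA now has weight 1, so NA comes back
```
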
Andy 

 
 Alberto Monteiro
 


Re: [R] how to apply the function cut( ) to many columns in a data.frame?

2007-03-01 Thread Liaw, Andy
From: Chuck Cleland
 
 ahimsa campos-arceiz wrote:
  Dear useRs,
  
  In a data.frame (df) I have several columns (x1, x2, x3xn) 
  containing data as a continuous numerical response:
  
   df
    var  x1  x2  x3
      1 143 147 137
      2  93  93 117
      3 164  39 101
      4 123 118  97
      5  63 125  97
      6 129  83 124
      7 123  93 136
      8 123  80  79
      9  89 107 150
     10  78  95 121
  
  I want to classify the values in the columns x1, x2, etc, 
 into bins of 
  fix margins (0-5, 5-10, ). For one vector I can do it 
 easily with 
  the function cut:
  
   df$x1 <- cut(df$x1, br=5*(0:40), labels=5*(1:40))
   df$x1
    [1] 145 95  165 125 65  130 125 125 90  80
   40 Levels: 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 ... 200
   
   However if I try to use a subset of my data.frame:
   
   df[,3:4] <- cut(df[,3:4], br=5*(0:40), labels=5*(1:40))
  
  Error in cut.default(df[, 3:4], br = 5 * (0:40), labels = 5 
 * (1:40)) :
  'x' must be numeric
  
  
  How can I make this work with data frames in which I want 
 to apply the 
  function cut( ) to many columns in a data.frame?
 
   You have an answer within your question - use one of the 
 various apply functions.  For example:
 
 lapply(df[,3:4], function(x){cut(x, br=5*(0:40), labels=5*(1:40))})

Or perhaps a bit more simply:

lapply(df[, 3:4], cut, br=5*(0:40), labels=5*(1:40))

and if a data frame is desired as output, wrap the above in
as.data.frame().
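
A short sketch of the lapply() approach, using the values from the
original post.  Assigning the list back into df[cols] keeps everything in
the data frame; the vector name cols is only illustrative.

```r
df <- data.frame(var = 1:10,
                 x1 = c(143, 93, 164, 123, 63, 129, 123, 123, 89, 78),
                 x2 = c(147, 93, 39, 118, 125, 83, 93, 80, 107, 95),
                 x3 = c(137, 117, 101, 97, 97, 124, 136, 79, 150, 121))

cols <- c("x1", "x2", "x3")

# lapply() passes each column to cut(); the extra arguments are handed on.
df[cols] <- lapply(df[cols], cut, breaks = 5 * (0:40), labels = 5 * (1:40))

str(df)    # x1, x2 and x3 are now factors with 40 levels each
```

Because the result is assigned into df[cols], no separate
as.data.frame() call is needed here.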

(Just keep in mind that a data frame is like a list.)

Andy

 
 ?lapply
 ?sapply
 ?apply
 
  I guess that I might have to use something like for ( ) 
 (which I'm not 
  familiar with), but maybe you know a straight forward method to use 
  with data.frames.
  
  
  Thanks a lot!
  
  Ahimsa
  
  *
  
  # data
   var <- 1:10
   x1 <- rnorm(10, mean=100, sd=25)
   x2 <- rnorm(10, mean=100, sd=25)
   x3 <- rnorm(10, mean=100, sd=25)
   df <- data.frame(var,x1,x2,x3)
   df
   
   # classifying the values of the vector df$x1 into bins of width 5
   df$x1 <- cut(df$x1, br=5*(0:40), labels=5*(1:40))
   df$x1
   
   # trying it on a subset of the data.frame
   df[,3:4] <- cut(df[,3:4], br=5*(0:40), labels=5*(1:40))
   df[,3:4]
 
 --
 Chuck Cleland, Ph.D.
 NDRI, Inc.
 71 West 23rd Street, 8th floor
 New York, NY 10010
 tel: (212) 845-4495 (Tu, Th)
 tel: (732) 512-0171 (M, W, F)
 fax: (917) 438-0894
 


Re: [R] How to read in this data format?

2007-03-01 Thread Liaw, Andy
You can't expect general-purpose tools like read.table in R to be able
to deal with a highly specialized file format.  Here's how I'd start.  It
doesn't put the data in exactly the format you specified, but I doubt
you'll need that.  This might be sufficient for your purpose:

dat <- readLines("yourdata.dat")
## Get rid of blank lines.
dat <- dat[dat != ""]
scan.lines <- grep("Scan", dat)
## Chop off the header rows.
dat <- dat[scan.lines[1]:length(dat)]
scan.lines <- scan.lines - scan.lines[1] + 1
lines.per.scan <- c(scan.lines[-1], length(dat) + 1) - scan.lines
## Split the data into a list, with each scan taking up one component.
dat <- split(dat, rep(seq(along=lines.per.scan), lines.per.scan))
## Process the data one scan at a time.
result <- lapply(dat, function(x) {
    x <- strsplit(x, "\t")
    rtime <- x[[2]][2]  # second field of second line
    t(matrix(as.numeric(do.call(rbind, c(rtime, x[-(1:2)]))), ncol=2))
})

This is what I get from the data you've shown:

R> result
$`1`
      [,1]     [,2]     [,3]     [,4]
[1,] 0.017 399.8112 399.8742 399.9372
[2,] 0.017 184.0000   0.0000 152.0000

$`2`
      [,1]     [,2]     [,3]     [,4]
[1,] 0.021 399.8112 399.8742 399.9372
[2,] 0.021 181.0000   1.0000 153.0000

Note that you probably should avoid using numbers as column names in a
data frame, even if it's possible.
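
If a data frame with one row per scan is wanted after all, the same steps
can be sketched as follows.  The input is typed in as a character vector
so the example runs without a file, tab-separated fields are assumed, and
the "X" prefix on the column names is my own choice to avoid purely
numeric names.

```r
raw <- c("$$ Experiment Number:", "$$ Associated Data:", "", "FUNCTION 1", "",
         "Scan\t1", "Retention Time\t0.017", "",
         "399.8112\t184", "399.8742\t0", "399.9372\t152", "", "",
         "Scan\t2", "Retention Time\t0.021", "",
         "399.8112\t181", "399.8742\t1", "399.9372\t153")

dat <- raw[raw != ""]                       # drop blank lines
scan.lines <- grep("^Scan", dat)            # where each scan starts
dat <- dat[scan.lines[1]:length(dat)]       # chop off the header rows
scan.lines <- scan.lines - scan.lines[1] + 1
lines.per.scan <- diff(c(scan.lines, length(dat) + 1))
scans <- split(dat, rep(seq_along(lines.per.scan), times = lines.per.scan))

rows <- lapply(scans, function(s) {
  f <- strsplit(s, "\t")
  rtime <- as.numeric(f[[2]][2])            # second field of second line
  peaks <- do.call(rbind, f[-(1:2)])        # one row per "value<TAB>count"
  setNames(c(rtime, as.numeric(peaks[, 2])),
           c("Time", paste0("X", peaks[, 1])))
})
result <- as.data.frame(do.call(rbind, rows))
result
```

This assumes every scan lists the same values in the same order;
otherwise the rows would need to be matched by name before binding.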

Andy


From: Bart Joosen
 
 Hi,
 
 I received an ASCII file containing the following information:
 
 $$ Experiment Number:
 $$ Associated Data:
 
 FUNCTION 1
 
 Scan  1
 Retention Time  0.017
 
 399.8112  184
 399.8742  0
 399.9372  152
 
 
 Scan  2
 Retention Time  0.021
 
 399.8112  181
 399.8742  1
 399.9372  153
 .
 
 
 I would like to import this data in R into a dataframe, where 
 there is a column time, the first numbers as column names, 
 and the second numbers as data in the dataframe:
 
 Time   399.8112  399.8742  399.9372
 0.017  184       0         152
 0.021  181       1         153
 
 I did take a look at the read.table, read.delim, scan, ... 
 But I 've no idea about how to solve this problem.
 
 Anyone?
 
 
 Thanks
 
 Bart
 


Re: [R] Package RGtk2, rattle, libatk-1.0.0.dll Errors

2007-02-28 Thread Liaw, Andy
The way I got that problem resolved was by installing the rggobi package
using the command shown on http://www.ggobi.org/rggobi/, which is

  source("http://www.ggobi.org/download/install.r")

That will install all the things that GGobi needs.  Since rattle depends
on rggobi, it's probably a good idea to do this anyway.  After that,
rattle will start just fine.  I have not actually used rattle, though.

Andy 

From: j.joshua thomas
 
 Dear Group,
 
 I have followed the instructions  from the link 
 http://datamining.togaware.com/survivor/Installing_GTK.html
 However i couldn't fix the libatk01.0.0.dll application error
 
 Here, i did uninstall R-Gui-2.4.0 then did the fresh 
 installation and still facing the same problem I am using Windows- XP
 
 *Please find the following*
 
 
 R version 2.4.0 (2006-10-03)
 Copyright (C) 2006 The R Foundation for Statistical Computing 
 ISBN 3-900051-07-0
 
 R is free software and comes with ABSOLUTELY NO WARRANTY.
 You are welcome to redistribute it under certain conditions.
 Type 'license()' or 'licence()' for distribution details.
 
   Natural language support but running in an English locale
 
 R is a collaborative project with many contributors.
 Type 'contributors()' for more information and 'citation()' 
 on how to cite R or R packages in publications.
 
 Type 'demo()' for some demos, 'help()' for on-line help, or 
 'help.start()' for an HTML browser interface to help.
 Type 'q()' to quit R.
 
 [Previously saved workspace restored]
 
  install.packages("RGtk2")
 --- Please select a CRAN mirror for use in this session --- 
 trying URL '
 http://cran.au.r-project.org/bin/windows/contrib/2.4/RGtk2_2.8.7.zip'
 Content type 'application/zip' length 13050736 bytes opened 
 URL downloaded 12744Kb
 
 package 'RGtk2' successfully unpacked and MD5 sums checked
 
 The downloaded packages are in
 C:\Documents and Settings\jjoshua\Local 
 Settings\Temp\RtmpLetwrb\downloaded_packages
 updating HTML package descriptions
  install.packages("rattle")
 trying URL '
 http://cran.au.r-project.org/bin/windows/contrib/2.4/rattle_2.2.0.zip'
 Content type 'application/zip' length 340875 bytes opened URL 
 downloaded 332Kb
 
 package 'rattle' successfully unpacked and MD5 sums checked
 
 The downloaded packages are in
 C:\Documents and Settings\jjoshua\Local 
 Settings\Temp\RtmpLetwrb\downloaded_packages
 updating HTML package descriptions
  library(RGtk2)
 Error in dyn.load(x, as.logical(local), as.logical(now)) :
 unable to load shared library
 'C:/PROGRA~1/R/R-24~1.0/library/RGtk2/libs/RGtk2.dll':
   LoadLibrary failure:  The specified module could not be found.
 
 In addition: Warning message:
 package 'RGtk2' was built under R version 2.4.1
 Error: package/namespace load failed for 'RGtk2'
 
 
 
 --
 Lecturer J. Joshua Thomas
 KDU College Penang Campus
 Research Student,
 University Sains Malaysia
 


Re: [R] Multiple conditional without if [Broadcast]

2007-02-27 Thread Liaw, Andy
From: bunny, lautloscrew.com
 
 Dear all,
 
 i am stuck with a syntax problem.
 
 i have a matrix which has about 500 rows and 6 columns.
 now i want to kick some data out.
 i want create a new matrix which is basically the old one 
 except for all entries which have a 4 in the 5 column AND a 1 
 in the 6th column.
 
 i tried the following but couldn't get a new matrix, just some weird
 errors:
 
 newmatrix=oldmatrix[,2][oldmatrix[,5]==4]&oldmatrix[,2][oldmatrix[,6]==1]
 
 all i get is:
 numeric(0)

That's not a `weird error', but a numeric vector of length 0.
 
 does anybody have an idea how to fix this one ?

Try:

newmatrix = oldmatrix[oldmatrix[, 5]==4 & oldmatrix[, 6] == 1, 2, drop=FALSE]

If you just want a subset of column 2 as a vector, you can leave off the 
drop=FALSE part.
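
A small sketch of both readings of the question, on a made-up 4 x 6
matrix: picking column 2 of the rows that match, and keeping the whole
matrix except the rows that match.  The name hit is only illustrative.

```r
# Made-up data: rows 1 and 4 have a 4 in column 5 AND a 1 in column 6.
oldmatrix <- rbind(c(1, 10, 0, 0, 4, 1),
                   c(2, 20, 0, 0, 4, 2),
                   c(3, 30, 0, 0, 3, 1),
                   c(4, 40, 0, 0, 4, 1))

# One logical value per row; '&' combines the two conditions elementwise.
hit <- oldmatrix[, 5] == 4 & oldmatrix[, 6] == 1

col2.of.hits <- oldmatrix[hit, 2, drop = FALSE]   # column 2 of matching rows
newmatrix    <- oldmatrix[!hit, , drop = FALSE]   # everything except those rows
```

Negating the row index with ! is what "kicks out" the matching rows
while keeping all six columns.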

Reading "An Introduction to R" should have saved you some trouble in the
first place.

Andy
 
 thx in advance
 
 matthias


Re: [R] how to use apply with two variables

2007-02-23 Thread Liaw, Andy
Yes.  Just try it and see.

BTW, your usage of return() is not recommended anymore.  This is
probably easier:

myfun <- function(x) c(mean=mean(x), sd=sd(x))
out <- apply(mat, 1, myfun)
## or...
out2 <- cbind(mean=rowMeans(mat), sd=apply(mat, 1, sd))

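A runnable sketch of the first variant.  Note that apply() returns one
column per row of mat when the function returns a vector, so the result
is transposed here to get one row per row of mat.

```r
set.seed(42)
mat <- matrix(runif(50), nrow = 10, ncol = 5)

# Return both statistics as one named vector, so myfun runs once per row.
myfun <- function(x) c(mean = mean(x), sd = sd(x))

out <- t(apply(mat, 1, myfun))   # 10 rows, columns "mean" and "sd"
head(out)
```
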
Andy


From: Serguei Kaniovski
 
 Hi,
 
 this is a made-up example. Function myfun returns two 
 arguments. Can apply be used so that myfun is called only once?
 
 Thanks
 Serguei
 
 mat<-matrix(runif(50),nrow=10,ncol=5)
 
 myfun<-function(x) {
  mymean<-mean(x)
  mysd<-sd(x)
  return(mymean,mysd)
 }
 
 out1<-t(apply(mat,1,function(x) myfun(x)$mymean))
 out2<-t(apply(mat,1,function(x) myfun(x)$mysd))


Re: [R] JGR launcher for linux

2007-02-22 Thread Liaw, Andy
Isn't it right in front of you?  I get:

> JGR()
Starting JGR ...
(You can use /usr/local/lib64/R/library/JGR/cont/run to start JGR directly)
 ^^^

Andy

From: Ronaldo Reis Junior
 
 Hi,
 
 anybody have a JGR launcher for linux? Maybe a script that 
 launch JGR directly without open R then library(JGR) and JGR().
 
 Thanks
 Ronaldo
 --
 Deflector shields just came on, Captain.
 --
  Prof. Ronaldo Reis Júnior
 |  .''`. UNIMONTES/Depto. Biologia Geral/Lab. Ecologia Evolutiva
 | : :'  : Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia `. 
 | `'` CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil
 |   `- Fone: (38) 3229-8190 | [EMAIL PROTECTED] | 
 | [EMAIL PROTECTED]
 | ICQ#: 5692561 | LinuxUser#: 205366
 


Re: [R] memory management question [Broadcast]

2007-02-20 Thread Liaw, Andy
I don't see why making copies of the columns you need inside the loop is
better memory management.  If the data are in a matrix, accessing
elements is quite fast.  If you're worried about the speed of that, do
what Charles suggests: work with the transpose so that you are accessing
elements in the same column in each iteration of the loop.

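A minimal sketch of that suggestion on made-up data.  R stores matrices
column by column, so Y[, i] reads a contiguous block; whether that gives
a measurable gain will depend on the size of the data.

```r
set.seed(1)
X <- matrix(rnorm(2000), nrow = 100, ncol = 20)

Y <- t(X)                        # rows of X become columns of Y

row.norm <- numeric(nrow(X))
for (i in seq_len(ncol(Y))) {
  # Y[, i] holds the same values as X[i, ], stored contiguously.
  row.norm[i] <- sqrt(sum(Y[, i]^2))
}
```
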
Andy 

From: Federico Calboli
 
 Charles C. Berry wrote:
 
  Whoa! You are accessing one ROW at a time.
  
  Either way this will tangle up your cache if you have many rows and 
  columns in your orignal data.
  
  You might do better to do
  
  Y <- t( X ) ### use '<-' !
  
  for (i in whatever ){
  do something using Y[ , i ]
  }
 
 My question is NOT how to write the fastest code, it is 
 whether dummy variables (for lack of better words) make the 
 memory management better, i.e. faster, or not.
 
 Best,
 
 Fede
 
 --
 Federico C. F. Calboli
 Department of Epidemiology and Public Health Imperial 
 College, St Mary's Campus Norfolk Place, London W2 1PG
 
 Tel  +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
 
 f.calboli [.a.t] imperial.ac.uk
 f.calboli [.a.t] gmail.com
 


Re: [R] rpart tree node label [Broadcast]

2007-02-14 Thread Liaw, Andy
Try the following to see:

library(rpart)
iris.rp <- rpart(Sepal.Length ~ Species, iris)
plot(iris.rp)
text(iris.rp)

Two possible solutions:

1. Use text(..., pretty=0).  See ?text.rpart.
2. Use post(..., filename="").

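A sketch of the first option.  The plot is drawn to a null PDF device so
the example runs without opening a window; drop the pdf(NULL) and
dev.off() lines to see it on screen.  The node.labels name is only
illustrative.

```r
library(rpart)

iris.rp <- rpart(Sepal.Length ~ Species, data = iris)

pdf(NULL)                      # null device, nothing is written to disk
plot(iris.rp)
text(iris.rp, pretty = 0)      # pretty = 0: full factor level names
invisible(dev.off())

# The same labels as text, without plotting.
node.labels <- labels(iris.rp, pretty = 0)
node.labels
```

With the default settings the factor levels are abbreviated to single
letters, which is what makes the splits hard to read.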
Andy 

From: Wensui Liu
 
 not sure how you want to label it.
 could you be more specific?
 thanks.
 
 On 2/14/07, Aimin Yan [EMAIL PROTECTED] wrote:
  I generate a tree use rpart.
  In the node of tree, split is based on the some factor.
  I want to label these node based on the levels of this factor.
 
  Does anyone know how to do this?
 
  Thanks,
 
  Aimin
 
 
 
 
 --
 WenSui Liu
 A lousy statistician who happens to know a little programming
 (http://spaces.msn.com/statcompute/blog)
 


Re: [R] [BioC] Outlook does threading [Broadcast]

2007-01-31 Thread Liaw, Andy
This is really off-topic for both BioC and R-help, so I'll 
keep it short. 


From: Kimpel, Mark William
 
 See below for Bert Gunter's off list reply to me (which I do 
 appreciate). I'm putting it back on the list because it seems 
 there is still confusion regarding the difference between 
 threading and sorting by subject. I thought the example I 
 will give below will serve as instructional for other Outlook 
 users who may be similarly confused as I was (am?). 
 
 Per Bert's instructions, I just set up my inbox to sort by 
 subject. I sent one email to myself with the subject test1 
 and then replied to it without changing the subject. The 
 reply correctly went to test1 in the inbox sorter. I then 
 changed the subject heading in the test1 reply to test2 and 
 sent it to myself. This time Outlook re-categorized it and 
 put it in a separate compartment in the view called test2.
 
 If Outlook can do threading the way the R mail server does, I 
 don't think this is the way to do it.

AFAIK there's no proper way to get the correct threading in 
Outlook.  What I do is group by conversation topic, but that
doesn't solve the problem.  This is only a problem on your
(and all Outlook users'?) end, though.  The bigger problem
that affects the lists is that some versions of MS Exchange 
Server do not include the In-Reply-To header field that
many mailing lists rely on for proper threading.  As a result,
when I reply to other people's posts, it may show up in Outlook
as having been threaded properly (because the subject is fine),
but it throws off everything else that does proper threading.
 
 Unless someone has an idea of how to correctly set up Outlook 
 to do threading in the manner that the R mail server does,

Maybe some VBA coding can be done to get it right, but short
of that, I very much doubt it.

 I 
 think the message for us Outlook users is to just create, 
 from scratch, a new message when initiating a new subject.

That message ought to be clear for everyone.  You should
never reply to a message when you really mean to start
a new topic, regardless of what you are using.

Andy
 
 Thanks for all your help. 
 
 Mark
 
 -Original Message-
 From: Bert Gunter [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, January 31, 2007 7:03 PM
 To: Kimpel, Mark William
 Subject: Outlook does threading
 
  Mark:
 
 No need to bother the R list with this. Outlook does 
 threading. Just sort on Subject in the viewer.
 
 Bert Gunter
 Genentech Nonclinical Statistics
 South San Francisco, CA 94404
 650-467-7374
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Kimpel, Mark William
 Sent: Wednesday, January 31, 2007 3:36 PM
 To: Peter Dalgaard
 Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
 Subject: Re: [R] possible spam alert
 
 Peter,
 
 Thanks you for your explanation, I had taken Mr. Connolly's 
 message to me to imply that I was not changing the subject 
 line. I use MS Outlook
 2007 and, unless I am just not seeing it, Outlook does not 
 normally display the in reply to header, I was under the 
 mistaken impression that that was what the Subject line was 
 for. See, for example, the header to your message to me 
 below. Outlook will, however, sort messages by Subject, and 
 that is what I thought was meant by threading.
 
 Well, I learned something today and apologize for any 
 inconvenience my posts may have caused.
 
 BTW, I use Outlook because it is supported by my university 
 server and will synch my appointments and contacts with my 
 PDA, which runs Windows CE. If anyone has a suggestion for me 
 of a better email program that will provide proper threading 
 AND work with a MS email server and synch with Windows CE, 
 I'd love to hear it.
 
 Thanks again,
 
 Mark
 
 Mark W. Kimpel MD 
 
  
 
 (317) 490-5129 Work,  Mobile
 
  
 
 (317) 663-0513 Home (no voice mail please)
 
 1-(317)-536-2730 FAX
 
 
 -Original Message-
 From: Peter Dalgaard [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, January 31, 2007 6:25 PM
 To: Kimpel, Mark William
 Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
 Subject: Re: [R] possible spam alert
 
 Kimpel, Mark William wrote:
  The last two times I have originated message threads on R or 
  Bioconductor I have received the message included below 
 from someone 
  named Patrick Connolly. Both times I was the originator of 
 the message 
  thread and used what I thought was a unique subject line that
 explained
  as best I could what my question was. Patrick seems to be implying
 that
  I am abusing the R and BioC help newsgroups in this fashion. 
 
  When I emailed him to give me a specific example, he did not reply.
 The
  most recent thread that he seems concerned about was to the 
 R list and 
  was entitled regexpr and parsing question . I believe the 
 previous 
  post of mine that he had problems with was to the BioC list but I
 can't
  remember its subject.
 
  Is this spam?

 No. Breach of netiquette, yes.
 
 The 

Re: [R] Memory problem on a linux cluster using a large data set [Broadcast]

2006-12-18 Thread Liaw, Andy
In addition to my off-list reply to Iris (pointing her to an old post of
mine that detailed the memory requirement of RF in R), she might
consider the following:

- Use larger nodesize
- Use sampsize to control the size of bootstrap samples

Both of these have the effect of reducing sizes of trees grown.  For a
data set that large, it may not matter to grow smaller trees.

Still, with data of that size, I'd say 64-bit is the better solution.
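
A rough sketch of the effect of those two arguments, on small made-up
data with values 0, 1, 2.  It assumes the randomForest package is
installed and does nothing otherwise; the sizes chosen (nodesize = 50,
sampsize = 200) are arbitrary, not recommendations.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  n <- 1000; p <- 10
  x <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n, ncol = p)
  y <- factor(sample(c("case", "control"), n, replace = TRUE))

  rf.default <- randomForest::randomForest(x, y, ntree = 25)
  rf.small   <- randomForest::randomForest(x, y, ntree = 25,
                                           nodesize = 50,   # larger terminal nodes
                                           sampsize = 200)  # smaller per-tree samples

  # Average number of terminal nodes per tree in each forest.
  size.default <- mean(randomForest::treesize(rf.default))
  size.small   <- mean(randomForest::treesize(rf.small))
  c(default = size.default, small = size.small)
}
```

Smaller trees mean less memory per tree, at some possible cost in
accuracy that would have to be checked on the real data.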

Cheers,
Andy

From: Martin Morgan
 
 Iris --
 
 I hope the following helps; I think you have too much data 
 for a 32-bit machine.
 
 Martin
 
 Iris Kolder [EMAIL PROTECTED] writes:
 
  Hello,
   
  I have a large data set 320.000 rows and 1000 columns. All the data 
  has the values 0,1,2.
 
 It seems like a single copy of this data set will be at least 
 a couple of gigabytes; I think you'll have access to only 4 
 GB on a 32-bit machine (see section 8 of the R Installation 
 and Administration guide), and R will probably end up, even 
 in the best of situations, making at least a couple of copies 
 of your data. Probably you'll need a 64-bit machine, or 
 figure out algorithms that work on chunks of data.
 
  on a linux cluster with R version R 2.1.0.  which operates on a 32
 
 This is quite old, and in general it seems like R has become 
 more sensitive to big-data issues and tracking down 
 unnecessary memory copying.
 
  cannot allocate vector size 1240 kb. I've searched through
 
 use traceback() or options(error=recover) to figure out where 
 this is actually occurring.
 
  SNP <- read.table("file.txt", header=FALSE, sep="")    # read in data file
 
 This makes a data.frame, and data frames have several aspects 
 (e.g., automatic creation of row names on sub-setting) that 
 can be problematic in terms of memory use. Probably better to 
 use a matrix, for which:
 
  'read.table' is not the right tool for reading large matrices,
  especially those with many columns: it is designed to read _data
  frames_ which may have columns of very different classes. Use
  'scan' instead.
 
 (from the help page for read.table). I'm not sure of the 
 details of the algorithms you'll invoke, but it might be a 
 false economy to try to get scan to read in 'small' versions 
 (e.g., integer, rather than
 numeric) of the data -- the algorithms might insist on 
 numeric data, and then make a copy during coercion from your 
 small version to numeric.
 
  SNP$total.NAs = rowSums(is.na(SN # calculate the 
 number of NA per row and adds a colum with total Na's
 
 This adds a column to the data.frame or matrix, probably 
 causing at least one copy of the entire data. Create a 
 separate vector instead, even though this unties the 
 coordination between columns that a data frame provides.
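
A small sketch of the last two suggestions: reading a numeric table with
scan() into a matrix and keeping the NA counts in a separate vector.  The
temporary file, the column count and the threshold of one NA per row are
made up for illustration.

```r
# Write a tiny whitespace-separated file to read back.
tmp <- tempfile(fileext = ".txt")
writeLines(c("0 1 2 NA", "1 NA NA 2", "2 2 0 1"), tmp)

ncols <- 4                                   # assumed to be known in advance
snp <- matrix(scan(tmp, what = integer(), quiet = TRUE),
              ncol = ncols, byrow = TRUE)    # one matrix row per file line

total.NAs <- rowSums(is.na(snp))             # separate vector, not a new column
snp.kept  <- snp[total.NAs <= 1, , drop = FALSE]

unlink(tmp)
```

As noted above, storing the values as integers only pays off if the
later steps do not coerce them to numeric anyway.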
 
  SNP  = t(as.matrix(SNP))    # transpose rows and columns
 
 This will also probably trigger a copy; 
 
  snp.na<-SNP
 
 R might be clever enough to figure out that this simple 
 assignment does not trigger a copy. But it probably means 
 that any subsequent modification of snp.na or SNP *will* 
 trigger a copy, so avoid the assignment if possible.
 
  snp.roughfix<-na.roughfix(snp.na)   
   
  fSNP<-factor(snp.roughfix[, 1])    # Assigns factor to case control status
   
  snp.narf<- randomForest(snp.roughfix[,-1], fSNP, 
  na.action=na.roughfix, ntree=500, mtry=10, importance=TRUE, 
  keep.forest=FALSE, do.trace=100)
 
 Now you're entirely in the hands of the randomForest. If 
 memory problems occur here, perhaps you'll have gained enough 
 experience to point the package maintainer to the problem and 
 suggest a possible solution.
 
  set it should be able to cope with that amount. Perhaps someone has 
  tried this before in R or is Fortran a better choice? I added my R
 
 If you mean a pure Fortran solution, including coding the 
 random forest algorithm, then of course you have complete 
 control over memory management. You'd still likely be limited 
 to addressing 4 GB of memory. 
 
 
  I wrote a script to remove all the rows with more than 46 missing 
  values. This works perfect on a smaller dataset. But the problem 
  arises when I try to run it on the larger data set I get an error 
  cannot allocate vector size 1240 kb. I've searched 
 through previous 
  posts and found out that it might be because i'm running it 
 on a linux 
  cluster with R version R 2.1.0.  which operates on a 32 bit 
 processor. 
  But I could not find a solution for this problem. The cluster is a 
  really fast one and should be able to cope with these large 
 amounts of 
  data the systems configuration are Speed: 3.4 GHz, memory 
 4GByte. Is 
  there a way to change the settings or processor under R? I 
 want to run 
  the function Random Forest on my large data set it should 
 be able to 
  cope with that amount. Perhaps someone has tried this 
 before in R or 
  is Fortran a better choice? I added my R script down 

Re: [R] plot.svm

2006-12-08 Thread Liaw, Andy
Try

debug(e1071:::plot.svm)

and then re-run your plot command, stepping through one line at a time
and see where it fails.
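
The method is not visible by name because it is registered for S3
dispatch rather than exported.  A sketch of the same pattern using
plot.lm from the stats package, which ships with R, so that it runs
without e1071 installed:

```r
# Look the method up through the S3 registry.
m <- getS3method("plot", "lm")
is.function(m)

# The ':::' operator reaches the same function inside the namespace.
identical(m, stats:::plot.lm)

# Flag it for debugging, check the flag, then remove it again.
debug(stats:::plot.lm)
flagged <- isdebugged(stats:::plot.lm)
undebug(stats:::plot.lm)
flagged
```

While the flag is set, the next call to plot() on an lm object would drop
into the browser at the first line of the method.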

Andy 

From: Aimin Yan
 
 where is plot.svm method?
 I just find plot(svm, data, formula) method
 
 Aimin
 


Re: [R] any way to make the code more efficient ?

2006-12-08 Thread Liaw, Andy
I don't know about efficiency, but at least for readability, you may
want to do the following:

1. Indent your code.
2. Create a list of appropriate length, and populate the list with
objects you're creating in the loop.
3. After the loop, use do.call(rbind, list).
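
Steps 2 and 3 can be sketched as below.  Plain data frames with made-up
columns stand in for the daily zoo objects; the names pieces and n.files
are only illustrative.

```r
n.files <- 5

# Step 2: create the list up front and fill one slot per iteration.
pieces <- vector("list", n.files)
for (i in seq_len(n.files)) {
  pieces[[i]] <- data.frame(file   = i,
                            minute = 1:3,
                            bidask = i + (1:3) / 10)
}

# Step 3: bind everything together once, after the loop.
full <- do.call(rbind, pieces)
dim(full)
```

This avoids copying the growing object on every pass through the loop,
which is what makes repeated rbind() slow.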

HTH,
Andy 

From: Leeds, Mark (IED)
 
 ravi : I appreciate your help but could you be a little more 
 specific about what you mean ? I can just stack aggfxdata 
 below the current full one ( the rbind works out the 
 ordering by date because it's a zoo object ) so it's not a 
 question of where to put the new one. It's a question of how 
 to avoid rbind ? I apologize because I don't think I 
 understand what you are saying. Or maybe it's not possible to 
 avoid rbind ? Thanks.
 
 
 -Original Message-
 From: Ravi Varadhan [mailto:[EMAIL PROTECTED]
 Sent: Friday, December 08, 2006 5:21 PM
 To: Leeds, Mark (IED); r-help@stat.math.ethz.ch
 Subject: RE: [R] any way to make the code more efficient ?
 
 
 Using rbind almost always slows things down significantly.  
 You should
 define the objects aggfxdata and fullaggfxdata before the loop and
 then assign appropriate values to the corresponding rows 
 and/or columns.
 
 
 Ravi.
 
 --
 --
 
 ---
 
 Ravi Varadhan, Ph.D.
 
 Assistant Professor, The Center on Aging and Health
 
 Division of Geriatric Medicine and Gerontology 
 
 Johns Hopkins University
 
 Ph: (410) 502-2619
 
 Fax: (410) 614-9625
 
 Email: [EMAIL PROTECTED]
 
 Webpage:
 http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
 
  
 
 --
 --
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Leeds, 
 Mark (IED)
 Sent: Friday, December 08, 2006 4:17 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] any way to make the code more efficient ?
 
 The code below works, so this is why I didn't include the data to
 reproduce it. The loop runs about 500 times and each time, a zoo object
 with 1400 rows and 4 columns gets created. ( the rows represent minutes
 so each file is one day worth of data). Inside the loop, I keep rbinding
 the newly created zoo object to the current zoo object so that it gets
 bigger and bigger over time.
 
 Eventually, the new zoo object, fullaggfxdata, containing all the days
 of data is created.
 
 I was just wondering if there is a more efficient way of doing this. I
 do know the number of times the loop will be done at the beginning so
 maybe creating a matrix or data frame at the beginning and putting
 the daily ones in something like that would make it faster. But, the
 problem with this is I eventually do need a zoo object.  I ask this
 question because at around the 250 mark of the loop, things start to
 slow down significantly and I think I remember reading somewhere that
 doing an rbind of something to itself is not a good idea.  Thanks. 
 
 #===========================================================
 
 start <- 1
 
 for (filecounter in (1:length(datafilenames))) { 
 
   print(paste("File Counter = ", filecounter))
   datafile <- paste(datadir, "/", datafilenames[filecounter], sep="")
   aggfxdata <- clnaggcompcurrencyfile(fxfile=datafile, aggminutes=aggminutes,
                                       fillholes=1)
   logbidask <- log(aggfxdata[, "bidask"])
   aggfxdata <- cbind(aggfxdata, logbidask)
 
   if ( start == 1 ) {
     fullaggfxdata <- aggfxdata
     start <- 0
   } else {
     fullaggfxdata <- rbind(fullaggfxdata, aggfxdata)
   }
 
 }
 
 #===========================================================
 
 


Re: [R] from table to dataframe

2006-12-05 Thread Liaw, Andy
Like this?

R> data.frame(unclass(df2))
   a b c d
s1 8 4 4 4
s2 8 4 4 4
s3 8 4 4 4
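
A minimal sketch of the same idea in a self-contained form, assuming the poster's data. It uses as.data.frame.matrix(), a base R function that keeps the square layout, instead of unclass(); the extra "site" column is added only to match the layout asked for in the question.

```r
# Rebuild the poster's data.
site <- rep(c("s1", "s2", "s3"), 20)
species <- rep(c("a", "b", "a", "c", "d"), 12)
df2 <- table(site, species)

# as.data.frame(df2) would give the long format (site, species, Freq);
# as.data.frame.matrix() keeps one row per site and one column per species.
wide <- as.data.frame.matrix(df2)

# Move the row names into an explicit "site" column.
wide <- cbind(site = rownames(wide), wide)
rownames(wide) <- NULL
print(wide)
```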

Andy
 

From: Milton Cezar Ribeiro
 
 Hi there, 

   I have a two-way data frame, and I would like to generate 
 a new data frame with its frequencies. I tried this:

   site <- rep(c("s1","s2","s3"),20)
   species <- rep(c("a","b","a","c","d"),12)
   df <- data.frame(cbind(site,species))
   df2 <- table(df)
 
   But when I convert df2 to a data.frame I lose the square 
 format. I would like to have my data.frame like this:

   site a b c d
   s1 8 4 4 4
   s2 8 4 4 4
   s3 8 4 4 4

   Any help?

   Miltinho
 
   


Re: [R] Count cases by indicator

2006-12-04 Thread Liaw, Andy
I might be missing something, but the data you showed don't seem to
match your expectation.  Firstly, 111111111 in binary is 511 in decimal,
so your coordinates are off by 1.  Secondly, for the data you've
shown, the matrix equivalent looks like:

m <- matrix(df$x, ncol=9, byrow=TRUE)
rownames(m) <- unique(df$case)
print(m)

         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
093/0188    0    0    1    0    1    1    1    1    1
093/0206    0    0    0    0    0    0    0    0    0
093/0216    0    1    1    1    1    1    0    1    1
093/0305    0    1    1    1    1    1    1    1    1
093/0325    0    0    0    0    0    0    0    0    0
093/0449    0    0    0    0    0    0    0    0    0
093/0473    0    0    1    1    1    1    1    1    1
093/0499    0    0    1    1    1    1    1    1    1

The counts of unique occurrences are:

table(do.call(paste, c(as.data.frame(m), sep="")))

000000000 001011111 001111111 011111011 011111111 
        3         1         2         1         1 

which do not agree with yours.

If I understood what you wanted, I would do:

R> table(rowSums(matrix(2^(0:8) * df$x, ncol=9, byrow=TRUE)))

  0 446 500 508 510 
  3   1   1   2   1 
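
A minimal sketch of how the counts could be spread over a vector of length 512, assuming the poster's ordering of cmat (row 1 is all ones, row 512 is all zeros, first column is the most significant bit). The data are typed in directly instead of being read from the clipboard, and the row-position formula 512 - decimal follows from that assumed ordering.

```r
# The 8 cases from the posted data, 9 binary states each.
x <- c(0,0,1,0,1,1,1,1,1,
       0,0,0,0,0,0,0,0,0,
       0,1,1,1,1,1,0,1,1,
       0,1,1,1,1,1,1,1,1,
       0,0,0,0,0,0,0,0,0,
       0,0,0,0,0,0,0,0,0,
       0,0,1,1,1,1,1,1,1,
       0,0,1,1,1,1,1,1,1)
m <- matrix(x, ncol = 9, byrow = TRUE)

# Decimal value of each row, treating column 1 as the most significant bit.
dec <- as.vector(m %*% 2^(8:0))

# Row of cmat holding each pattern: 111111111 is row 1, 000000000 is row 512.
pos <- 512 - dec

# Counts per coordinate; zero everywhere a pattern does not occur.
counts <- tabulate(pos, nbins = 512)
which(counts > 0)
counts[counts > 0]
```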

Andy


From: Serguei Kaniovski
 
 Hi,
 
 In the data below, case represents cases, x binary 
 states. Each case has exactly 9 x, ie is a binary vector 
 of length 9.
 
 There are 2^9=512 possible combinations of binary states in a 
 given case, ie 512 possible vectors. I generate these in 
 the order of the decimals the vectors represent, as:
 
 cmat <- as.matrix(expand.grid(rep(list(0:1),9)))
 cmat <- cmat[nrow(cmat):1,ncol(cmat):1]
 
 cmat contains the binary vectors as rows.
 
 QUESTION: I would like to know how often each of the 512 
 vectors occurs in case.
 
 With these data, the output should be a vector with 2^9=512 
 coordinates, having 2,2,1,3, as, respectively, the coordinate 
 number 129, 193, 449, 512, and zeros in all other coordinates.
 
 Thank you for your help,
 Serguei
 
 df <- read.delim("clipboard", sep=";")
 
 DATA:
 case;x
 093/0188;0
 093/0188;0
 093/0188;1
 093/0188;0
 093/0188;1
 093/0188;1
 093/0188;1
 093/0188;1
 093/0188;1
 093/0206;0
 093/0206;0
 093/0206;0
 093/0206;0
 093/0206;0
 093/0206;0
 093/0206;0
 093/0206;0
 093/0206;0
 093/0216;0
 093/0216;1
 093/0216;1
 093/0216;1
 093/0216;1
 093/0216;1
 093/0216;0
 093/0216;1
 093/0216;1
 093/0305;0
 093/0305;1
 093/0305;1
 093/0305;1
 093/0305;1
 093/0305;1
 093/0305;1
 093/0305;1
 093/0305;1
 093/0325;0
 093/0325;0
 093/0325;0
 093/0325;0
 093/0325;0
 093/0325;0
 093/0325;0
 093/0325;0
 093/0325;0
 093/0449;0
 093/0449;0
 093/0449;0
 093/0449;0
 093/0449;0
 093/0449;0
 093/0449;0
 093/0449;0
 093/0449;0
 093/0473;0
 093/0473;0
 093/0473;1
 093/0473;1
 093/0473;1
 093/0473;1
 093/0473;1
 093/0473;1
 093/0473;1
 093/0499;0
 093/0499;0
 093/0499;1
 093/0499;1
 093/0499;1
 093/0499;1
 093/0499;1
 093/0499;1
 093/0499;1
 --
 ___
 
 Austrian Institute of Economic Research (WIFO)
 
 Name: Serguei Kaniovski   P.O.Box 91
 Tel.: +43-1-7982601-231   Arsenal Objekt 20
 Fax:  +43-1-7989386   1103 Vienna, Austria
 Mail: [EMAIL PROTECTED]
 
 http://www.wifo.ac.at/Serguei.Kaniovski
 


Re: [R] automatic cleaning of workspace

2006-11-28 Thread Liaw, Andy
You can avoid loading .RData at start-up without deleting the .RData
file by adding the --no-restore option to the R command.  I have that,
and additionally, --no-save, in my shortcut for the Rgui.exe command.  I
use explicit save() and load() in my scripts to save objects that are
expensive to compute.
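
A minimal sketch of the explicit save()/load() pattern mentioned above, assuming R is started with --no-save --no-restore so that nothing is carried over implicitly. The file name, the object name and the "expensive" computation are hypothetical placeholders.

```r
# Start R with:  R --no-save --no-restore
# so that only objects saved explicitly are ever reloaded.

# Hypothetical cache location; a real script would use a project path.
cache_file <- file.path(tempdir(), "expensive_result.RData")

if (file.exists(cache_file)) {
  # Reuse the stored object instead of recomputing it.
  load(cache_file)
} else {
  # Placeholder for a computation that is expensive in practice.
  expensive_result <- colMeans(matrix(rnorm(1e4), ncol = 10))
  save(expensive_result, file = cache_file)
}
```

With this approach the workspace always starts empty, and only objects that were saved deliberately come back.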

Andy

From: Leeds, Mark (IED)
 
 I'm having that problem where I am sometimes using an object 
 that's from a previous workspace when I don't want to be 
 doing that. I was thinking of putting rm(list=ls()) in my 
 first.R function. But the problem with doing this is that it 
 doesn't prompt you with "are you sure", and there could be 
 very rare cases where I don't want to delete the workspace. 
 Is there a way to make the cleaning of the workspace 
 automatic but still prompt you? I guess I can always just 
 try to remember to manually do rm(list=ls()) when I start 
 up in whatever workspace I am in, but I don't think I can 
 trust my memory. Thanks.
 
 


Re: [R] random forest regression

2006-11-15 Thread Liaw, Andy
One way is to graft the stratified sampling code from the classification
part onto the regression part.  I will get to it eventually, but just
not now.

Andy 

From: Naiara Pinto
 
 Dear all,
 
  I am doing a regression in randomForest, using the option 
 sampsize to reduce the number of records used to produce the 
 randomForest object.
 The manual says For classification, if sampsize is a vector 
 of the length the number of strata, then sampling is 
 stratified by strata, and the elements of sampsize indicate 
 the numbers to be drawn from the strata.  I need my sampling 
 to be done with factors, but I am doing a regression. Does 
 anyone know a way to do that?
 
 Thanks,
 
 Naiara.
 


Re: [R] CPU or memory

2006-11-08 Thread Liaw, Andy
My understanding is that it doesn't have much to do with 32- vs. 64-bit,
but rather with the instruction sets of the CPUs.  If I'm not mistaken, at the
same clock speed, a P4 would run slower than PIII simply because P4 does
less per clock-cycle.  Also, I believe for the same architecture, single
core chips are available at higher clock speeds than their multi-core
counterparts.  That's why we recently went for a box with four
single-core Opterons instead of two dual-core ones.

64-bit PCs should be really affordable:  I've seen HP laptops based on
the Turion chip selling below $500US.

Andy 

From: John C Frain
 
 I would like to thank all who replied to my question about 
 the efficiency of various cpu's in R.
 
 Following the advice of Bogdan Romocea I have put a sample 
 simulation and the latest version of R on a USB drive and 
 will go to a few suppliers to try it out.  I will report back 
 if I find anything of interest.
 
 With regard to 64-bit and 32-bit I thought that the 64-bit 
 chip might require less clock cycles for a specific machine 
 instruction than a 32-bit.
 This was one of the advantages of moving from 8 to 16 or from 
 16 to 32 bit chips.  Thus a slower, in terms of clock speed, 
 64-bit chip might run faster than a somewhat similar 32-bit 
 chip.  I fully realize that the full advantage of a 64-bit 
 chip is available only with a 64-bit operating system and I 
 am preparing to switch some work to Linux in case I acquire a 
 64-bit PC.  If I do I will time the simulations on that also.
 
 I already do some coarse-grained parallelism as described 
 by *Brian Ripley
 * but on two separate PC's.  This is not ideal but allows the 
 processing time to be halved without the overheads.
 
 FORTRAN 2 was my first programming language and I agree that 
 I should try to use C or FORTRAN to speed up things.  Finally 
 Rprof could be a great help.
 There are lots of utilities in the utils package with which I 
 was not familiar.
 
 Again Many Thanks to all who made various suggestions.
 
 
 bogdan romocea [EMAIL PROTECTED] to *r-help*, me, 07-Nov (1 day ago):
  Does any one know of comparisons of the Pentium 9x0, Pentium(r)
  Extreme/Core 2 Duo, AMD(r) Athlon(r) 64, AMD(r) Athlon(r) 64 FX/Dual 
  Core AM2 and similar chips when used for this kind of work.
 
 
 
 On 08/11/06, Prof Brian Ripley [EMAIL PROTECTED] wrote:
 
  On Wed, 8 Nov 2006, Christos Hatzis wrote:
 
   Prof. Ripley,
  
   Do you mind providing some pointers on how 
 coarse-grained parallelism
   could be implemented on a Windows environment?  Would it be as 
   simple as running two R-console sessions and then (manually) 
   combining the results
  of
   these simulations.  Or it would be better to run them as batch
  processes.
 
  That is what I would do in any environment (I don't do such things 
  under Windows since all my fast machines run Linux/Unix).
 
  Suppose you want to do 10000 simulations.  Set up two batch scripts 
  that each run 5000, and save() the results as a list or 
 matrix under 
  different names, and set a different seed at the top.  Then 
 run each 
  via R CMD BATCH simultaneously.  When both have finished, use an 
  interactive session to load() both sets of results and merge them.
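
A minimal sketch of the two-batch scheme described above, collapsed into one script so it can be run as is. In practice each run_batch() call would live in its own script started with R CMD BATCH; run_batch(), load_results(), the seeds, the file names and the toy simulation are all hypothetical.

```r
# One batch: set its own seed, run the simulations, save the results.
run_batch <- function(seed, n_sims, out_file) {
  set.seed(seed)
  results <- replicate(n_sims, mean(rnorm(20)))  # placeholder simulation
  save(results, file = out_file)
  invisible(out_file)
}

file_a <- file.path(tempdir(), "sims_a.RData")
file_b <- file.path(tempdir(), "sims_b.RData")

# These two calls stand in for two scripts run simultaneously.
run_batch(seed = 101, n_sims = 5000, out_file = file_a)
run_batch(seed = 202, n_sims = 5000, out_file = file_b)

# Later, in an interactive session: load both result sets and merge them.
load_results <- function(path) {
  env <- new.env()
  load(path, envir = env)  # loads the object named "results" into env
  env$results
}
all_results <- c(load_results(file_a), load_results(file_b))
length(all_results)
```

Loading into a separate environment avoids the second load() overwriting the first, since both files store an object with the same name.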
 
   RSiteSearch('coarse grained') did not produce any hits so 
 this topic
  might
   have not been discussed on this list.
  
   I am not really familiar with running R in any mode other than the
  default
   (R-console in Windows) so I might be missing something really 
   obvious. I
  am
   interested in running Monte-Carlo cross-validation in 
 some sort of a 
   parallel mode on a dual core (Pentium D) Windows XP machine.
  
   Thank you.
   -Christos
  
   Christos Hatzis, Ph.D.
   Nuvera Biosciences, Inc.
   400 West Cummings Park
   Suite 5350
   Woburn, MA 01801
   Tel: 781-938-3830
   www.nuverabio.com
  
  
  
   -Original Message-
   From: [EMAIL PROTECTED] 
   [mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian 
   Ripley
   Sent: Wednesday, November 08, 2006 5:29 AM
   To: Stefan Grosse
   Cc: r-help@stat.math.ethz.ch; Taka Matzmoto
   Subject: Re: [R] CPU or memory
  
   On Wed, 8 Nov 2006, Stefan Grosse wrote:
  
   64bit does not make anything faster. It is only of use 
 if you want 
   to use more than 4 GB of RAM or if you need a higher 
 precision of 
   your variables
  
   The dual core question: dual core is faster if programs 
 are able to 
   use that. What is sure that R cannot make (until now) use of the 
   two cores if you are stuck on Windows. It works excellent if you 
   use Linux. So if you want dual core you should work with 
 linux (and 
   then its faster of course).
  
   Not necessarily.  We have seen several examples in which using a 
   multithreaded BLAS (the only easy way to make use of 
 multiple CPUs 
   under Linux for a single R process) makes things many 
 times slower.  
   For tasks that do not make heavy use of linear algebra, the 
   advantage 

[R] graphics ignore tabs in text

2006-10-31 Thread Liaw, Andy
Dear R-help,

I seem to recall that I can use \t to get tab in a string on a
graphics device, but it doesn't seem to work.  Try:

lab <- "a\tb\tc"
cat(lab, "\n")   # works in the console output
plot(1:5, main=lab)  # no tabs in the title
text(3, 3, lab)  # no tabs in the text

I get the same result both in the windows() and pdf() devices.  Any
ideas?

This is R-patched Windows binary just downloaded from CRAN.
R version 2.4.0 Patched (2006-10-29 r39744)

Best,
Andy

Andy Liaw, PhD
Biometrics ResearchPO Box 2000 RY33-300
Merck Research LabsRahway, NJ 07065
andy_liaw(a)merck.com  732-594-0820





Re: [R] one problem about how to hold graphic with R

2006-10-31 Thread Liaw, Andy
I'm not familiar with Matlab, but from what I know, "hold on" is used to
overlay more stuff on the existing plot.  In R such things are
accomplished a bit differently:  One puts up a plot, then uses things like
lines(), points(), abline(), etc. to add to the existing plot.   The
closest thing to "hold on" in Matlab, I think, is par(new=TRUE).
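
A minimal sketch of the overlay approach described above. The curves are arbitrary, overlay_demo() is a hypothetical wrapper, and pdf(NULL) is used only so the example runs without opening a window; on an interactive device the same calls add to the visible plot.

```r
overlay_demo <- function() {
  pdf(NULL)                      # null device: nothing is written to disk
  on.exit(dev.off())

  x <- seq(0, 2 * pi, length.out = 50)
  plot(x, sin(x), type = "l")    # put up the first plot
  lines(x, cos(x), lty = 2)      # add a second curve to the same axes
  points(pi, 0, pch = 19)        # add a single point
  abline(h = 0, col = "grey")    # add a horizontal reference line

  invisible(TRUE)
}
overlay_demo()
```

Unlike par(new=TRUE), the low-level functions reuse the existing axes, so the added curves share the scale of the first plot.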

Andy

From: Gavin Simpson
 
 On Tue, 2006-10-31 at 21:36 +0800, yang baohua wrote:
  Sorry to disturb you, but can you help me to solve one 
 little problem?
  I want to draw a graphic after another with R but I cannot find the 
  first one after that.
  Do you know the command to hold the graphic with R?
  I remember with Matlab you may use hold on.
  Thanks.
  
  
 
 You don't say which OS. On MS Windows one can turn on a 
 history of plots to the graphics device and replay your plots 
 - it is in the menu bar for example.
 
 In all OSes, you can start up a new device to take the plot - 
 which is what Matlab does IIRC, so you have two or more plot 
 windows on screen at any one time. This is done like this:
 
 plot(1:10)
 x11()
 plot(1:20)
 x11()
 plot(rnorm(100))
 
 see ?Devices
 
 You can set a device to be active, i.e. switch around between 
 plotting windows using dev.set(), e.g.:
 
  dev.cur() # example from above leaves device 4 active
 X11
   4
  dev.set(3) # switch to device
  dev.cur() # check
 X11
   3
  plot(sort(rnorm(100))) # plot something new on this device
 
 Is this what you were looking for?
 
 HTH
 
 G
 --
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
  Gavin Simpson [t] +44 (0)20 7679 0522
  ECRC  ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
  Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
  Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
  UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 


Re: [R] regarding large csv file import [Broadcast]

2006-10-27 Thread Liaw, Andy
TFM (the "R Data Import/Export" manual in this case) can be a better place
to look than the archive.  Specifying colClasses in read.csv()
might help.
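
A minimal sketch of what specifying colClasses looks like, assuming a small made-up file with three columns; the column names and types are hypothetical and would have to match the real file. Declaring the types spares read.csv() from guessing them, and nrows (when known) helps it allocate memory up front.

```r
# Write a small example file; the real file would already exist.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5,
                     price = c(1.5, 2, 2.5, 3, 3.5),
                     label = letters[1:5]),
          tmp, row.names = FALSE)

# Declare one class per column, in file order.
dat <- read.csv(tmp, header = TRUE,
                colClasses = c("integer", "numeric", "character"),
                nrows = 5)
str(dat)
```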

Andy

From: [EMAIL PROTECTED]
 
 hi All, 
 
 i have a .csv of size 272 MB and a RAM of 512MB and working 
 on windows XP. 
 I am not able to import the csv file.
 R hangs means it stops responding even SciViews hangs.
 i am using read.csv(FILENAME,sep=",",header=TRUE). Is there 
 any way to import it.
 i have tried archives already but i was not able to sense much.
 
 thanks in advance
 
Sayonara With Smile  With Warm Regards :-)
 
   G a u r a v   Y a d a v
   Assistant Manager,
   Economic Research  Surveillance Department,
   Clearing Corporation Of India Limited.
 
   Address: 5th, 6th, 7th Floor, Trade Wing 'C',  Kamala City, 
 S.B. Marg, Mumbai - 400 013
   Telephone(Office): - +91 022 6663 9398 ,  Mobile(Personal) 
 (0)9821286118
   Email(Office) :- [EMAIL PROTECTED] ,  Email(Personal) 
 :- [EMAIL PROTECTED]
 
 


Re: [R] how to improve the efficiency of the following lapply codes [Broadcast]

2006-10-26 Thread Liaw, Andy
Make good use of Rprof():  It has helped me a great deal in pinpointing
bottlenecks where I would not have suspected them.
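
A minimal sketch of a profiling session with Rprof(), assuming R was built with profiling support. The function grow_by_rbind() is a hypothetical stand-in for the slow code; the tryCatch() only guards against a run that finishes too quickly to be sampled.

```r
# Deliberately slow stand-in: grows a matrix one row at a time.
grow_by_rbind <- function(n) {
  m <- NULL
  for (i in seq_len(n)) m <- rbind(m, rnorm(50))
  m
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack
res <- grow_by_rbind(3000)
Rprof(NULL)                         # stop profiling

# Time spent in each function itself; NULL if no samples were collected.
prof <- tryCatch(summaryRprof(prof_file)$by.self, error = function(e) NULL)
if (!is.null(prof)) print(head(prof))
```

In a profile like this, most of the self time would be expected to show up under rbind, which points at the line to rewrite.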

Cheers,
Andy 

From: Weiwei Shi
 object.size(intersect.matrix)
 41314204
 
 but my machine has 4 G memory, so it should be ok since after 
 12 hours, it finishes 16k out of 60k but still slow non-linearly.
 
 I am thinking to chop 60k into multiple 5k data.frames to run 
 the program. but just wondering is there a way around it?
 
  version
_
 platform   i686-pc-linux-gnu
 arch   i686
 os linux-gnu
 system i686, linux-gnu
 status
 major  2
 minor  3.1
 year   2006
 month  06
 day01
 svn rev38247
 language   R
 version.string Version 2.3.1 (2006-06-01)
 
 [EMAIL PROTECTED] ox]$ more /proc/meminfo
            total:       used:       free:  shared:   buffers:     cached:
 Mem:   4189724672  3035549696  1154174976        0  282836992  2057129984
 Swap:  4293586944   645042176  3648544768
 
 [EMAIL PROTECTED] ox]$ more /proc/cpuinfo
 processor   : 0
 vendor_id   : GenuineIntel
 cpu family  : 15
 model   : 4
 model name  : Intel(R) Xeon(TM) CPU 3.60GHz
 stepping: 3
 cpu MHz : 3591.419
 cache size  : 2048 KB
 
 
 
 thanks.
 
 On 10/25/06, Weiwei Shi [EMAIL PROTECTED] wrote:
  Hi,
  I have a series of lda analysis using the following lapply function:
 
  n <- dim(intersect.matrix)[1]
  net1.lda <- lapply(1:(n), function(k) i.lda(data.list, 
  intersect.matrix, i=k, w))
 
  i.lda is function to do the real lda analysis.
 
  intersect.matrix is a nx1026 matrix, n can be a really huge number 
  like 60k. The target is perform a random search. Building a n=120k 
  matrix is impossible for my machine. When n=5k, the task 
 can be done 
  in 30 min while n=60k, it is estimated to take 5 days. So I am 
  wondering where my coding problem is, which causes this to be a 
  nonlinearity.
 
  If more info is needed, I will provide.
 
  thanks
 
  --
  Weiwei Shi, Ph.D
  Research Scientist
  GeneGO, Inc.
 
  Did you always know?
  No, I did not. But I believed...
  ---Matrix III
 
 
 
 --
 Weiwei Shi, Ph.D
 Research Scientist
 GeneGO, Inc.
 
 Did you always know?
 No, I did not. But I believed...
 ---Matrix III
 


Re: [R] binom.test [Broadcast]

2006-10-21 Thread Liaw, Andy
To quote one of the previous answers you've got: The formula you're
using is the TV.  The one binom.test() uses is the ballpark.  Take your
pick.
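
A minimal sketch that puts the two intervals side by side for the data in this thread (6 positives out of 46). The normal-approximation formula is the one quoted in the question; binom.test() returns the exact (Clopper-Pearson) interval. The Wilson score interval from prop.test() is added only as a further point of comparison and is not mentioned in the thread.

```r
x <- 6
n <- 46
p_hat <- x / n

# Normal-approximation (Wald) interval, as in the hand calculation.
z <- qnorm(0.975)
wald <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)

# Exact (Clopper-Pearson) interval, as computed by binom.test().
exact <- as.numeric(binom.test(x, n)$conf.int)

# Wilson score interval via prop.test() without continuity correction.
score <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

rbind(wald = wald, exact = exact, score = score)
```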

Andy 

From: Ethan Johnsons
 
 Thank you for the info.  It helps.
 
 After all, it would be:
 
  0.1304348-1.96*(sqrt((0.1304348*(1-0.1304348))/46))
 [1] 0.03310968
  0.1304348+1.96*(sqrt((0.1304348*(1-0.1304348))/46))
 [1] 0.2277599
 
 Does R have a function for the calculation above?
 
 ej
 
 
 On 10/20/06, Francisco J. Zagmutt [EMAIL PROTECTED] wrote:
  Ethan,
 
  You need to explain why you think this is not the right 
 function to 
  use. R is doing exactly what you are asking it to do.  Now 
 is up to 
  you to choose the methodology you feel is correct.
  For a good discussion on your particular issue I recommend you the 
  following
  reference:
 
  A. Agresti and B. A. Coull, Approximate is better than exact for 
  interval estimation of binomial proportions, The American 
 Statistician, vol. 52, no.
  2, pp. 119-126, 1998.
 
  Once you figure out the right function to use see if the 
 function is
  available in R.   If not readily available, and if after 
 searching through
  R's documentation and the forum archives you still can't 
 find a way to 
  perform the calculation, then is time to get back to this forum.
 
  Regards,
 
  Francisco
 
 
  Dr. Francisco J. Zagmutt
  College of Veterinary Medicine and Biomedical Sciences 
 Colorado State 
  University
 
 
 
 
  From: Ethan Johnsons [EMAIL PROTECTED]
  To: r-help@stat.math.ethz.ch
  Subject: [R] binom.test
  Date: Fri, 20 Oct 2006 17:18:02 -0400
  
  A quick question, please.
  
  46 e coli lab samples are tested,  6 of them returned positive.
  
  So, the best point estimate for p is  6/46 = 0.1304348.
  
  For a 95% CI for p,  I thought binom.test would give me the correct
  result, but it seems it is not the right function to use.   What is
  the R function for this?
  
 


Re: [R] Box M test [Broadcast]

2006-10-21 Thread Liaw, Andy
See
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/0.html 

Andy

From: GRAHAM LEASK
 
   Dear List

   I am looking for a script that will calculate the Box M 
 test to test the homogeneity of the variance/covariance 
 matrix between two matrices.

   If anyone could send me the script I would appreciate it.

   I am aware of the scepticism about this test, where due to 
 extreme sensitivity a p value of 0.01 is recommended. Despite 
 this however Box's M test is the established method for 
 identification of stable strategic time periods within the 
 strategic management literature and I would like the 
 opportunity to use this method within either R or S plus.

   Any help would be gratefully received.

   Kind regards


   Graham
 
 
 
 Kind regards
 
 
 Dr Graham Leask
 Economics and Strategy Group
 Aston Business School
 Aston University
 Aston Triangle
 Birmingham
 B4 7ET
 
 Tel: Direct line 0121 204 3150
 email [EMAIL PROTECTED]


Re: [R] Getting group size in a data frame [Broadcast]

2006-10-19 Thread Liaw, Andy
Is this sort of what you want?

R> aggregate(df[2:3], df[1], function(x) sum(!is.na(x)))
  factor val1 val2
1     24    2    1
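
A minimal sketch covering both parts of the question with the poster's example data: the number of non-missing measurements per group, and the group size counting only animals with at least one measurement. Treating "all measurements NA" as "died" is the assumption taken from the question.

```r
df <- data.frame(factor = factor(c(24, 24, 24)),
                 val1 = c(2, 3, NA),
                 val2 = c(4, NA, NA))

# Non-missing measurements per group and per variable.
per_measure <- aggregate(df[2:3], df[1], function(x) sum(!is.na(x)))

# An animal counts as alive if at least one measurement is present.
alive <- rowSums(!is.na(df[2:3])) > 0

# Group sizes among the animals that are alive.
group_size <- table(df$factor[alive])

per_measure
group_size
```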

Andy 

From: Ulrik Stervbo
 
 Hi all,
 
 I have a data frame with some measured values of some 
 animals. Sometimes the
 measurement failed, resulting in a NA for a measurement and 
 sometimes the
 animal died, resulting in NA for all measurements.
 
 I have several groups of animals. How do I find the size of 
 each group with
 only alive animals? And how do I find the size of the groups for each
 measurement?
 
 An example:
 l1 <- list(factor=c(24,24,24), val1=c(2, 3, NA), val2=c(4, NA, NA))
 df <- as.data.frame(l1)
 df$factor <- factor(df$factor)
 
 The size of factors should be 2 and not 3.  The number of 
 measurement in
 val1 should be 2 and the number of measurements in val2 should be 1
 
 Thanks in advance for any help and suggestions
 Ulrik
 


Re: [R] Question about random sampling in R

2006-10-19 Thread Liaw, Andy
When sampling with replacement (like ordinary bootstrap), each draw is
done independently, and in each draw every point has equal probability
of being drawn.  When sampling without replacement (random permutation),
all possible sequences (permutations) have equal probability of
occurring.  E.g., if the data is 1:2, then (1, 2) has the same
probability of occurring as (2, 1).
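
A minimal sketch that illustrates both cases empirically, assuming a three-element vector and an arbitrary seed. With replacement each draw is independent; without replacement each of the 3! = 6 orderings should turn up about equally often.

```r
set.seed(42)
x <- 1:3

# Sampling with replacement, as in an ordinary bootstrap.
boot <- sample(x, size = 10, replace = TRUE)

# Sampling without replacement: each call returns a random permutation.
perms <- replicate(6000, paste(sample(x), collapse = ""))

# Relative frequency of each of the six possible orderings.
freq <- table(perms) / length(perms)
round(freq, 3)
```

Each frequency should be close to 1/6, which is the sense in which all permutations are equally likely.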

Andy

From: tom soyer
 
 Hi,
 
 I looked up the help file on sample(), but didn't find the 
 info I was looking for.
 
 When sample() is used to resample from a distribution, e.g., 
 bootstrap, how does it do it? Does it use an uniform 
 distribution, e.g., runif(), or something else? And, when the 
 help file says:sample(x) generates a random permutation of 
 the elements of x (or 1:x), would I be correct if I 
 translate the statement as follows: it means that the order 
 of sequence, which was generated from a uniform distribution, 
 would look like a random normal distribution.
 
 Thanks,
 
 Tom
 


Re: [R] [Q] How to fit data to HORIZONTAL line [Broadcast]

2006-10-19 Thread Liaw, Andy
The horizontal line can be fitted by lm(y ~ 1).
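
A minimal sketch of the comparison the question is after, assuming simulated data. The intercept-only model is the horizontal line; an ordinary straight-line fit stands in for the competing model here, and a fitted nls object could be passed to AIC() and BIC() in the same way.

```r
set.seed(1)
x <- 1:20
y <- 5 + rnorm(20)          # made-up data with no trend

fit_flat <- lm(y ~ 1)       # horizontal line: intercept only
fit_line <- lm(y ~ x)       # competing model with a slope

# The fitted height of the horizontal line is simply the mean of y.
coef(fit_flat)

# Information criteria for both models, side by side.
AIC(fit_flat, fit_line)
BIC(fit_flat, fit_line)
```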

Andy 

From: Young-Jin Lee
 
 Dear R users
 
 I posted a question about how to fit data to a straight 
 line this afternoon. But I realized that my question was not 
 correct because I needed to fit data to a HORIZONTAL line, 
 not a ordinary straight line.
 
 I looked at lm method, but could not figure out how to fix 
 the regression coefficient to 0. I also tried nls, but it 
 did not work.
 
 The reason I wanted to fit the data to a horizontal line is 
 that I want to compare AIC/BIC values of two models (a simple 
 straight line model vs a nonlinear curve model). I thought 
 that I can call
 aic(horizontal_fit_model) and aic (nonlinear_fit_model) to 
 achieve this goal.
 
 If I can compute AIC/BIC value of a horizontal fit model 
 without doing actual fitting, that would be fine, too.
 
 Thanks in advance.
 
 Young-Jin
 


Re: [R] MARS help?

2006-10-18 Thread Liaw, Andy
Spencer,
 
MARS fits splines, not disconnected lines.  Perhaps the strucchange package has 
facility to fit your data better.
 
Cheers,
Andy



From: [EMAIL PROTECTED] on behalf of Spencer Graves
Sent: Tue 10/17/2006 11:43 PM
To: R-help; Kurt Hornik
Subject: [R] MARS help? [Broadcast]



  I'm trying to use mars{mda} to model functions that look fairly
close to a sequence of straight line segments.  Unfortunately, 'mars'
seems to totally miss the obvious places for the knots in the apparent
first order spline model, and I wonder if someone can suggest a better
way to do this.  The following example consists of a slight downward
trend followed by a jump up after t1=4, followed by a more marked
downward trend after t1=5:

Dat0 <- cbind(t1=1:10,
   x=c(1, 0, 0, 90, 99, 95, 90, 87, 80, 77))
library(mda)
fit0 <- mars(Dat0[, 1, drop=FALSE], Dat0[, 2],
 penalty=.001)
plot(Dat0, type="l")
lines(Dat0[, 1], fit0$fitted.values,
  lty=2, col="red")

  Are there 'mars' options I'm missing or other software I should be
using?

  I've got thousands of traces crudely like this of different
lengths, and I want an automated way of summarizing similar traces in
terms of a fixed number of knots and associated slopes for each linear
spline segment max(0, t1-t.knot).

  Thanks,
  Spencer Graves



Re: [R] Automatic File Reading [Broadcast]

2006-10-18 Thread Liaw, Andy
Works on all platforms:

flist <- list.files(path=file.path("somedir", "somewhere"),
                    pattern="[.]csv$",
                    full.names=TRUE)  # full paths, so read.csv() can open the files
csvlist <- lapply(flist, read.csv, header=TRUE)
whateverList <- lapply(csvlist, whatever)

Andy
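
A self-contained sketch of the same pattern, using a temporary directory and two made-up CSV files so that it can be run anywhere. The directory name csv_demo, the file contents, and the final rbind step are illustrative assumptions, not part of the original reply. Note that list.files() returns bare file names unless full.names = TRUE is given, so the sketch sets it.

```r
# Work in a throwaway directory so the sketch is self-contained
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Two small files with the same structure (made-up data)
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.0)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 4:5, value = c(1.2, 0.7)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

# full.names = TRUE returns paths that read.csv() can open from any
# working directory
flist <- list.files(path = demo_dir, pattern = "[.]csv$", full.names = TRUE)
csvlist <- lapply(flist, read.csv, header = TRUE)
names(csvlist) <- basename(flist)

# Apply a function to each file, then stack all files into one data frame
rows_per_file <- sapply(csvlist, nrow)
combined <- do.call(rbind, csvlist)
```

Naming the list elements after the files makes it easier to trace a row of the combined data back to its source.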

From: Richard M. Heiberger
 
 Wensui Lui asks:
  is there a similar way to read all txt or csv files with same
  structure from a folder?
 
 
 
 On Windows I use this construct to find all files with the 
 specified wild card name.
 I used the \\ in the file paths with the translate=FALSE, 
 because the / in
 the DOS switches /w/B must not be translated.  On Windows 
 this picks up
 both lower and upper case filenames
 
 A similar construct can be written for Unix.  
 
 tmp <- shell('dir c:\\HOME\\rmh\\tmp\\*.R /w/B', intern=TRUE,
 translate=FALSE)  ##msdos
 for (i in tmp) source(paste("c:\\HOME\\rmh\\tmp\\", i, sep=""))
 


Re: [R] Nested source() errors [Broadcast]

2006-10-18 Thread Liaw, Andy
I've seen people doing that without problem.  Not something I'd like to
do myself, precisely because when problems occur, it's difficult to
figure out what went wrong.  Such practice usually indicates that you
ought to organize your functions better.  (You _are_ writing functions,
instead of just scripts?)

Andy 

From: Pierce, Ken
 
 Does anyone know of any issues with nesting source() calls 
 within multiple scripts? I have at least one script which 
 always finds errors when I source it but runs fine when run 
 on its own. It contains source() calls to other scripts and 
 it seems to fail during the first nested
 source() command.
  
 Ken
  
 
 Kenneth B. Pierce Jr.
 
 Research Ecologist
 
 Landscape Ecology, Modeling, Mapping and Analysis Team 
 
 PNW Research Station - USDA-FS 
 
 3200 SW Jefferson Way,  Corvallis,  OR 97331 
 
 [EMAIL PROTECTED]
 
 541 750-7393 
 
 http://www.fsl.orst.edu/lemma/gnnfire
 
  
 


Re: [R] CI

2006-10-18 Thread Liaw, Andy
Here's one way:

R> x <- c(6,11,5,14,30,11,17,3,9,3,8,8)
R> confint(lm(x~1), level=.9)
                 5 %    95 %
(Intercept) 6.546834 14.2865

Andy 
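
For comparison, a minimal sketch of the same 90% interval computed by hand with qt() and with t.test(). It also shows why the call quoted below returned 6: mean() treats only its first argument as the data, so the numbers have to be passed as a single vector. The variable names are made up for this illustration.

```r
x <- c(6, 11, 5, 14, 30, 11, 17, 3, 9, 3, 8, 8)

# mean() averages its first argument only, so pass one vector
m <- mean(x)

# 90% t interval by hand: mean +/- t quantile * standard error
n  <- length(x)
se <- sd(x) / sqrt(n)
ci_manual <- m + c(-1, 1) * qt(0.95, df = n - 1) * se

# The same interval from t.test()
ci_ttest <- as.numeric(t.test(x, conf.level = 0.90)$conf.int)

ci_manual
ci_ttest
```

Both intervals should agree with the confint(lm(x ~ 1)) result, since an intercept-only regression estimates the mean.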

From: Ethan Johnsons
 
 I have a quick question, please.
 
 Does R have a function to compute, e.g., a 90% confidence
 interval for the mean of these numbers?
 
 > mean(6,11,5,14,30,11,17,3,9,3,8,8)
 [1] 6
 
 I thought pt or qt would give me the interval, but it seems not.
 
 thx much.
 
 ej
 


Re: [R] CI

2006-10-18 Thread Liaw, Andy
You did ask for CI of mean, so that's what you got.  If you want CI for
proportion, here are two (non-bootstrap) ways:

R> confint(lm(I(x == 1) ~ 1), level=.9)
                  5 %      95 %
(Intercept) 0.2666456 0.6133544
R> binom.test(sum(x == 1), length(x), conf.level=.9)

        Exact binomial test

data:  sum(x == 1) and length(x)
number of successes = 11, number of trials = 25, p-value = 0.69
alternative hypothesis: true probability of success is not equal to 0.5
90 percent confidence interval:
 0.2698531 0.6213784
sample estimates:
probability of success
                  0.44

I hope these are not HW problems?

Andy 
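
A short sketch of where the exact interval comes from, using the data quoted below. The limits reported by binom.test() are Clopper-Pearson limits and can be written with beta quantiles; prop.test() is shown only as another commonly used alternative and was not part of the original reply. As for the NaN in the quoted session: after x = 11, x is a single number, and one observation leaves no residual degrees of freedom for lm(). Variable names here are illustrative.

```r
x <- c(2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 1, 2,
       2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2)

successes <- sum(x == 1)   # number of 1s ("apple")
n <- length(x)

# Exact (Clopper-Pearson) interval, as returned by binom.test()
ci_exact <- as.numeric(binom.test(successes, n, conf.level = 0.90)$conf.int)

# The same limits written with beta quantiles
ci_beta <- c(qbeta(0.05, successes, n - successes + 1),
             qbeta(0.95, successes + 1, n - successes))

# Interval from prop.test(), for comparison
ci_score <- as.numeric(prop.test(successes, n, conf.level = 0.90)$conf.int)

ci_exact
ci_score
```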

From: Ethan Johnsons 
 
 Thank you so much for the feedback.
 
 The random numbers are working great.  I have tried 
 non-random numbers, and the outcome is not correct with confint.
 
 Is there a way to compute, e.g., a 90% confidence interval for
 the percentage of 1s?
 
 i.e. where 1 = apple; 2 = orange
 
 > x
  [1] 2 2 2 2 2 1 1 2 2 1 2 1 2 2 2 1 1 1 1 1 1 1 2 2 2
 > table(x)
 x
  1  2
 11 14
 
 > x = 11
 > confint(lm(x~1), level=0.90)
             5 % 95 %
 (Intercept) NaN  NaN
 
 ej
 
  
 
 
 
 




Re: [R] Help on Random forest [Broadcast]

2006-10-11 Thread Liaw, Andy
Do provide a reproducible example, as the Posting Guide suggests.

Try:

library(randomForest)
example(predict.randomForest)
iris.pred <- predict(iris.rf, iris[ind == 2,], nodes=TRUE)
str(iris.pred)
attr(iris.pred, "nodes")

Andy
 

From: Rupendra
 
 Hello all,
 
 I am trying to explore random forest in R. What I want to do
 is get the number of the node into which each case falls in the
 trees of the random forest. For that I am calling the predict method as:
 
 learn.pred <- predict(learn.rf,
 newdata=learn.data.x, norm.votes=TRUE, predict.all=TRUE,
 nodes=TRUE, type="response")
 
 From studying the randomForest manual, I expected
 learn.pred$nodes to contain the node numbers, but there
 is no attribute called nodes in the learn.pred object.
 
  
 
 I do not have much experience with R. Please help me resolve
 this issue.
 
  
 
 Thanks in advance,
 
 Rupendra 
  
  


Re: [R] how to convert all columns of a data frame into factors

2006-10-04 Thread Liaw, Andy
Alternatively:

x[] <- lapply(x, factor)

Recall that a data frame is a list, so lapply() is a natural choice.

Andy 
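
A small sketch of why the apply() attempt quoted below fails while the lapply() version works. The toy data frame is made up for illustration.

```r
x <- data.frame(a = c(1, 2, 1), b = c("u", "v", "u"),
                stringsAsFactors = FALSE)

# apply() coerces the data frame to a matrix first, so the result is
# a character matrix and the factor attributes are lost
m <- apply(x, 2, factor)
is.matrix(m)
is.character(m)

# lapply() works column by column; assigning into x[] keeps x a data frame
x[] <- lapply(x, factor)
sapply(x, is.factor)
```

The empty brackets in x[] replace the columns in place, so the row names and the data.frame class are preserved.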

From: Gabor Grothendieck
 
 Try this:
 
 replace(BOD, TRUE, lapply(BOD, factor))
 
 
 On 10/4/06, Weiwei Shi [EMAIL PROTECTED] wrote:
  Hi,
 
  I use apply
  apply(x, 2, factor)
 
  but it does not work. please help. thanks.
 
  --
  Weiwei Shi, Ph.D
  Research Scientist
  GeneGO, Inc.
 
  Did you always know?
  No, I did not. But I believed...
  ---Matrix III
 


[R] how do I tell configure where to find Java?

2006-10-03 Thread Liaw, Andy
Dear R-help,

I'm trying to build R-2.4.0 on our Opteron-based Scyld cluster.  The system
has gcj (the GNU Java compiler, part of GCC) stuff in /usr/bin.  When I
installed jdk 1.5.08, the install script placed it in /usr/java (I didn't
have a choice, as the script didn't offer that option).  Now when I run
configure in R-2.4.0, it finds gcj, which is not what I want to use.  Is
there a way to tell configure where to look for Java?  I tried configure
--help but didn't see anything related to Java.

Best,
Andy

Andy Liaw, PhD
Biometrics Research    PO Box 2000 RY33-300
Merck Research Labs    Rahway, NJ 07065
andy_liaw(a)merck.com  732-594-0820





Re: [R] how do I tell configure where to find Java? [Broadcast]

2006-10-03 Thread Liaw, Andy
Before I do that, I would need to remove the gcj stuff that is in /usr/bin.
If I knew how to remove gcj, I'd gladly do that.  However, for this
particular version of the OS, the entire GCC seems to be bundled into one
rpm, and I could not remove just the gcj component.  Neither do I wish to
mess with files that are part of some RPMs; in my experience that's an
invitation for trouble later.

Best,
Andy 

From: mike waters 
 
 I'm not familiar with gcj, but my initial reaction would be a 
 ln -s for the relevant compiler executable from /usr/java 
 into /usr/bin.
 


Re: [R] how do I tell configure where to find Java?

2006-10-03 Thread Liaw, Andy
Thanks to everyone who provided the info.  I tried Martin Morgan's
suggestion (adding JAVA_HOME=/where/jdk/install/itself to the
list of variables defined after `configure'), and config.log shows
that the desired Java is found.

The Scyld system is based on RH, but I believe it lags far
behind FC.  The JDK is from Sun, and didn't come as an RPM.

Best,
Andy

From: Peter Dalgaard
 
 Logan Lewis [EMAIL PROTECTED] writes:
 
  Andy,
  
  On Tuesday 03 October 2006 3:30 pm, Liaw, Andy wrote:
   Before I do that, I would need to remove the gcj stuff that is in
   /usr/bin. If I knew how to remove gcj, I'd gladly do that.  However,
   for this particular version of the OS, the entire GCC seems to be
   bundled into one rpm, and I could not remove just the gcj component.
   Neither do I wish to mess with files that are part of some RPMs;
   in my experience that's an invitation for trouble later.
  
  The Red Hat way of dealing with different packages providing the same
  binaries is alternatives.  You will see a bunch of links in
  /etc/alternatives, and the command /usr/sbin/alternatives allows you
  to switch between options that provide the same binaries.  The trouble
  is that the Sun JDK package does not interface with this system, and
  doesn't show up as an option when you execute
  /usr/sbin/alternatives --config java.
 
 Hmm... I actually have it, but how did it get there?
 
 [EMAIL PROTECTED] R]$ /usr/sbin/alternatives --config java
 
 There are 2 programs which provide 'java'.
 
   Selection    Command
 -----------------------------------------------
    1           /usr/lib/jvm/jre-1.4.2-gcj/bin/java
 *+ 2           /usr/lib/jvm/jre-1.5.0-sun/bin/java
 
 Enter to keep the current selection[+], or type selection number:
 failed to create /var/lib/alternatives/java.new: Permission denied
 
 
 
 
 
 -- 
    O__       Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
 ~~ - ([EMAIL PROTECTED])                             FAX: (+45) 35327907
 


Re: [R] levels of factor when subsetting the factor

2006-09-12 Thread Liaw, Andy
You have at least two choices:

R> factor(fact[1:6])
[1] A A A B B B
Levels: A B
R> fact[1:6, drop=TRUE]
[1] A A A B B B
Levels: A B

HTH,
Andy
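
A runnable sketch of the two choices side by side, together with droplevels(), which is available in more recent versions of R than the one current when this thread was written. The variable names are made up for this illustration.

```r
fact <- factor(c(rep("A", 3), rep("B", 3), rep("C", 3)))

# Plain subsetting keeps the unused level "C"
sub <- fact[1:6]
levels(sub)

# Three ways to drop it
lev_refactor   <- levels(factor(sub))            # rebuild the factor
lev_drop_arg   <- levels(fact[1:6, drop = TRUE]) # drop while subsetting
lev_droplevels <- levels(droplevels(sub))        # later addition to base R

lev_refactor
```

droplevels() also has a data frame method, which helps when several factor columns are subset at once.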


From: Afshartous, David
  
 All,
 
 When I take a subset of a factor the reduced factor still 
 maintains all
 the original levels of the factor when say forming the key in a plot.
 The data is correct, but the variable still remembers the original
 levels.  See below for reproducible code.  Does anyone know how to fix
 this?
 cheers,
 dave
 
 fact = as.factor(c(rep("A", 3), rep("B", 3), rep("C", 3)))
 new.fact = fact[1:6]
 > new.fact
 [1] A A A B B B
 Levels: A B C    ## should only show A B
 


Re: [R] 4^2 factorial help

2006-08-18 Thread Liaw, Andy
If you really want the quadratic terms, you need to keep those variables as
numeric, instead of factors.  (You might also want to look into something
like the central composite designs.)

summary() and coef() on the resulting fitted object should give you what you
need.  Things like these are covered in the "An Introduction to R" manual...

Andy 
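
A minimal sketch of that advice, using the data from the post quoted below. The particular model formula (main effects, interaction, and both squared terms) is an assumption about what the poster wants, not something stated in the reply.

```r
# Data from the original post: 4 levels of A x 4 levels of B, 2 replicates
Factorial <- data.frame(
  A = rep(c(600, 800, 1000, 1200), each = 8),
  B = rep(rep(c(2.5, 5, 10, 20), each = 2), times = 4),
  Response = c(0.0257, 0.0254, 0.0217, 0.0204, 0.0191, 0.0210, 0.0133, 0.0139,
               0.0312, 0.0317, 0.0307, 0.0309, 0.0330, 0.0318, 0.0225, 0.0234,
               0.0350, 0.0352, 0.0373, 0.0361, 0.0432, 0.0402, 0.0297, 0.0306,
               0.0324, 0.0326, 0.0353, 0.0353, 0.0453, 0.0436, 0.0348, 0.0357))

# A and B stay numeric; I() protects ^ from its special meaning in formulas
fit <- lm(Response ~ A * B + I(A^2) + I(B^2), data = Factorial)

coef(fit)      # fitted coefficients
summary(fit)   # coefficient table with standard errors
anova(fit)     # sequential ANOVA table including the quadratic terms
```

Without I(), a term such as A^2 in a formula is interpreted as a crossing operator and reduces to A, so the quadratic effect would silently be left out.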

From: [EMAIL PROTECTED]
 
 To whom it may concern:
  
 I am trying a factorial design on a system of mine that has two factors.
 Each factor was set at four different levels, with one 
 replication for each of the combinations. My data are as follows:
  
 
        A     B   Response
  1   600   2.5   0.0257
  2   600   2.5   0.0254
  3   600   5     0.0217
  4   600   5     0.0204
  5   600  10     0.0191
  6   600  10     0.0210
  7   600  20     0.0133
  8   600  20     0.0139
  9   800   2.5   0.0312
 10   800   2.5   0.0317
 11   800   5     0.0307
 12   800   5     0.0309
 13   800  10     0.0330
 14   800  10     0.0318
 15   800  20     0.0225
 16   800  20     0.0234
 17  1000   2.5   0.0350
 18  1000   2.5   0.0352
 19  1000   5     0.0373
 20  1000   5     0.0361
 21  1000  10     0.0432
 22  1000  10     0.0402
 23  1000  20     0.0297
 24  1000  20     0.0306
 25  1200   2.5   0.0324
 26  1200   2.5   0.0326
 27  1200   5     0.0353
 28  1200   5     0.0353
 29  1200  10     0.0453
 30  1200  10     0.0436
 31  1200  20     0.0348
 32  1200  20     0.0357
 
  
 
 I am able to enter my data into R and obtain an ANOVA table 
 (which I have been able to verify as correct using an excel 
 spreadsheet), using the following syntax:
 
  
 
 Factorial <- data.frame(A=c(rep(c(600, 600, 600, 600, 800,
 800, 800, 800, 1000, 1000, 1000, 1000, 1200, 
 1200, 1200, 1200), each=2)), B=c(rep(c(2.5, 5, 
 10, 20, 2.5, 5, 10, 20, 2.5, 5, 10, 20, 
 2.5, 5, 10, 20), each=2)), Response = c(0.0257, 
 0.0254, 0.0217, 0.0204, 0.0191, 0.021, 0.0133, 0.0139, 
 0.0312, 0.0317, 0.0307, 0.0309, 0.033, 0.0318, 0.0225, 
 0.0234, 0.035, 0.0352, 0.0373, 0.0361, 0.0432, 0.0402, 
 0.0297, 0.0306, 0.0324, 0.0326, 0.0353, 0.0353, 0.0453, 
 0.0436, 0.0348, 0.0357))
 
  
 
 > anova(aov(Response~A*B, data=Factorial))
 
  
 
 However, this is as far as I am able to go. I would like to 
 obtain the coefficients of my model, but have been unable to. I would 
 also like to try other, non-linear models, as the effects of these 
 factors are not linear. I would also like to add A^2 and B^2 to the 
 ANOVA and the model. 
 
  
 
 Could you please help in this regard and offer some advice? Your 
 help is much appreciated.
 
  
 
 Yours sincerely,
 
 Leslie Correia
 
 
 
 Department of Process Engineering
 
 University of Stellenbosch
 
 Private Bag X1
 
 Matieland, 7602
 
 Stellenbosch
 
 Tel:   0837012017
 
 E-mail: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] 
 
 
 
 