from:"Adaikalavan RAMASAMY"

Re: [R] Merging two files together in R

2007-08-24 Thread Adaikalavan Ramasamy

It is not necessary to do this, as you can specify which column in each of the 
two data sets to use as the common key with the by.x and by.y arguments of merge().
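A minimal sketch of this, assuming the two files have already been read into data frames. The column names are illustrative assumptions: the annotation's ID column is called SNP.ID here to mimic the naming mismatch, and rs stands in for rs# (read.csv() would, by default, rename a column called rs# to rs.).

```r
# Hypothetical data frames rebuilt from the example in the question
# (in practice they would come from read.delim() and read.csv())
snp <- data.frame(SNPID    = c(123, 456, 789),
                  Genotype = c("AA", "AB", "BB"),
                  X        = c(13.4, 10.1, 2.7),
                  Y        = c(1.2, 12.2, 14.4),
                  stringsAsFactors = FALSE)
annot <- data.frame(rs         = c("rs23525", "rs78423", "rs82342"),
                    SNP.ID     = c(456, 123, 789),
                    Chromosome = c(12, 4, 9),
                    stringsAsFactors = FALSE)

# Match rows on the ID although the rows are in a different order
# and the key columns have different names in the two data frames
combined <- merge(snp, annot, by.x = "SNPID", by.y = "SNP.ID")

# Reorder the columns to the layout asked for in the question
combined[, c("SNPID", "rs", "Chromosome", "Genotype", "X", "Y")]
```

merge() matches rows by value, so the row order in either file does not matter, and the result is sorted by the key by default. cbind(), by contrast, only pastes rows together by position.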



Morassa Mohseni wrote:
 Thanks!
 I'll give this a try. I forgot to mention that the SNP.ID is not named the
 same in both files, even though they contain the same information. I'll just
 go ahead and open one of the files in a text editor and rename the columns
 so they match.
 -Morassa
 PhD Student
 Johns Hopkins Human Genetics
 ---
 Try looking at ?merge
 
 If your data is in two dataframes df1 and df2:
 
 merge(df1, df2)
 
 (This will merge on SNPID because that column is common to both dataframes).
 
 ---
 
 -----Original Message-----
 From: [EMAIL PROTECTED] On Behalf Of Morassa Mohseni
 Sent: 24 August 2007 15:41
 To: r-help@stat.math.ethz.ch
 Subject: [R] Merging two files together in R
 
 Hi,
 
 Thanks in advance for reading this post.
 
 I received some Affymetrix genotyping data back recently (250K, Nsp
 array). However, in order for me to do any analysis on this data set, I
 need to append the annotation file to it. Basically I want to do
 something that looks like this:
 
 Snpfile (tab delimited):
 
 SNPID Genotype X    Y
 123   AA       13.4 1.2
 456   AB       10.1 12.2
 789   BB       2.7  14.4
 
 Annotation file (csv file):
 
 rs#, SNPID, Chromosome
 rs23525, 456, 12
 rs78423, 123, 4
 rs82342, 789, 9
 
 What I am trying to get is an output file that looks like this:
 
 SNPID rs#     Chromosome Genotype X    Y
 123   rs78423 4          AA       13.4 1.2
 456   rs23525 12         AB       10.1 12.2
 789   rs82342 9          BB       2.7  14.4
 
 The SNPID is the same in both files so I would like to use that to match
 up... but they are not in the same order in both files, so I want to make
 sure that I am appending and merging the 2 files correctly. So far all I've
 really been able to do is import the files into R. I've been looking through
 the posts, and was wondering if I could use cbind(...) to merge the
 files? Not sure though.
 
 Thanks again!!
 
 Morassa Mohseni
 PhD Student
 Johns Hopkins Dept. of Human Genetics
 Baltimore, MD
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 



Re: [R] Does anyone else think this might be worth a warning?!?

2007-08-19 Thread Adaikalavan Ramasamy

First, note that R matches the arguments in a call by name first (exact 
names, then partial names) and only then by position.

Second, have a look at how mean and max are defined

  mean <- function (x, trim = 0, na.rm = FALSE, ...)

  max  <- function (..., na.rm = FALSE)

It is the difference in the position of the ... argument (the catch-all 
argument, usually called "dots") that determines the different 
behaviour. Inside the function, ... is often converted to a list.

So when you type mean(1,1,2), it is treated as
  mean( x=1, trim=1, na.rm=2 )

and when you type max(1,1,2), all three values are captured by ..., so 
it behaves like
  max( c(1,1,2), na.rm = FALSE )
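To see the two matching rules side by side, here is a small sketch with two hypothetical toy functions (meanLike and maxLike are made up for illustration). They only copy the signatures of mean.default() and max() and report how the arguments were matched.

```r
# Toy functions with the same argument layout as mean.default() and max()
meanLike <- function(x, trim = 0, na.rm = FALSE, ...) {
  list(x = x, trim = trim, na.rm = na.rm)
}
maxLike <- function(..., na.rm = FALSE) {
  list(dots = c(...), na.rm = na.rm)
}

str(meanLike(1, 1, 2))  # x = 1, trim = 1, na.rm = 2: the extra values become options
str(maxLike(1, 1, 2))   # all three values go into ...; na.rm keeps its default

# To average several values, combine them into one vector first
mean(c(1, 1, 2))
```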


However, you do raise a good point. Reading mean.default(), I do not see 
how and when the ... argument in mean() comes into play. Perhaps mean 
could be redefined as mean <- function (..., trim = 0, na.rm = FALSE) so 
that it is similar to max, sum, range etc.

But there might be a philosophical counter-argument for this as well. 
Functions like mean() and sd() are supposed to summarise a single vector, 
whereas max, sum and range can work on several vectors by combining them 
into one. Consider max( c(1,2,3), c(2,3,4) ), which returns 4.

Regards, Adai



Matthew Walker wrote:
 Hi,
 
 I was *very* surprised by this little trick for new players: mean() only 
 considers its first argument!
 
   > mean(1,1,2)
 [1] 1
   > mean(2,1,1)
 [1] 2
 
 
 I found this very different behaviour to max():
 
   > max(1,1,2)
 [1] 2
   > max(2,1,1)
 [1] 2
 
 
 
 Perhaps this is the wrong list to ask, but does anyone else think this is a 
 little on the interesting side?  Is it not possible to detect a first 
 argument of length one in the presence of other un-named arguments and 
 at least produce a warning?
 
 
 Cheers,
 
 
 Matthew
 

Re: [R] princomp error

2007-07-27 Thread Adaikalavan Ramasamy

You probably have some missing (NA), undefined (NaN) or infinite values.

Either eyeball the data or use sum(is.na(x)), sum(is.nan(x)), 
sum(is.infinite(x)) to find out if you have such data. You may want to 
use which() to find out where they are.

Regards, Adai


Bricklemyer, Ross S wrote:
 I am attempting to run principal components analysis on a dataset of
 spectral reflectance (6 decimal places).  I imported the data using
 read.table and there are both column and row headers.  When I run
 princomp I receive the following error:
 
  
 
 Error in cov.wt(z) : 'x' must contain finite values only
 
  
 
 Where am I going wrong?
 
  
 
 Ross
 
  
 
 ***
 Ross Bricklemyer
 Dept. of Crop and Soil Sciences
 Washington State University
 291D Johnson Hall
 PO Box 646420
 Pullman, WA 99164-6420
 Work: 509.335.3661
 Cell/Home: 406.570.8576
 Fax: 509.335.8674
 Email: [EMAIL PROTECTED]
 
 
 
 
 
 
 

Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.

2007-07-24 Thread Adaikalavan Ramasamy

The names of the table give you the values, and the table entries give the 
frequencies. If you have a matrix, you just need to convert it into a vector first.

> m <- matrix( LETTERS[ c(1:3, 3:5, 2:4) ], nc=3 )
> m
     [,1] [,2] [,3]
[1,] "A"  "C"  "B"
[2,] "B"  "D"  "C"
[3,] "C"  "E"  "D"
> tb <- table( as.vector(m) )
> tb

A B C D E
1 2 3 2 1
> paste( names(tb), ":", tb, sep="" )
[1] "A:1" "B:2" "C:3" "D:2" "E:1"
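Since the question asks about several variables at once, here is a sketch extending the idea to every column of a data frame. The data frame df and its columns are hypothetical. lapply() builds one table per column, and names() / as.vector() pull the values and frequencies apart.

```r
# Hypothetical multivariate data set with two categorical variables
df <- data.frame(colour = c("red", "blue", "red", "green", "blue", "red"),
                 size   = c("S", "M", "M", "L", "S", "M"),
                 stringsAsFactors = FALSE)

# One frequency table per column
tabs <- lapply(df, table)

# Values and frequencies of one variable as plain vectors
values <- names(tabs$colour)      # "blue" "green" "red"
freqs  <- as.vector(tabs$colour)  # 2 1 3

# Or a value/frequency data frame for every variable
valfreq <- lapply(tabs, function(tb)
  data.frame(value = names(tb), freq = as.vector(tb), stringsAsFactors = FALSE))
valfreq$size
```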

If this is not what you want, then please give a simple example.

Regards, Adai



Allan Kamau wrote:
 Hi all,
 If the question below as been answered before I
 apologize for the posting.
 I would like to get the frequencies of occurrence of
 all values in a given variable in a multivariate
 dataset. In short for each variable (or field) a
 summary of values contained with in a value:frequency
 pair, there can be many such pairs for a given
 variable. I would like to do the same for several such
 variables.
 I have used table() but am unable to extract the
 individual value and frequency values.
 Please advise.
 
 Allan.
 

Re: [R] Extracting elements from a list

2007-07-14 Thread Adaikalavan Ramasamy

Try

  sapply( Lst, function(m) m[1,1] )

Also note that to subset a list, you just need Lst[ 1:10 ] and not
Lst[[ 1:10 ]] (note the double square brackets). With a vector index, 
[[ ]] indexes recursively, which is why you got "recursive indexing failed".
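A self-contained sketch of the sapply() approach. Lst here is a hypothetical list of ten 3 x 3 matrices, built so that matrix i holds the value i in its [1,1] cell. vapply() is shown as a stricter variant that checks each result is a single number.

```r
# Hypothetical stand-in for Lst: matrix i has the value i in cell [1,1]
Lst <- lapply(1:10, function(i) matrix(i + (0:8) / 10, nrow = 3))

# First element of every matrix
first.element <- sapply(Lst, function(m) m[1, 1])
first.element

# Same result, but vapply() fails loudly if any result is not one number
first.element2 <- vapply(Lst, function(m) m[1, 1], numeric(1))

# Lst[1:2] is itself a list of the first two matrices
str(Lst[1:2])
```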

Regards, Adai


Forest Floor wrote:
 Hi,
 
 I would love an easy way to extract elements from a list.  
 
 For example, if I want the first element from each of 10 arrays stored 
 in a list,  
 
 Lst[[1:10]][1,1]  seems like a logical approach, but gives this error:  
 Error: recursive indexing failed at level 3
 
 The following workaround is functional but can get annoying/confusing.  
 
 first.element=vector()
 for (i in 1:10){ first.element=c(first.element, Lst[[i]][1,1])  }
 
 Is there a better way to do this?   Thanks for any help!
 
 Jeff
 

Re: [R] How to remove the quote in the data frame?

2007-07-14 Thread Adaikalavan Ramasamy

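You can achieve this with cbind.data.frame(), which builds a data frame, so the numeric columns stay numeric instead of being converted to character.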
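A short sketch of why the quotes appear and how a data frame avoids them. The vectors below are hypothetical, shortened stand-ins for the poster's data. A matrix can hold only one type, so cbind() with a character vector turns every number into a string. A data frame keeps one type per column.

```r
# Hypothetical stand-ins for the poster's vectors (first three rows only)
obv.value <- c(2, 0, 66)
p.value   <- c(1.0, 1.0, 0.5)
mean.sim  <- c(6.0, 0.0, 49.6)
ress      <- c("ABHO.ABNE", "ABHO.ACBA", "ABHO.ACGI")

res <- cbind(obv.value, p.value, mean.sim)  # numeric matrix
bad <- cbind(res, ress)                     # character matrix: numbers become strings
typeof(bad)                                 # "character", hence the quotes when printed

good  <- data.frame(res, ress, stringsAsFactors = FALSE)  # each column keeps its type
good2 <- cbind.data.frame(res, ress)                       # the route suggested above
sapply(good, class)
good
```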

Christophe Pallier wrote:
 Beware: you are not working with data.frames but with a vector and a
 matrix.
 (see ?cbind)
 
 Solution: convert 'res' to data.frame.
 
 Christophe
 
 On 7/14/07, Zhang Jian [EMAIL PROTECTED] wrote:
 If I do not add ress into the data frame res, there is no quote in the
 data frame. However, I add ress, all column were found the quote.
 How to remove it?
 If you can delete the quote in the file ress, that is better.
 Thanks.

 > ress[1:10]
 [1] "ABHO.ABNE" "ABHO.ACBA" "ABHO.ACGI" "ABHO.ACKO" "ABHO.ACMA" "ABHO.ACMO"
 [7] "ABHO.ACPS" "ABHO.ACSE" "ABHO.ACTE" "ABHO.ACTR"
 > res=cbind(obv.value,p.value,mean.sim)
 > res[1:10,]
       obv.value p.value mean.sim
  [1,]         2     1.0      6.0
  [2,]         0     1.0      0.0
  [3,]        66     0.5     49.6
  [4,]         3     1.0      3.0
  [5,]         0     1.0     64.7
  [6,]         0     1.0      0.0
  [7,]         0     1.0      0.0
  [8,]        51     0.5     39.8
  [9,]         0     1.0     47.4
 [10,]        59     0.7     72.0

 > ress=cbind(res,ress)
 > ress[1:10,]
       obv.value p.value mean.sim ress
  [1,] "2"       "1"     "6"      "ABHO.ABNE"
  [2,] "0"       "1"     "0"      "ABHO.ACBA"
  [3,] "66"      "0.5"   "49.6"   "ABHO.ACGI"
  [4,] "3"       "1"     "3"      "ABHO.ACKO"
  [5,] "0"       "1"     "64.7"   "ABHO.ACMA"
  [6,] "0"       "1"     "0"      "ABHO.ACMO"
  [7,] "0"       "1"     "0"      "ABHO.ACPS"
  [8,] "51"      "0.5"   "39.8"   "ABHO.ACSE"
  [9,] "0"       "1"     "47.4"   "ABHO.ACTE"
 [10,] "59"      "0.7"   "72"     "ABHO.ACTR"


Re: [R] legend and x,y cordinate values

2007-07-13 Thread Adaikalavan Ramasamy

See help(legend) and help(identify).
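A sketch of both parts, assuming the ten distributions are held as numeric vectors in a list (samples below is simulated, hypothetical data). ecdf() returns a function, so the y value at any x is simply Fn(x), and quantile(..., type = 1) goes the other way. identify() is interactive, so it is not shown here.

```r
set.seed(42)
# Hypothetical data: ten samples standing in for the ten distributions
samples <- lapply(1:10, function(i) rnorm(50, mean = i))
cols <- rainbow(10)

# Draw the first ECDF, then add the others to the same plot
plot(ecdf(samples[[1]]), col = cols[1], do.points = FALSE,
     xlim = range(unlist(samples)), main = "Empirical CDFs")
for (i in 2:10) {
  plot(ecdf(samples[[i]]), col = cols[i], do.points = FALSE, add = TRUE)
}
legend("bottomright", legend = paste("sample", 1:10),
       col = cols, lty = 1, cex = 0.8)

# Reading off values: Fn(x) gives the cumulative proportion at x
v  <- c(3, 1, 4, 1, 5)
Fn <- ecdf(v)
Fn(3)                       # proportion of values <= 3, i.e. 3/5
quantile(v, 0.6, type = 1)  # smallest x with Fn(x) >= 0.6, i.e. 3
```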

Ajay Singh wrote:
 Hi,
 
 I have two problems in R.
 
 1. I need 10 cdfs on a graph, the graph needs to have legend. Can you let 
 me know how to get legend on the graph?
 
 2. In ecdf plot, I need to know the x and y co-ordinates. I have to get 
 corresponding y coordinate values to x coordinate value so that I could be 
 able to know the particular percentile value to the x-coordinate value. 
 Can you let me know how could I be able the corresponding values of x to 
 the y coordinates?
 
 Thanking you,
 Looking forward to your kind response,
 Sincerely,
 Ajay.


Re: [R] Algorythmic Question on Array Filtration

2007-07-13 Thread Adaikalavan Ramasamy

Sorry, this sounds like a fairly basic question that can be resolved with 
which() and possibly ifelse(). There are no details in your email.

I am afraid you have to learn the basics of R or ask the question with more 
details (e.g. example data).
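For what it is worth, one possible approach to the filtering problem quoted below is greedy: visit the peaks from most to least intense and keep a peak only if its +/- 5 ppm window does not overlap the window of a peak already kept. This is a sketch under that interpretation (filter_peaks and the peaks data are hypothetical). A chain of overlapping windows may need a different rule.

```r
# Hypothetical peak list standing in for the Mass/Intensity data frame
peaks <- data.frame(Mass      = c(500.0000, 500.0010, 500.0040, 800.0000, 800.0030),
                    Intensity = c(10, 50, 20, 5, 7))

filter_peaks <- function(df, ppm = 5) {
  lo <- df$Mass - df$Mass / 1e6 * ppm   # lower edge of each window
  hi <- df$Mass + df$Mass / 1e6 * ppm   # upper edge of each window
  keep <- logical(nrow(df))
  # Visit the most intense peaks first
  for (i in order(df$Intensity, decreasing = TRUE)) {
    # Two windows overlap when each starts before the other ends
    clash <- keep & lo <= hi[i] & hi >= lo[i]
    if (!any(clash)) keep[i] <- TRUE
  }
  df[keep, ]
}

filter_peaks(peaks)
```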

Or ask someone locally.

Regards, Adai



Johannes Graumann wrote:
 Dear All,
 
 I have a data frame with the columns Mass and Intensity (this is mass
 spectrometry stuff). Each of the mass values gives rise to a mass window of
 5 ppm around the individual mass (from mass - mass/1E6*5 to mass +
 mass/1E6*5). I need to filter the array such that in case these mass
 windows overlap I retain the mass/intensity pair with the highest
 intensity.
 I apologize for this question, but I have no formal IT education and would
 value any nudges toward favorable algorithmic solutions highly.
 
 Thanks for any help,
 
 Joh
 

Re: [R] matrix of scatterplots

2007-07-12 Thread Adaikalavan Ramasamy

m <- matrix( rnorm(300), nc=3 )
pairs(m, pch=20)

or pairs(m, pch=".")

See help(points) for the available plotting symbols (pch values) and 
help(par) for more details.


livia wrote:
 Hi, I would like to use the function pairs() to plot a matrix of
 scatterplots. For each scatterplot, the data are plotted in circles, can I
 add some argument to change the circles into dots?
 
 Could anyone give me some advice? Many thanks


Re: [R] Please Help

2007-07-12 Thread Adaikalavan Ramasamy

This is the R-help mailing list. See help(BATCH).

You will need to write the required R commands in a separate script, say 
script.R and then execute it as

  R --no-save < script.R > logfile

You may need to augment the code above to include directory paths etc. 
There is other useful documentation at http://www.r-project.org/
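As an illustration, here is what a hypothetical script.R might contain. The analysis is a stand-in, and the crontab line in the comments uses placeholder paths. Anything the script prints ends up in logfile, but writing results to a file is more robust than relying on console output.

```r
# script.R: hypothetical batch script. Run it from a shell with
#   R --no-save < script.R > logfile 2>&1
# or from a crontab entry such as (paths are placeholders; runs daily at 02:00)
#   0 2 * * * cd /path/to/project && R --no-save < script.R > logfile 2>&1

cat("Run started:", format(Sys.time()), "\n")

x <- rnorm(100)        # stand-in for the real analysis
print(summary(x))      # printed output ends up in logfile

# Write the results to a file (placeholder location)
out_file <- file.path(tempdir(), "results.csv")
write.csv(data.frame(x = x), out_file, row.names = FALSE)
cat("Wrote", out_file, "\n")
```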

Regards, Adai





Tanya Li wrote:
 Hello,
 
 I got this email address from
 http://tolstoy.newcastle.edu.au/R/e2/help/06/10/2516.html, I got started to
 use R recently, Can I ask you a question ?
 
 this is what I am using:
 platform   i686-pc-linux-gnu
 arch   i686
 os linux-gnu
 system i686, linux-gnu
 status
 major  2
 minor  4.0
 year   2006
 month  10
 day03
 svn rev39566
 language   R
 version.string R version 2.4.0 (2006-10-03)
 
 I wanna to call R in shell( bash ) , write all R commands in the shell
 script and make it a cron job to execute automatically.
 
 do you know how to do this ?
 
 Looking forward to hearing from you, thanks a million.
 
 Tanya Li


Re: [R] lead

2007-07-12 Thread Adaikalavan Ramasamy

How about

  library(Hmisc)   # for Lag()
  revLag <- function(x, shift=1) rev( Lag(rev(x), shift) )

  x <- 1:5
  revLag(x, shift=2)


As a matter of fact, here is a generalised version of Lag that also 
handles negative shifts.

myLag <- function (x, shift = 1){

 xLen <- length(x)
 ret <- as.vector(character(xLen), mode = storage.mode(x))
 attrib <- attributes(x)
 if (!is.null(attrib$label))
 attrib$label <- paste(attrib$label, "lagged", shift, "observations")

 if (shift == 0) return(x)

 if( xLen <= abs(shift) ){        # everything is shifted out
   attributes(ret) <- attrib
   return(ret)
 }

 if (shift < 0) x <- rev(x)       # a lead is a lag of the reversed vector
 retrange <- 1:abs(shift)
 ret[-retrange] <- x[1:(xLen - abs(shift))]
 if (shift < 0) ret <- rev(ret)

 attributes(ret) <- attrib        # restore factor levels, labels etc.
 return(ret)
}

and some test examples:

myLag(1:5, shift=2)
  [1] NA NA  1  2  3

myLag(letters[1:4], shift=2)
  [1] ""  ""  "a" "b"

myLag(factor(letters[1:4]), shift=2)
  [1] <NA> <NA> a    b
  Levels: a b c d

myLag(1:5, shift=-2)
  [1]  3  4  5 NA NA

myLag(letters[1:4], shift=-2)
  [1] "c" "d" ""  ""

myLag(factor(letters[1:4]), shift=-2)
  [1] c    d    <NA> <NA>
  Levels: a b c d
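If you would rather not depend on Hmisc, here is a small base-R sketch (shiftVec is a hypothetical name, not an existing function). It handles both directions with plain indexing: positive k leads and negative k lags. Unlike Lag() and myLag(), it pads character vectors with NA rather than "". Because it only subsets x, factors keep their levels.

```r
# Hypothetical helper: element i of the result is x[i + k], or NA past either end
shiftVec <- function(x, k = 1) {
  n <- length(x)
  idx <- seq_len(n) + k         # position each element is taken from
  idx[idx < 1 | idx > n] <- NA  # positions outside the vector become NA
  x[idx]
}

shiftVec(1:5, 2)                   # lead by 2
shiftVec(1:5, -2)                  # lag by 2, like Lag(x, 2)
shiftVec(factor(letters[1:4]), 2)  # factors keep their levels
```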

Regards, Adai




Aydemir, Zava (FID) wrote:
 Hi,
  
 is there any function in R that shifts elements of a vector to the
 opposite direction of what Lag()  of the Hmisc package does? (something
 like, Lag(x, shift = -1) )
  
 Thanks
  
 Zava
 
 
 

Re: [R] speed and looping issues; calculations on big datasets

2007-07-02 Thread Adaikalavan Ramasamy

I don't fully understand what your objective is here, but I would try a 
combination of cut and grep in a shell to see if it works. For example, 
if your data were saved as a tab-delimited file (called data.txt below) and 
you have some predefined patterns you seek, then try the untested code below

  cut -f3-6 data.txt | sed 's/ //g' > tmp
  grep '^00' tmp | wc -l > rightA
  grep '^001' tmp | wc -l > rightB
  grep -E '^010|^0011' tmp | wc -l > rightC

  cut -f1-3 data.txt | sed 's/ //g' > tmp2
  grep '00$' tmp2 | wc -l > leftA
  grep -E '000$|001$' tmp2 | wc -l > leftB

Then you have to write a loop and generalise the code. You can try this 
in bash or perl, or rewrite it in C.
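If you want to stay inside R, one way to avoid both the paste() calls and the 2^60 overflow is to refine integer group labels one column at a time. Each step splits every existing group by the next allele and relabels the groups 1..k with match(), so the labels never grow beyond twice the number of rows. This is a sketch under those assumptions: extendCounts is a hypothetical name, m is the 5 x 6 example from the question, and the code has not been benchmarked on 10^5 columns.

```r
# The 5 x 6 example matrix from the question
m <- matrix(c(0,0,0,0,1,0,
              1,0,1,0,0,1,
              1,1,1,0,1,1,
              1,0,0,0,1,1,
              1,0,0,0,1,0), nrow = 5, byrow = TRUE)

# For rows with the given allele at the core column, return the counts of
# distinct extended patterns after each one-column extension
extendCounts <- function(m, core, allele, direction = c("right", "left")) {
  direction <- match.arg(direction)
  cols <- if (direction == "right") core + seq_len(ncol(m) - core)
          else rev(seq_len(core - 1))
  rows <- m[, core] == allele
  id <- rep(1L, sum(rows))          # all selected rows start in one group
  out <- vector("list", length(cols))
  for (j in seq_along(cols)) {
    key <- id * 2L + as.integer(m[rows, cols[j]])  # split groups by next allele
    id  <- match(key, unique(key))                 # relabel groups as 1..k
    out[[j]] <- tabulate(id)                       # size of each distinct pattern
  }
  out
}

extendCounts(m, core = 3, allele = 0, direction = "right")
extendCounts(m, core = 3, allele = 0, direction = "left")
```

For core column 3 and allele 0 this reproduces the counts in the question: 3, 3, then 2 and 1 to the right, and 3, then 1 and 2 to the left.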

If you want more help, then please explain in more detail the types of 
pattern you are looking for. You might want to check the 
BioConductor packages as well.

Regards, Adai



martin sikora wrote:
 dear r users,
 
 i'm a little stuck with the following problem(s), hopefully somebody  
 can offer some help:
 
 i have data organized in a binary matrix, which can become quite big  
 like 60 rows x 10^5 columns (they represent SNP genotypes, for some  
 background info). what i need to do is the following:
 
 let's suppose i have a matrix of size n x m. for each of the m  
 columns, i want to know the counts of unique rows extended one by one  
 from the core column, for both values at the core separately and  
 in both directions. maybe better explained with a little example.
 
 data:
 
 00 0 010
 10 1 001
 11 1 011
 10 0 011
 10 0 010
 
 so the extended unique rows  counts taking e.g. column 3 as core are:
 
 col 3 = 0:
 right:
 patterns / counts
 00 / 3
 001 / 3
 010, 0011 / 2,1
 
 left:
 00 / 3
 000,001 / 1,2
 
 and that for the other subset ( col3 = 1) as well, then doing the  
 whole thing again for the next core column. the reason i need this  
 counts is that i want to calculate frequencies of the different  
 extended sequences to calculate the probability of drawing two  
 identical sequences from the core up to an extended position from the  
 whole set of sequences.
 
 my main problem is speed of the calculations. i tried different ways  
 suggested here in the list of getting the counts of the unique rows,  
 all of them using the table function. both a combination of table 
 ( do.call( paste, c( as.data.frame( mymatrix) ) ) ) or table( apply 
 ( mymatrix , 2 , paste , collapse = ) ) work fine, but are too slow  
 for bigger matrices that i want to calculate (at least in my not very  
 sophisticated function). then i found a great suggestion here to do a  
 matrix multiplication with a vector of 2^(0:ncol-1) to convert each  
 row into a decimal number, and do table on those. this speeds up  
 things quite nicely, although the problem is that it of course does  
 not work as soon as i extended for more than 60 columns, because the  
 decimal numbers get to large to accurately distinguish between a 0  
 and 1 at the smallest digit:
 
   2^60+2 == 2^60
 [1] TRUE
 
 another thing is that so far i could not come up with an idea on how  
 or if it is possible to do this without the loops i am using, one  
 large loop for each column in turn as core, and then another loop  
 within that extends the rows by growing column numbers. since i am  
 not the best of programmers (and still quite new to R), i was hoping  
 that somebody has some advice on doing this calculations in a more  
 elegant and more importantly, fast way.
 just to get the idea, the approach with the matrix multiplication  
 takes 20s for a 60 x 220 matrix on my macbook pro, which is obviously  
 not perfect, considering i would like to use this function for  
 matrices of size 10^2 x 10^5 or even more.
 
 so i would be very thankful for any ideas, suggestions etc to improve  
 this
 
 cheers
 martin
 

Re: [R] sampling question

2007-06-28 Thread Adaikalavan Ramasamy

Let's assume your zcta data looks like this

set.seed(12345) ## temporary for reproducibility
zcta <- data.frame( zipcode=LETTERS[1:5], prop=runif(5) )
zcta
  zipcode      prop
1       A 0.7209039
2       B 0.8757732
3       C 0.7609823
4       D 0.8861246
5       E 0.4564810

This says that 72.1% of the population in zipcode A is female, ..., and 
45.6% in zipcode E is female.


Now suppose you sampled 20 people and you recorded the zipcode (and 
other variables) and stored in 'samp'

samp <- data.frame( id=1:20,
                    zipcode=LETTERS[ sample(1:5, 20, replace=TRUE) ])


Now, I am not sure what you want to do, but I can see two possible 
meanings in your message.

1) If you want to sample 10 observations, with each observation weighted 
INDEPENDENTLY by the proportion of women in its zipcode, try something 
like the following. The problem with this option is that the result also 
depends on how common each zipcode is among the observations.

comb <- merge( samp, zcta, all.x=TRUE )
comb <- comb[ order(comb$id), ]
comb[ sample( comb$id, 10, prob=comb$prop ), ]



2) If you want to sample x% in each zipcode, where x is the proportion 
of women in that zipcode, then this is what I would call stratified 
sampling. Try this:

tmp <- split( samp, samp$zipcode )
out <- list()

for( z in names(tmp) ){
   df <- tmp[[z]]
   p  <- zcta[ zcta$zipcode == z, "prop" ]
   n  <- round( p * nrow(df) )     # number to draw from this zipcode
   out[[z]] <- df[ sample( seq_len(nrow(df)), n ), , drop=FALSE ]
}
do.call(rbind, out)

You probably need a variant of these, but if you need further help, please 
provide more information and, better yet, examples.
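The quoted question asks for about 30 blocks from each ZIP code, weighted by a per-block proportion. A third variant, sketched under that reading, samples within each ZIP code with prob set to the weight. The blocks data frame is hypothetical, simulated data. With replace = FALSE, sample() draws one unit at a time, so the inclusion probabilities are only roughly proportional to prop.

```r
set.seed(1)
# Hypothetical block-level data: 3 ZIP codes with 50 blocks each
blocks <- data.frame(zipcode = rep(c("A", "B", "C"), each = 50),
                     block   = 1:150,
                     prop    = runif(150),
                     stringsAsFactors = FALSE)

sampleBlocks <- function(df, n = 30) {
  pieces <- lapply(split(df, df$zipcode), function(d) {
    k <- min(n, nrow(d))  # cannot draw more blocks than a ZIP code has
    # Blocks with a higher prop are more likely to be drawn
    d[sample(seq_len(nrow(d)), k, prob = d$prop), , drop = FALSE]
  })
  do.call(rbind, pieces)
}

chosen <- sampleBlocks(blocks)
table(chosen$zipcode)
```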

Regards, Adai



Kirsten Beyer wrote:
 I am interested in locating a script to implement a sampling scheme
 that would basically make it more likely that a particular observation
 is chosen based on a weight associated with the observation.  I am
 trying to select a sample of ~30 census blocks from each ZIP code area
 based on the proportion of women in a ZCTA living in a particular
 block.  I want to make it more likely that a block will be chosen if
 the proportion of women in a patient's age group in a particular block
 is high. Any ideas are appreciated!
 

Re: [R] data.frame

2007-06-18 Thread Adaikalavan Ramasamy

See help(dim), help(nrow) and help(ncol), and please read the manuals 
before asking basic questions like this. Thank you.
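For completeness, a tiny example on a hypothetical 2 x 3 matrix: dim() returns both sizes at once, while nrow() and ncol() return them separately.

```r
m <- matrix(1:6, nrow = 2)  # hypothetical 2 x 3 matrix
dim(m)   # rows and columns together
nrow(m)  # number of rows
ncol(m)  # number of columns
```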


elyakhlifi mustapha wrote:
 hello,
 are there functions giving the columns number and the rows number of a matrix?
 thanks.
 
 
   

Re: [R] How to make a table of a desired dimension

2007-06-08 Thread Adaikalavan Ramasamy

Basically, you need to use table() on factors with fixed, pre-specified 
levels. For example:

  x <- c(runif(100,10,40), runif(100,43,55))
  y <- c(runif(100,7,35),  runif(100,37,50))
  z <- c(runif(100,10,42), runif(100,45,52))
  xx <- ceiling(x);  yy <- ceiling(y);  zz <- ceiling(z)

  mylevels <- min( c(xx, yy, zz) ) : max( c(xx, yy, zz) )

  out <- cbind( table( factor(xx, levels=mylevels) ),
                table( factor(yy, levels=mylevels) ),
                table( factor(zz, levels=mylevels) ) )

You could replace the last command with simply

  sapply( list(xx, yy, zz),
function(vec) table( factor(vec, levels=mylevels) ) )
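To finish with the matplot() step from the question, here is a sketch with the 8:55 range from the question hard-coded. This assumes all values fall in that range, which holds for these runif() limits. The min:max approach above adapts automatically. The data are simulated as in the question.

```r
set.seed(1)
x <- c(runif(100, 10, 40), runif(100, 43, 55))
y <- c(runif(100, 7, 35),  runif(100, 37, 50))
z <- c(runif(100, 10, 42), runif(100, 45, 52))

mylevels <- 8:55  # fixed range; values outside it would be dropped

# One column of counts per variable, zeros filled in for absent values
counts <- sapply(list(x = ceiling(x), y = ceiling(y), z = ceiling(z)),
                 function(v) table(factor(v, levels = mylevels)))
dim(counts)  # 48 rows (values 8 to 55) by 3 columns

matplot(mylevels, counts, type = "l", lty = 1, col = 1:3,
        xlab = "value", ylab = "frequency")
legend("topleft", legend = colnames(counts), col = 1:3, lty = 1)
```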

Regards, Adai



Rubén Roa-Ureta wrote:
 Hi ComRades,
 
 I want to make a matrix of frequencies from vectors of a continuous 
 variable spanning different values. For example this code
 x-c(runif(100,10,40),runif(100,43,55))
 y-c(runif(100,7,35),runif(100,37,50))
 z-c(runif(100,10,42),runif(100,45,52))
 a-table(ceiling(x))
 b-table(ceiling(y))
 c-table(ceiling(z))
 a
 b
 c
 
 will give me three tables that start and end at different integer 
 values, and besides, they have 'holes' in between different integer 
 values. Is it possible to use 'table' to make these three tables have 
 the same dimensions, filling in the absent labels with zeroes? In the 
 example above, the desired tables should all start at 8 and tables 'a' 
 and 'c' should put a zero at labels '8' to '10', should all put zeros in 
 the frequencies of the labels corresponding to the holes, and should all 
 end at label '55'. The final purpose is the make a matrix and use 
 'matplot' to plot all the frequencies in one plot, such as
 
 #code valid only when 'a', 'b', and 'c' have the proper dimension
 p-mat.or.vec(48,4)
 p[,1]-8:55
 p[,2]-c(matrix(a)[1:48])
 p[,3]-c(matrix(b)[1:48])
 p[,4]-c(matrix(c)[1:48])
 matplot(p)
 
 I read the help about 'table' but I couldn't figure out if dnn, 
 deparse.level, or the other arguments could serve my purpose. Thanks for 
 your help
 Rubén
 

Re: [R] Choosing a column for analysis in a function

2007-05-31 Thread Adaikalavan Ramasamy

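The $ operator does not evaluate what follows it, so data.whole$analyte looks 
for a column literally called "analyte". Use [[ ]] with the column name instead:

  data.whole$Analyte.Values <- data.whole[[ as.character(analyte) ]]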
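A self-contained sketch of the same idea inside a function. The data frame and the analyte names (Glucose, Insulin) are hypothetical, and pickAnalyte is an illustrative name. The column to use arrives as a character string, and [[ ]] looks it up by name.

```r
# Hypothetical stand-in for the combined data set
data.whole <- data.frame(Sample.Name = c("s1", "s2", "s3"),
                         Day = c(1, 1, 2), Plate = c(1, 2, 1),
                         Glucose = c(5.1, 4.8, 5.6), Insulin = c(12, 15, 11))

pickAnalyte <- function(data, analyte) {
  # analyte is a character string naming a column, e.g. "Glucose"
  if (!analyte %in% names(data)) stop("no column named ", analyte)
  data$Analyte.Values <- data[[analyte]]  # $ cannot take a variable; [[ ]] can
  subset(data, select = c(Sample.Name, Analyte.Values, Day, Plate))
}

pickAnalyte(data.whole, "Insulin")
```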


Junnila, Jouni wrote:
 Hello all,
 
 I'm having a problem concerning choosing columns from a dataset in a
 function.
 
 I'm writing a function for data input etc., which first reads the data,
 and then does several data manipulation tasks. 
 The function can be then used, with just giving the path of the .txt
 file where the data is being held. 
 
 These datasets consist of over 20 different analytes. However,
 statistical analyses should be made separately, analyte by analyte. So
 the function needs to be able to choose a certain analyte based on what
 the user of the function gives as a parameter when calling the function.
 The name of the analyte user gives, is the same as a name of a column in
 the data set.
 
 The question is: how can I refer to the parameter which the user gives,
 inside the function? I cannot give the name of the analyte directly
 inside the function, as the same function should work for all the 20
 analytes.
 I'm giving some code for clarification:
 
 datainput <- function(data1,data2,data3,data4,data5,data6,analyte)
 {
 ...
 ##data1-data6 being the paths of the six datasets I want to combine and
 analyte being the special analyte I want to analyze and which can be
 found on each of the datasets as a columnname.##
 ##Then:##
 ...
 data.whole <- subset(data.whole,
 select=c(Sample.Name,Analyte.Values,Day,Plate))
 
 ##Is for choosing the columns needed for analysis. The Analyte should
 now be the column of the analyte, the users is referring to when calling
 the datainput-function. How to do it? ## 
 I've tried something like
 data.whole$Analyte.Values <- data.whole$analyte ##(Or in quotes
 "analyte")
 But this does not work. I've tried several other tricks also, but
 cannot get it to work. Can someone help?
 
 Thanks in advance,
 
 Jouni
 

Re: [R] Venn diagram

2007-05-31 Thread Adaikalavan Ramasamy

I cannot find the venn package (I searched the author's page and googled) 
despite some posts referring to it, so I cannot help you with it. But I can 
suggest you check out the varpart() function in the vegan package, vennDiagram() 
in the limma package, or http://finzi.psych.upenn.edu/R/Rhelp02a/archive/14637.html
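Whatever package you end up using, most of them start from a 0/1 membership (incidence) matrix: one row per accession and one column per library. Here is a sketch that builds one from the three lists in the quoted question, with the empty strings dropped, and counts the accessions in each combination of sets. The limma call at the end is only a commented-out suggestion and assumes Bioconductor's limma is installed.

```r
# The three lists from the question, without the empty-string padding
sets <- list(cyto  = c("A", "B", "C", "D"),
             nuc   = c("A", "B", "E"),
             chrom = c("B", "F"))

ids <- sort(unique(unlist(sets)))  # every accession seen in any set
member <- sapply(sets, function(s) as.integer(ids %in% s))
rownames(member) <- ids
member

# How many accessions fall in each combination (e.g. "110" = cyto and nuc only)
combo <- table(apply(member, 1, paste, collapse = ""))
combo

# With limma installed, the same matrix could be passed on:
# library(limma); vennDiagram(vennCounts(member))
```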

Regards, Adai



Nina Hubner wrote:
 Hello,
 
  
 
 I am a total beginner with “R” and found a package “venn” to 
 create a venn diagram. 
 
 The problem is, I cannot create the vectors required for the diagram.
 
 The manual say:
 R venn(accession, libname, main = "All samples")
 where accession was a vector containing the codes identifying 
 the RNA sequences, and libname was a vector containing the codes 
 identifying the tissue sample (library).
 
 
 The structure of my data is as follows:
 
  
 
 R   structure(list(cyto = c("A", "B", "C", "D"), nuc = c("A", "B", "E", ""),
 chrom = c("B", "F", "", "")), .Names = c("cyto", "Nuc", "chrom"))
 
 
 accession should be A, B, and libname should be cyto, 
 nuc and chrom as I understand it...
 
 
 Could you help me?
 
  
 
 Sorry, that might be a very simple question, but I am a total beginner 
 as said before! The question has already been asked, but unfortunately 
 there was no answer...
 
  
 
 Thank you a lot,
 
 Nina Hubner
 

Re: [R] Chosing a subset of a non-sorted vector

2007-05-22 Thread Adaikalavan Ramasamy

You want to select two subplots for each DL value. Try:

  df <- data.frame( DL=gl(3,4), subplot=rep(1:4,3) )

  df$index <- 1:nrow(df)
  ind <- tapply( df$index, df$DL, function(x) sample(x,2) )
  df[ unlist(ind), ]

You could also have used rownames(df) instead of creating df$index.

OR

   tmp <- lapply( split(df, df$DL), function(m) m[sample(1:nrow(m),2),] )
   do.call(rbind, tmp)
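The underlying problem is that diversity.data[subplot.sampled, ] uses the sampled subplot numbers (3, 4, 1, ...) as row numbers of the whole data frame, so every selected row comes from the first diversity level. A rough sketch of the same fix, using the variable names from the question: sample row numbers within each level rather than subplot numbers. The names `rows.by.level` and `picked` are illustrative. Indexing through sample.int() makes a group with too few rows fail with an error, instead of hitting the sample(x) pitfall where a length-one x is treated as 1:x.

```r
set.seed(1)
DL <- gl(3, 4)
subplot <- rep(1:4, 3)
diversity.data <- data.frame(DL, subplot)

# Row numbers grouped by diversity level, e.g. level 2 -> rows 5:8
rows.by.level <- split(seq_len(nrow(diversity.data)), diversity.data$DL)

# Pick 2 row numbers at random inside each level
picked <- unlist(lapply(rows.by.level,
                        function(idx) sort(idx[sample.int(length(idx), 2)])))

diversity.data[picked, ]
```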

Regards, Adai



Christoph Scherber wrote:
 Dear all,
 
 I have a tricky problem here:
 
 I have a dataframe with biodiversity data in which subplots are a 
 repeated sequence from 1 to 4 (1234,1234,...)
 
 Now, I want to randomly pick two subplots from each diversity level 
 (DL).
 
 The problem is that it works up to that point - but if I try to subset 
 the whole dataframe, I get stuck:
 
 DL=gl(3,4)
 subplot=rep(1:4,3)
 diversity.data=data.frame(DL,subplot)
 
 
 subplot.sampled=NULL
 for(i in 1:3)
 subplot.sampled=c(subplot.sampled,sort(sample(4,2,replace=F)))
 
 subplot.sampled
 [1] 3 4 1 3 1 3
 subplot[subplot.sampled]
 [1] 3 4 1 3 1 3
 
 ## here comes the tricky bit:
 
 diversity.data[subplot.sampled,]
     DL subplot
 3    1       3
 4    1       4
 1    1       1
 3.1  1       3
 1.1  1       1
 3.2  1       3
 
 How can I select those rows of diversity.data that match the exact 
 subplots in subplot.sampled?
 
 
 Thank you very much for your help!
 
 Best wishes,
 Christoph
 
 (I am using R 2.4.1 on Windows XP)
 
 
 ##
 Christoph Scherber
 DNPW, Agroecology
 University of Goettingen
 Waldweg 26
 D-37073 Goettingen
 
 +49-(0)551-39-8807
 
 
 



Re: [R] Installing packages from command line on Linux RHEL4

2007-05-22 Thread Adaikalavan Ramasamy

Assuming the R packages have been downloaded locally and their file names 
end in .tar.gz, how about simply changing to the directory where the files 
are located and typing the following command?

  ls *.tar.gz | while read x; do echo R CMD INSTALL $x; done | bash


Alternatively, you can use the install.packages() function in R.
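For example, here is a rough sketch that runs from within R. It assumes all the tarballs sit in one directory. `install_local_tarballs()` is a hypothetical helper, not an existing function. With repos = NULL, install.packages() treats its arguments as local files and does not resolve dependencies, so packages may need to be installed in dependency order. The alternative is to install them from a repository, as in the commented line.

```r
# Hypothetical helper (not part of R): install every *.tar.gz found in `dir`
install_local_tarballs <- function(dir = ".", lib = .libPaths()[1],
                                   dry_run = FALSE) {
  tarballs <- list.files(dir, pattern = "\\.tar\\.gz$", full.names = TRUE)
  if (!dry_run && length(tarballs) > 0) {
    # repos = NULL: the arguments are local source files, not package names
    install.packages(tarballs, lib = lib, repos = NULL, type = "source")
  }
  tarballs  # return the files found, so a dry run shows what would be installed
}

# Usage (not run here):
# install_local_tarballs("/path/to/tarballs")
# Or straight from CRAN with a fixed mirror, for a vector of (hypothetical) names:
# install.packages(c("pkgA", "pkgB"), repos = "http://cran.us.r-project.org")
```

If the list of roughly 200 package names is kept in a text file, readLines() on that file can supply the character vector for install.packages().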

Regards, Adai




Kermit Short wrote:
 Dirk-
   Many thanks for your reply.  As I mentioned, I know very little
 about programming in 'R' and what I've got is a BASH script.  If needs be,
 I'll look up how to read in a text file through R and add that into your
 script in lieu of the (argv) stuff, but you wouldn't happen to know how to
 accomplish the same thing using the 
 
 R CMD INSTALL
 
 Shell command?
 
 Thanks!
 
 -Kermit
 
 -Original Message-
 From: Dirk Eddelbuettel [mailto:[EMAIL PROTECTED] 
 Sent: Monday, May 21, 2007 12:00 PM
 To: [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] Installing packages from command line on Linux RHEL4
 
 
 Hi Kermit,
 
 On 21 May 2007 at 11:37, Kermit Short wrote:
 | Greetings.
 | 
 |I am a System Administrator, and thus have very little knowledge of R
 | itself.  I have been asked to install a list of some 200 packages (from
 | CRAN) to R.  Rather than installing each package manually, I was hoping I
 | could script this.  I've written a BASH script that hopefully will do this,
 | but I'm wondering about the Mirror Selection portion of the installation
 | process.  I've looked and can't find anywhere a parameter to supply that
 | specifies a mirror to use so that I don't have to manually select it for
 | each package I want to install.  In this case, with nearly 200 packages to
 | install, this could become quite tedious.  Does anyone have any
 | suggestions?
 
 The narrow answer is try adding 
 
   repos="http://cran.us.r-project.org"
 
 Also, and if I may, the littler front-end (essentially #! shebang support
 for R)
 helps there:
 
 basebud:~> cat bin/installPackages.r
 #!/usr/bin/env r
 #
 # a simple example to install all the listed arguments as packages
 
 if (is.null(argv)) {
   cat("Usage: installPackages.r pkg1 [pkg2 [pkg3 [...]]]\n")
   q()
 }
 
 for (pkg in argv) {
   install.packages(pkg, lib="/usr/local/lib/R/site-library", depend=TRUE)
 }
 
 You would still need to add repos=... there. I tend to do that in my
 ~/.Rprofile.
 
 Hth, Dirk



Re: [R] translate SAS code

2007-05-22 Thread Adaikalavan Ramasamy

I am not sure whether R can read Excel formulas; if it does, it probably 
reads them as character strings. I would suggest you Copy and Paste Special 
(as values) onto a new sheet and save it as a tab-delimited file.
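Once the raw values are in R, the formulas can also be rewritten as vectorised R expressions. Here is a rough sketch. It assumes the columns arrive under the names used in the formulas, and that SAS missing values (.) arrive as NA. The data frame `d` and its numbers are made up purely for illustration.

```r
# Made-up values; column names as in the spreadsheet formulas
d <- data.frame(C181 = c(10, 20), S181 = c(5, 4),  C182 = c(15, 40),
                C346 = c(3, 6),   C325 = c(NA, 2), C103 = c(1.5, 3))

# C604=(C181/S181)*(100-C182)*(100/85)
d$C604 <- (d$C181 / d$S181) * (100 - d$C182) * (100 / 85)

# if C325=. then C740=(C346/C103)*100 else C740=(C346/C325)*100
# (a SAS missing value "." corresponds to NA in R)
d$C740 <- ifelse(is.na(d$C325), d$C346 / d$C103, d$C346 / d$C325) * 100
d
```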


elyakhlifi mustapha wrote:
 good morning,
 I have some SAS code to translate in R code and when I export data from Excel 
 to R I have to read formula writed as follow
 
 C604=(C181/S181)*(100-C182)*(100/85)
 
 or
 
 if C325=. then C740=(C346/C103)*100| else C740=(C346/C325)*100
 
 I find some difficulties to write a good program to read and calculate these 
 formulas
 there are several kinds of formulas there are with conditional and without 
 conditional
 can you help me please?
 thanks.
 
 
   
 
 



Re: [R] help with this indexing

2007-05-21 Thread Adaikalavan Ramasamy

merge()
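Here is a rough sketch with made-up coordinates. It assumes each (east, north) pair is unique in `long`. merge() returns rows sorted by the key columns. If the row order of `short` matters, match() on a combined key keeps that order. The paste() key assumes the coordinates print identically in both tables.

```r
# Made-up example tables with the column names from the question
long  <- data.frame(east  = c(1, 1, 2, 2),
                    north = c(1, 2, 1, 2),
                    value = c(10, 20, 30, 40))
short <- data.frame(east = c(2, 1), north = c(2, 1), value = NA)

# Option 1: merge() on both coordinate columns (result is sorted by the keys)
m <- merge(short[, c("east", "north")], long,
           by = c("east", "north"), all.x = TRUE)

# Option 2: keep the row order of `short` by matching on a combined key
key <- function(d) paste(d$east, d$north, sep = "_")
short$value <- long$value[match(key(short), key(long))]
short
```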


javier garcia-pintado wrote:
 Hi all,
 Let's say I have a long data frame and a short one, both with three
 columns: $east, $north, $value
 And I need to fill in short$value, extracting the corresponding
 value from long$value, for coinciding $east and $north in both tables.
 I know the possibility:
 
 for (i in 1:length(short$value)){
  short$value[i] <- long$value[long$east==short$east[i] &
                               long$north==short$north[i]]
 }
 
 How could I avoid this loop?
 
 Thanks and regards,
 
 Javier
 --
 
 Javier García-Pintado
 Institute of Earth Sciences Jaume Almera (CSIC)
 Lluis Sole Sabaris s/n, 08028 Barcelona
 Phone: +34 934095410
 Fax:   +34 934110012
 e-mail:[EMAIL PROTECTED] 
 
 
 
 
 

Re: [R] density

2007-05-19 Thread Adaikalavan Ramasamy

Try bkde2D {KernSmooth} or kde2d {MASS}.
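Here is a rough sketch with kde2d() from MASS. It uses simulated data in place of the `plan` matrix. kde2d() takes the two columns as separate x and y vectors and returns the estimate on an n x n grid. KernSmooth's bkde2D() instead takes the whole two-column matrix, but it needs an explicit bandwidth argument.

```r
library(MASS)   # kde2d() is in the recommended MASS package

set.seed(42)
plan <- cbind(rnorm(200), rnorm(200, sd = 2))   # simulated stand-in for the n x 2 matrix

# Bivariate kernel density estimate on a 50 x 50 grid
est <- kde2d(plan[, 1], plan[, 2], n = 50)
str(est)   # list: grid vectors x and y, plus a 50 x 50 matrix z of density values

contour(est$x, est$y, est$z)
```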


Bruce Willy wrote:
 Hello,
  
 I have a n*2 matrix, called plan, which contains n observations from 2 
 variates.
  
 I want a kernel density estimate of the joint distribution of these 2 
 variates.
 I tried density(plan). Unfortunately, R thinks there are 2n observations (if 
 n=10, 20 observations), whereas there are only n.
  
 How do I make a multivariate kernel density estimate?
  
 Thank you very much.

Re: [R] optional fields in function declarations

2007-05-19 Thread Adaikalavan Ramasamy

Can you provide a simple example of what you want the function to do?

Generally, I give the argument a default value.

raise <- function(x, power=1){ return( x^power ) }

> raise(5)
[1] 5
> raise(5,3)
[1] 125


Or you can do the same, but in a slightly less clear manner, with missing():

raise <- function(x, power){
   if(missing(power)) power <- 1
   return( x^power )
}

I prefer the former.
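On the missing() point: it does not require the optional argument to come last. Here is a rough sketch, where the function `f` is purely illustrative. An argument in the middle can be left out in two ways: by naming the arguments after it, or by leaving an empty slot in a positional call.

```r
# Illustrative only: b is optional although it is not the last argument
f <- function(a, b, c) {
  if (missing(b)) b <- 10
  a + b + c
}

f(1, 2, 3)    # 6
f(1, c = 3)   # 14: b is missing because c was matched by name
f(1, , 3)     # 14: an empty slot in a positional call also leaves b missing
```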

Regards, Adai



[EMAIL PROTECTED] wrote:
 Dear R users,
 
 I need to create a set of functions to solve some tasks. I want to let the 
 user decide whether to use the default parameters or change them, so the 
 functions may have some optional fields. I tried to use the function 
 missing(), but it works properly only if the optional field is declared 
 last in the function.
 Can you give me some suggestions and some references?
 
 thank you.
 
 
 Claudio
 
 

Re: [R] creating a multivariate set of variables with given intercorrelations

2007-05-19 Thread Adaikalavan Ramasamy

I presume you want to generate normally or t-distributed values? If so, 
have a look at either mvrnorm in the MASS package or the mvtnorm package.
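Here is a rough sketch with mvrnorm(). It assumes multivariate normal data. The correlation values are placeholders for r1 to r6. The matrix must be positive definite, and the eigenvalue check tests for that. With empirical = TRUE, the sample correlations match the target exactly. Leave it at the default FALSE to get the target as population correlations, with sampling noise.

```r
library(MASS)   # mvrnorm() is in the recommended MASS package

# Target correlation matrix; these numbers are placeholders for r1 ... r6
R <- matrix(c( 1, .5, .3, .2,
              .5,  1, .4, .1,
              .3, .4,  1, .3,
              .2, .1, .3,  1), nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))

# A valid correlation matrix must be positive definite
stopifnot(all(eigen(R, symmetric = TRUE, only.values = TRUE)$values > 0))

set.seed(1)
X <- mvrnorm(n = 100, mu = rep(0, 4), Sigma = R, empirical = TRUE)
round(cor(X), 3)
```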



Dimitri Liakhovitski wrote:
 Hi!
 I was wondering if there is a package in R that allows one to create a
 multivariate data set with pre-specified intercorrelations among
 variables, e.g., a set of 4 variables (with a length of N each), such
 that the correlations between variables are:
 
       a   b   c   d
 a     1  r1  r2  r3
 b         1  r4  r5
 c             1  r6
 d                 1
 
 Thank you very much!
 Dimitri Liakhovitski
 

Re: [R] Simple programming question

2007-05-18 Thread Adaikalavan Ramasamy

According to your post, you are assuming that there are only 3 unique 
values for var3 within each category. But categories C and D have 4 unique 
values for var3.

split(dfr, dfr$categ)
...
$C
   id categ var3 score
3   3     C    6  high
7   7     C    5   mid
11 11     C    3   low
15 15     C    1   low
...

If you meant something different, then just change myfun() below.


  gmax <- function(x, rnk=1){
   ## generalized maximum with rnk=1 being the biggest value (i.e. max)
   return( sort( unique(x), decreasing=T )[rnk] )
  }

  myfun <- function(x){ ifelse( x==gmax(x,1), "high",
                        ifelse( x==gmax(x,2), "med", "low" ) ) }

  out   <- lapply( split(dfr$var3, dfr$categ), myfun )

  data.frame( dfr, my.score = unsplit(out, dfr$categ) )
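Another option is to compute a dense rank within each category with ave() and map it to labels. This rough sketch assumes the labels from the original post: "high", "mid", and "low" for everything below the second-highest value. `dense.rank` is an illustrative name for the intermediate vector.

```r
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ), ]

# Dense rank within each category: 1 = highest distinct value, 2 = second, ...
dense.rank <- ave(dfr$var3, dfr$categ,
                  FUN = function(x) match(x, sort(unique(x), decreasing = TRUE)))

# Map ranks to labels: 1 -> high, 2 -> mid, anything lower -> low
dfr$my.score <- factor(ifelse(dense.rank == 1, "high",
                       ifelse(dense.rank == 2, "mid", "low")),
                       levels = c("low", "mid", "high"))
dfr
```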

Regards, Adai



Lauri Nikkinen wrote:
 Hi R-users,
 
 I have a simple question for R heavy users. If I have a data frame like this
 
 
 dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4),
 var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
 dfr <- dfr[order(dfr$categ),]
 
 and I want to score values or points in variable named var3 following this
 kind of logic:
 
 1. the highest value of var3 within category (variable named categ) ->
 "high"
 2. the second highest value -> "mid"
 3. lowest value -> "low"
 
 This would be the output of this reasoning:
 
 dfr$score <-
 factor(c("high","mid","low","low","high","mid","mid","low","high","mid","low","low","high","mid","low","low"))
 dfr
 
 The question is how I do this programmatically in R (i.e. if I have 2000
 rows in my dfr)?
 
 I appreciate your help!
 
 Cheers,
 Lauri
 

Re: [R] MICE for Cox model

2007-05-17 Thread Adaikalavan Ramasamy

I encountered this problem about 18 months ago. I contacted Prof. Fox 
and Dr. Malewski (the R package maintainers for mice) but they referred 
me to Prof. van Buuren. I wrote to Prof. van Buuren but am unable to 
find his reply (if he did reply).

Here are the functions I used at that time; take them with a good pinch 
of salt. Let me know if you find anything fishy with them.


coxph.mids <- function (formula, data, ...) {

   call <- match.call()
   if (!is.mids(data)) stop("The data must have class mids")

   analyses <- as.list(1:data$m)

   for (i in 1:data$m) {
     data.i        <- complete(data, i)
     analyses[[i]] <- coxph(formula, data = data.i, ...)
   }

   object <- list(call = call, call1 = data$call,
                  nmis = data$nmis, analyses = analyses)

   oldClass(object) <- if (.SV4.) "mira" else c("mira", "coxph")
   return(object)
}


And in the function 'pool', the small sample adjustment requires the 
residual degrees of freedom (i.e. dfc). For a Cox model, I believe that 
this is simply the number of events minus the number of regression 
coefficients. There is support for this in the middle of page 149 of the 
book by Parmar & Machin (ISBN 0471936405). Please correct me if I am wrong.

Here is the slightly modified version of pool():


pool <- function (object, method = "smallsample") {

   call <- match.call()
   if (!is.mira(object)) stop("The object must have class 'mira'")

   if ((m <- length(object$analyses)) < 2)
     stop("At least two imputations are needed for pooling.\n")

   analyses <- object$analyses

   k     <- length(coef(analyses[[1]]))
   names <- names(coef(analyses[[1]]))
   qhat  <- matrix(NA, nrow = m, ncol = k, dimnames = list(1:m, names))
   u     <- array(NA, dim = c(m, k, k),
                  dimnames = list(1:m, names, names))

   for (i in 1:m) {
     fit       <- analyses[[i]]
     qhat[i, ] <- coef(fit)
     u[i, , ]  <- vcov(fit)
   }

   qbar <- apply(qhat, 2, mean)
   ubar <- apply(u, c(2, 3), mean)
   e    <- qhat - matrix(qbar, nrow = m, ncol = k, byrow = TRUE)
   b    <- (t(e) %*% e)/(m - 1)
   t    <- ubar + (1 + 1/m) * b
   r    <- (1 + 1/m) * diag(b/ubar)
   f    <- (1 + 1/m) * diag(b/t)
   df   <- (m - 1) * (1 + 1/r)^2

   if (method == "smallsample") {

     if( any( class(fit) == "coxph" ) ){

       ### this branch is the hack for survival analysis ###

       status   <- fit$y[ , 2]
       n.events <- sum(status == max(status))
       p        <- length( coefficients( fit ) )
       dfc      <- n.events - p

     } else {

       dfc <- fit$df.residual
     }

     df <- dfc/((1 - (f/(m + 1)))/(1 - f) + dfc/df)
   }

   names(r) <- names(df) <- names(f) <- names
   fit <- list(call = call, call1 = object$call, call2 = object$call1,
               nmis = object$nmis, m = m, qhat = qhat, u = u,
               qbar = qbar, ubar = ubar, b = b, t = t, r = r, df = df,
               f = f)
   oldClass(fit) <- if (.SV4.) "mipo" else c("mipo", oldClass(object))
   return(fit)
}
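For reference, the quantities computed in pool() are written out below. This is a transcription of what the code does, not an independent check of the formulas. With m imputations, let the estimate vectors be \hat{Q}_i and the covariance matrices U_i. Let j index coefficients, let \nu_c be dfc, and in the Cox branch let d be the number of events and p the number of coefficients:

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)\bigl(\hat{Q}_i-\bar{Q}\bigr)^{\top},\\
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B, \qquad
r_j = \Bigl(1+\frac{1}{m}\Bigr)\frac{B_{jj}}{\bar{U}_{jj}}, \qquad
f_j = \Bigl(1+\frac{1}{m}\Bigr)\frac{B_{jj}}{T_{jj}},\\
\nu_j &= (m-1)\Bigl(1+\frac{1}{r_j}\Bigr)^{2}, \qquad
\nu_j^{*} = \frac{\nu_c}{\dfrac{1-f_j/(m+1)}{1-f_j} + \dfrac{\nu_c}{\nu_j}}, \qquad
\nu_c = d - p \ \text{(Cox branch)}.
\end{aligned}
```

summary.mipo() uses these quantities: it forms t_j = \bar{Q}_j / \sqrt{T_{jj}} and refers it to a t distribution with \nu_j^{*} degrees of freedom.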


print.mipo only gives the coefficients. Often I need the standard errors
as well, since I want to test whether each regression coefficient from
multiple imputation is zero or not. Since the function summary.mipo does
not exist, can I suggest the following:


summary.mipo <- function(object, ...){

  if (!is.null(object$call1)){
    cat("Call: ")
    dput(object$call1)
  }

  est  <- object$qbar
  se   <- sqrt(diag(object$t))
  tval <- est/se
  df   <- object$df
  pval <- 2 * pt(abs(tval), df, lower.tail = FALSE)

  coefmat <- cbind(est, se, tval, pval)
  colnames(coefmat) <- c("Estimate", "Std. Error",
                         "t value", "Pr(>|t|)")

  cat("\nCoefficients:\n")
  printCoefmat( coefmat, P.values=T, has.Pvalue=T, signif.legend=T )

  cat("\nFraction of information about the coefficients missing due to nonresponse:", "\n")
  print(object$f)

  ans <- list( coefficients=coefmat, df=df,
               call=object$call1, fracinfo.miss=object$f )
  invisible( ans )

}


Hope this helps.

Regards, Adai



Inman, Brant A. M.D. wrote:
 R-helpers:
 
 I have a dataset that has 168 subjects and 12 variables.  Some of the
 variables have missing data and I want to use the multiple imputation
 capabilities of the mice package to address the missing data. Given
 that mice only supports linear models and generalized linear models (via
 the lm.mids and glm.mids functions) and that I need to fit Cox models, I
 followed the previous suggestion of John Fox and have created my own
 function cox.mids to use coxph to fit models to the imputed datasets.
 
 (http://tolstoy.newcastle.edu.au/R/help/06/03/22295.html)
 
 The function I created is:
 
 
 
 cox.mids <- function (formula, data, ...)
 {
 call <- match.call()
 if (!is.mids(data)) 
 stop("The data must have class mids")
 analyses <- as.list(1:data$m)
 for (i in 1:data$m) {
 data.i <- complete(data, i)
 analyses[[i]] <- coxph(formula, data = data.i, ...)
 }
 object <- list(call = call, call1 = data$call, nmis

Re: [R] controling the size of vectors in a matrix

2007-05-17 Thread Adaikalavan Ramasamy

1) Your colnames need 4 elements and not 3
2) Utilize the argument 'n' in your random number generators

Your codes could be simplified as:

  m <- cbind( treatmentgrp  = sample( 1:2, n, replace=T ),
  strata= sample( 1:2, n, replace=T ),
  survivalTime  = rexp( n, rate=0.07 ),
  somethingElse = rexp( n, rate=0.02 ) )
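Here is a rough sketch for controlling the group sizes and using a different rate for each treatment/stratum cell (points 1 to 3 in the question). The sizes 5 and 15 come from the question. The even split into two strata and the rate values are placeholders.

```r
set.seed(2)
sizes <- c(5, 15)                         # treatment 1 has 5 subjects, treatment 2 has 15
treatmentgrp <- rep(1:2, times = sizes)

# Split each treatment group into 2 strata of (roughly) equal size
strata <- unlist(lapply(sizes, function(k)
  rep(1:2, times = c(ceiling(k / 2), floor(k / 2)))))

# Placeholder rate for each treatment (rows) x stratum (columns) cell
rates <- matrix(c(0.07, 0.05,
                  0.02, 0.03), nrow = 2, byrow = TRUE)

# rexp() is vectorised over rate, so each subject uses its own cell's rate
survivalTime <- rexp(length(treatmentgrp), rate = rates[cbind(treatmentgrp, strata)])

m <- cbind(treatmentgrp, strata, survivalTime)
m
```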

Regards, Adai



raymond chiruka wrote:
 hi R users

   I have the following matrix
   n=20
   m<-matrix(nrow=n,ncol=4)
   colnames(m)=c("treatmentgrp","strata","survivalTime")  
   for(i in 1:n) 
 m[i,]<-c(sample(c(1,2),1,replace=TRUE),sample(c(1:2),1,replace=TRUE),rexp(1,0.07),rexp(1,0.02))
   

   print(m)
 1. I would like to control the size of the treatment variable, e.g. treatment 
 1 = size 5, treatment 2 = size 15.
   
   2. I would also like to control the size of the strata, i.e. in treatment 1 
 divide the strata into 2 etc.
   
   3. For the survival time I would like treatment 1-strata 1 to use a 
 different rate from treatment 2-strata 2 etc. to generate the survival time.
   
   the program I used above doesn't do this, so if you can help
   
   thanks 
   


Re: [R] creating columns

2007-05-17 Thread Adaikalavan Ramasamy

See my response to your thread "controling the size of vectors in a 
matrix". Please do not create multiple threads on the same day asking 
basically the same question, especially if you cannot substantially 
improve the clarity and quality of the post.

Multiple threads asking the same question badly within the span of a few 
hours lead to people missing other people's responses, thereby 
essentially wasting their time.
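For the layout shown in the question, rep() with a vector of times lets you choose each cell size directly. Here is a rough sketch. The `sizes` vector is a placeholder that gives the number of rows for each treatment/stratum combination.

```r
# Rows wanted for each (treatment, stratum) cell, in the order
# (1,1), (1,2), (2,1), (2,2); placeholder values, change to suit
sizes <- c(3, 3, 3, 3)

m <- cbind(treatmentgrp = rep(c(1, 1, 2, 2), times = sizes),
           strata       = rep(c(1, 2, 1, 2), times = sizes))
m
```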



raymond chiruka wrote:
 I would like to create the following matrix
   
   treatmentgrp strata
   1            1
   1            1
   1            1
   1            2
   1            2
   1            2
   2            1
   2            1
   2            1
   2            2
   2            2
   2            2

 I should be able to choose the size of the treatment groups and strata; the 
 method I used initially creates the matrix randomly
   
 n=20
   
 m <- cbind( treatmentgrp  = sample( 1:2,n, replace=T ),
   
   strata        = sample( 1:2, n, replace=T ),
   
   survivalTime  = rexp( n, rate=0.07 ),
   
   somethingElse = rexp( n, rate=0.02 ) )

 thanks
   
   


Re: [R] MICE for Cox model

2007-05-17 Thread Adaikalavan Ramasamy

Are you sure you used my pool() function? As I have just discovered, it 
had a minor typo in the code. After replacing 
df <- (m - 1) * (1 + 1/r)2 with df <- (m - 1) * (1 + 1/r)^2 in my 
pool() function, I get


  library(survival); data(pbc)
  d <- pbc[,c('time', 'status', 'age', 'sex',
  'hepmeg', 'platelet', 'trt', 'trig')]
  d[d==-9] <- NA
  d[,c(4,5,7)] <- lapply(d[,c(4,5,7)], FUN=as.factor)

  library(mice)
  imp <- mice(d, m=10, maxit=10, diagnostics=T, seed=500,
   defaultImputationMethod=c('norm', 'logreg', 'polyreg'))
  fit <- coxph.mids( Surv(time,status) ~ age + sex + hepmeg + platelet
  + trt + trig, imp)

  pool(fit)

Call: pool(object = fit)

Pooled coefficients:
         age         sex1      hepmeg1     platelet         trt2         trig 
 0.034924182 -0.208897827  0.987641362 -0.001559426  0.070124108  0.004122421 

Fraction of information about the coefficients missing due to nonresponse:
       age       sex1    hepmeg1   platelet       trt2       trig 
0.06624167 0.19490517 0.27300965 0.21950332 0.27768153 0.40658964 

Regards, Adai



Inman, Brant A. M.D. wrote:
 Adai,
 
 Thanks for the functions.  I tried using your functions and I get the
 same error message during the pooling part:
 
 > pool(micefit)
 Error in names(df) <- names(f) <- names : 'names' attribute [5] must be
 the same length as the vector [0]
 
 Brant
 -Original Message-
 From: Adaikalavan Ramasamy [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, May 17, 2007 4:56 AM
 To: Inman, Brant A. M.D.
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] MICE for Cox model
 

Re: [R] Split a vector(list) into 3 list

2007-05-17 Thread Adaikalavan Ramasamy

You don't need to upgrade R just to get index() working. You can try the 
following modification:

  v <- sample(1:3, 30, replace = TRUE)
  split( 1:length(v), v )

This should do the trick. See also the reverse function, unsplit().
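Here is a small deterministic illustration of what split() returns and how it relates to which() and unsplit(). It uses a fixed `v` instead of sample().

```r
v <- c(2, 1, 3, 1, 2)            # small fixed example instead of sample()
idx <- split(seq_along(v), v)    # positions of each distinct value
idx                              # $`1`: 2 4   $`2`: 1 5   $`3`: 3
which(v == 2)                    # 1 5, the same as idx[["2"]]
unsplit(idx, v)                  # puts the positions back in place: 1 2 3 4 5
```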

Regards, Adai



Leeds, Mark (IED) wrote:
 index is definitely defined in my version ( 2.4.0) because when I do
 ?index, I get info. Maybe you
 Are using an older or younger version of R ? I'm really not sure why you
 are experiencing that problem.
 
 
 -Original Message-
 From: Patrick Wang [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, May 17, 2007 8:44 PM
 To: Leeds, Mark (IED)
 Cc: Patrick Wang; r-help@stat.math.ethz.ch
 Subject: RE: [R] Split a vector(list) into 3 list
 
 Thanks,
 
 no index function was defined in R.
 
 I tried split(order(temp), temp); the number of groups is
 correct, however the result does not seem to be correct. I tried to match
 the ordered index against the original index.
 
 Pat
 
 If  temp is your vector then split(index(temp),temp) will give you 
 what you want.


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Wang
 Sent: Thursday, May 17, 2007 8:15 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Split a vector(list) into 3 list

 Hi,

 I have a vector contains values 1,2,3.

 Can I call a function split to split it into 3 vectors with 1 
 corresponds to value ==1, which contain all the indexes for value==1.

 2 corresponds to value ==2 which contain all the indexes for value=2

 Thanks
 pat


Re: [R] use loop or use apply?

2007-05-17 Thread Adaikalavan Ramasamy

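Can you check if the following gives you what you want?

tmp <- rbind( A, B )
dis <- as.matrix( dist( tmp ) )  ## dist() returns a "dist" object; convert it to index rows and columns
nA  <- nrow(A)
nB  <- nrow(B)
dis[ 1:nA, nA + 1:nB ] ## output

If it works, this suggestion comes with the caveat that it might be 
computationally inefficient compared with using for() loops for very 
large values of (a,b) or highly discordant values of (a,b), because dist() 
computes all (a+b)(a+b-1)/2 pairwise distances, including the unwanted 
A-A and B-B pairs. However, I am hoping the gain from dist() being coded 
in C should offset it. Note also that dist() returns Euclidean distances, 
whereas your get.f() computes squared distances; square the output if you 
need those.

Try experimenting to find the optimal speed, etc. Also have a look at the 
mapply() examples to see if they are useful.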
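Another technique avoids both the loops and the unwanted pairs. It expands the squared distance as |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, so the whole matrix comes from one matrix product. Here is a rough sketch. `sqdist` is an illustrative name, and it returns squared distances like get.f(). Rounding can produce tiny negative values, which is why pmax() is applied.

```r
# Squared Euclidean distances between the rows of A (a x d) and B (b x d),
# using |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
sqdist <- function(A, B) {
  D <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  pmax(D, 0)  # clamp tiny negatives caused by floating-point rounding
}

set.seed(3)
A <- matrix(rnorm(5 * 3), ncol = 3)
B <- matrix(rnorm(7 * 3), ncol = 3)
C <- sqdist(A, B)   # 5 x 7; take sqrt(C) for plain Euclidean distances
```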

Regards, Adai



Prasenjit Kapat wrote:
 Hi,
 
 I have two matrices, A (axd) and B (bxd). I want to get another matrix C 
 (axb) 
 such that, C[i,j] is the Euclidean distance between the ith row of A and jth 
 row of B. In general, I can say that C[i,j] = some.function (A[i,], B[j,]). 
 What is the best method for doing so? (assume a < b)
 
 I have been doing some exploration myself: Consider the following function: 
 get.f, in which, 'method=1' is the rudimentary double for loop; 'method=2' 
 avoids one loop by constructing a bigger matrix, but doesn't use 
 apply(); 'method=3' avoids both the loops by using apply() and constructing 
 bigger matrices; 'method=4' avoids constructing bigger matrices by using 
 apply() twice.
 
 get.f <- function (A, B, method=2) {
   if (method == 1){
     a <- nrow(A); b <- nrow(B);
     C <- matrix(NA, nrow=a, ncol=b);
     for (i in 1:a) 
       for (j in 1:b) 
         C[i,j] <- sum((A[i,]-B[j,])^2)
   } else if (method == 2 ) {
     a <- nrow(A); b <- nrow(B); d <- ncol(A);
     C <- matrix(NA, nrow=a, ncol=b);
     for (i in 1:a) 
       C[i,] <- rowSums((matrix(A[i,], nrow=b, ncol=d, byrow=TRUE) - B) ^ 2)
   } else if (method == 3) {
     C <- t(apply(A, MARGIN=1, FUN=FUN1, BB=B)); # transpose is needed
   } else if (method == 4) {
     C <- t(apply(A, MARGIN=1, FUN=FUN2, BB=B))
   }
   C  # return the matrix of squared distances for every method
 }
 
 FUN1 <- function(aa, BB)
   return(rowSums(
     (matrix(aa, nrow=nrow(BB), ncol=ncol(BB), byrow=TRUE) - BB)^2)
   )
 
 FUN2 <- function(aa, BB)
   return(apply(BB, MARGIN=1, FUN=FUN3, aa=aa))
 
 FUN3 <- function(bb,aa) return(sum((aa-bb)^2))
 
 ### With these methods and the following intitializations,
 
 a - 100; b - 1000; d - 100; n.loop - 20;
 
 A - matrix(rnorm(a*d), ncol=d)
 B - matrix(rnorm(b*d), ncol=d)
 
 all.times - matrix(0,nrow=5,ncol=4)
 rownames(all.times) - rownames(as.matrix(system.time(NULL)))
 
 for (i in 1:4)  
   for (j in 1:n.loop)
   all.times[,i] - all.times[,i] + 
   as.matrix(system.time(C - get.f(A=A, B=B, 
 method=i)))
 
 all.times - all.times / n.loop
 print(all.times)
 
             [,1]    [,2]    [,3]    [,4]
 user.self  4.0554 1.50010 1.50130 4.51285
 sys.self   0.0370 0.02420 0.01800 0.04260
 elapsed    4.2705 1.58865 1.59475 6.07535
 user.child 0.0000 0.00000 0.00000 0.00000
 sys.child  0.0000 0.00000 0.00000 0.00000
 
 'method=2' stands out as the best, and 'method=1' (for loops) beats 'method=4' 
 (two apply()s)... Is that expected?
 
 Is it possible to improve over 'method=2'?
 
 Thanks
 PK
 
 PS: The mail text seems fine in my composer, I hope, it looks decent in your 
 reader.
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 



Re: [R] Weighted least squares

2007-05-08 Thread Adaikalavan Ramasamy

See below.

hadley wickham wrote:
 Dear all,
 
 I'm struggling with weighted least squares, where something that I had
 assumed to be true appears not to be the case.  Take the following
 data set as an example:
 
 df <- data.frame(x = runif(100, 0, 100))
 df$y <- df$x + 1 + rnorm(100, sd=15)
 
 I had expected that:
 
 summary(lm(y ~ x, data=df, weights=rep(2, 100)))
 summary(lm(y ~ x, data=rbind(df,df)))

You assign weights to different points according to some external 
quality or reliability measure, not the number of times the data point was 
measured.

Look at the estimates and standard error of the two models below:

  coefficients( summary(f.w <- lm(y ~ x, data=df, weights=rep(2, 100))) )
              Estimate Std. Error   t value     Pr(>|t|)
  (Intercept) 1.940765 3.30348066  0.587491 5.582252e-01
  x           0.982610 0.05893262 16.673448 2.264258e-30

  coefficients( summary( f.u <- lm(y ~ x, data=rbind(df,df) ) ) )
              Estimate Std. Error    t value     Pr(>|t|)
  (Intercept) 1.940765 2.32408609  0.8350659 4.046871e-01
  x           0.982610 0.04146066 23.6998165 1.012067e-59

You can see that they have the same coefficient estimates, but the second 
one has smaller standard errors.

The repeated values artificially deflate the variance and thus inflate 
the precision. This is why you cannot treat replicate data as 
independent observations.
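Here is a quick check of this point, as a sketch. The seed is arbitrary, so the numbers will not match the tables above exactly. The two fits give identical estimates, and their standard errors differ by the constant factor sqrt(198/98). This is because lm() counts the residual degrees of freedom from the number of rows: 100 - 2 = 98 for the weighted fit and 200 - 2 = 198 for the duplicated data. The ratios in the tables above, 3.30348066/2.32408609 and 0.05893262/0.04146066, are both about 1.4214.

```r
set.seed(42)                        # arbitrary seed
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

f.w <- lm(y ~ x, data = df, weights = rep(2, 100))   # 100 rows, weight 2 each
f.u <- lm(y ~ x, data = rbind(df, df))              # 200 rows, unweighted

se.w <- coef(summary(f.w))[, "Std. Error"]
se.u <- coef(summary(f.u))[, "Std. Error"]

all.equal(coef(f.w), coef(f.u))          # TRUE: same estimates
c(df.residual(f.w), df.residual(f.u))    # 98 198
se.w / se.u                              # both entries equal sqrt(198/98)
sqrt(198 / 98)                           # about 1.4214
```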


 would be equivalent, but they are not.  I suspect the difference is
 how the degrees of freedom are calculated - I had expected them to be
 sum(weights), but they seem to be sum(weights > 0).  This seems
 unintuitive to me:
 
 summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
 summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
 
 What am I missing?  And what is the usual way to do a linear
 regression when you have aggregated data?

It would be best to use the individual data points instead of aggregated 
data, as this allows you to estimate the within-group variation as well.

If you had individual data points, you could try something like the following. 
Please check the code, as I am no expert in the area of repeated measures.

  x  <- runif(100, 0, 100)
  y1 <- x + rnorm(100, mean=1, sd=15)
  y2 <- y1 + rnorm(100, sd=5)

  df <- data.frame( y=c(y1, y2),
                    x=c(x,x),
                    subject=factor(rep( paste("p", 1:100, sep=""), 2 )) )

  library(nlme)
  summary( lme( y ~ x, random = ~ 1 | subject, data=df ) )

Try reading Pinheiro and Bates (http://tinyurl.com/yvvrr7) or related 
material for more information. Hope this helps.

 Thanks,
 
 Hadley

Regards, Adai


Re: [R] Weighted least squares

2007-05-08 Thread Adaikalavan Ramasamy

Sorry, your original post did not explain that your weights are 
frequencies. I assumed they were repeated measurements with within-group 
variation.

I was merely responding to your query about why the following differed:
summary(lm(y ~ x, data=df, weights=rep(2, 100)))
summary(lm(y ~ x, data=rbind(df,df)))

Let me also clarify my statement about "artificially". If one treats 
repeated observations as independent, one obtains estimates with 
inflated precision. I was not calling your data artificial in any way.

Using frequency as weights may be valid. Your data points appear to 
arise from a discrete distribution, so I am not entirely sure you can 
use the linear model, which assumes the errors are normally distributed.

Regards, Adai



hadley wickham wrote:
 On 5/8/07, Adaikalavan Ramasamy [EMAIL PROTECTED] wrote:
 See below.

 hadley wickham wrote:
  Dear all,
 
  I'm struggling with weighted least squares, where something that I had
  assumed to be true appears not to be the case.  Take the following
  data set as an example:
 
  df <- data.frame(x = runif(100, 0, 100))
  df$y <- df$x + 1 + rnorm(100, sd=15)
 
  I had expected that:
 
  summary(lm(y ~ x, data=df, weights=rep(2, 100)))
  summary(lm(y ~ x, data=rbind(df,df)))

 You assign weights to different points according to some external
 quality or reliability measure, not the number of times the data point was
 measured.
 
 That is one type of weighting - but what if I have already aggregated
 data?  That is a perfectly valid type of weighting too.
 
 Look at the estimates and standard error of the two models below:

   coefficients( summary(f.w <- lm(y ~ x, data=df, weights=rep(2, 100))) )
               Estimate Std. Error   t value     Pr(>|t|)
   (Intercept) 1.940765 3.30348066  0.587491 5.582252e-01
   x           0.982610 0.05893262 16.673448 2.264258e-30

   coefficients( summary( f.u <- lm(y ~ x, data=rbind(df,df) ) ) )
               Estimate Std. Error    t value     Pr(>|t|)
   (Intercept) 1.940765 2.32408609  0.8350659 4.046871e-01
   x           0.982610 0.04146066 23.6998165 1.012067e-59

 You can see that they have the same coefficient estimates, but the second
 one has smaller standard errors.

 The repeated values artificially deflate the variance and thus inflate
 the precision. This is why you cannot treat replicate data as
 independent observations.
 
 Hardly artificially - I have repeated observations.
 
  would be equivalent, but they are not.  I suspect the difference is
  how the degrees of freedom are calculated - I had expected them to be
  sum(weights), but they seem to be sum(weights > 0).  This seems
  unintuitive to me:
 
  summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
  summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
 
  What am I missing?  And what is the usual way to do a linear
  regression when you have aggregated data?

 It would be best to use the individual data points instead of aggregated
 data, as this allows you to estimate the within-group variation as well.
 
 There is no within-group variation - these are observations that occur
 with the same values many times in the dataset, so they have been aggregated
 into a contingency-table-like format.
 
 If you had individual data points, you could try something like the following.
 Please check the code, as I am no expert in the area of repeated measures.

   x  <- runif(100, 0, 100)
   y1 <- x + rnorm(100, mean=1, sd=15)
   y2 <- y1 + rnorm(100, sd=5)

   df <- data.frame( y=c(y1, y2),
                     x=c(x,x),
                     subject=factor(rep( paste("p", 1:100, sep=""), 2 )) )

   library(nlme)
   summary( lme( y ~ x, random = ~ 1 | subject, data=df ) )

 Try reading Pinheiro and Bates (http://tinyurl.com/yvvrr7) or related
 material for more information. Hope this helps.
 
 I'm not interested in a mixed model, and I don't have individual data 
 points.
 
 Hadley
 
 



Re: [R] Weighted least squares

2007-05-08 Thread Adaikalavan Ramasamy

http://en.wikipedia.org/wiki/Weighted_least_squares gives a formulaic 
description of what you have said.

I believe the original poster has converted something like this

y   x
0   1.1
0   2.2
0   2.2
0   2.2
1   3.3
1   3.3
2   4.4
 ...

into something like the following

y   x   freq
0   1.1    1
0   2.2    3
1   3.3    2
2   4.4    1
 ...

Now, the sample variance within each row of the table above is ZERO, because 
the individual observations aggregated into each row are identical. The 
inverse-variance weight n_i/(s_i^2) would then divide by zero, so your 
method will not work here.

Is it then valid to use lm( y ~ x, weights=freq )?
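One way to explore that question, as a sketch: with frequency weights, lm() reproduces the coefficients of the expanded data (one row per observation). However, it computes the residual degrees of freedom from the number of rows in the table rather than from sum(freq). Rescaling the standard errors by sqrt(df.residual / (sum(freq) - p)) recovers the expanded-data values. The data below are only the four aggregated rows shown above, with the "..." rows omitted. The names agg, expanded and se.adj are hypothetical. This sketch does not address whether a normal linear model suits discrete y values.

```r
# Only the four aggregated rows shown above
agg <- data.frame(y = c(0, 0, 1, 2), x = c(1.1, 2.2, 3.3, 4.4), freq = c(1, 3, 2, 1))
# Expand back to one row per original observation
expanded <- agg[rep(seq_len(nrow(agg)), agg$freq), c("y", "x")]

f.freq <- lm(y ~ x, data = agg, weights = freq)   # 4 rows: residual df = 4 - 2
f.full <- lm(y ~ x, data = expanded)              # 7 rows: residual df = 7 - 2

all.equal(coef(f.freq), coef(f.full))             # TRUE: same estimates

se.freq <- coef(summary(f.freq))[, "Std. Error"]
se.full <- coef(summary(f.full))[, "Std. Error"]
p <- length(coef(f.freq))
# Rescale to the residual df the expanded data would have
se.adj <- se.freq * sqrt(df.residual(f.freq) / (sum(agg$freq) - p))
all.equal(se.adj, se.full)                        # TRUE
```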

Regards, Adai



S Ellison wrote:
 Hadley,
 
 You asked
 .. what is the usual way to do a linear 
 regression when you have aggregated data?
 
 Least squares generally uses inverse variance weighting. For aggregated data 
 fitted as mean values, you just need the variances for the _means_. 
 
 So if you have individual means x_i and sd's s_i that arise from aggregated 
 data with n_i observations in group i, the natural weighting is by inverse 
 squared standard error of the mean. The appropriate weight for x_i would then 
 be n_i/(s_i^2). In R, that's n/(s^2), as n and s would be vectors with the 
 same length as x. If all the groups had the same variance, or nearly so, s is 
 a scalar; if they have the same number of observations, n is a scalar. 
 
 Of course, if they have the same variance and same number of observations, 
 they all have the same weight and you needn't weight them at all: see 
 previous posting!
 
 Steve E
 
 
 

Re: [R] plotting a point graph with data in X-axis

2007-05-08 Thread Adaikalavan Ramasamy

R cannot use character strings such as year.month as x-axis coordinates; the axis needs numeric (or Date) values. So either


a) plot them using the sequence 1, ..., 32 and then explicitly label 
them. Here is an example:

  n <- length(year.month)
  plot( 1:n, freq, xaxt="n" )
  mtext( text=year.month, side=1, at=1:n, las=2 )


b) or create the dates in Date format. This option is preferable if the 
dates are unequally spaced.

  x <- seq( as.Date("2000-05-01"), as.Date("2002-12-01"), by="1 month" )
  plot(x, simulation$freq)


BTW, you could also have created year.month via
   paste( rep( 2000:2002, c(8,12,12) ),
          formatC( c(5:12,1:12,1:12), width=2, flag="0" ), sep="_" )
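Here is a sketch that combines option (b) with your second question, using made-up frequencies. The seed and the red filled points are arbitrary choices. It builds the monthly Date sequence and plots the full survey, then adds the sampled months with points(). Because points() draws on the existing axes, the x-axis keeps its full range. Note that format(dates, "%Y_%m") also reproduces the year.month labels.

```r
set.seed(1)
dates <- seq(as.Date("2000-05-01"), as.Date("2002-12-01"), by = "1 month")
freq  <- sample(1:40, length(dates))      # simulated counts, one per month
year.month <- format(dates, "%Y_%m")      # "2000_05", ..., "2002_12"

plot(dates, freq, xlab = "Survey month", ylab = "freq")

# Add 8 randomly chosen months with new simulated values;
# points() reuses the existing axes, so the full date range stays on the x-axis
idx <- sample(seq_along(dates), 8)
points(dates[idx], sample(1:40, 8), col = "red", pch = 19)
```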


Regards, Adai




Milton Cezar Ribeiro wrote:
 Dear all,
 
 I have two data frames, one with a complete list of my field survey with 
 frequency data of a sample species. This data frame looks like:
 
 
 simulation <- data.frame(cbind(my.year=c(rep(2000,8),rep(2001,12),rep(2002,12)),my.month=c(5:12,1:12,1:12)))
 simulation$year.month <- paste(simulation$my.year, "_",
   ifelse(simulation$my.month >= 10, simulation$my.month,
          paste("0", simulation$my.month, sep="")), sep="")
 simulation$freq <- sample(1:40,32)
 attach(simulation)
 plot(year.month, freq)
 
 As you can see, I have a column with the year and month of my samples, and a 
 freq variable with simulated data. I would like to plot this data, but when I 
 try to use the plot shown above, I get an error message. 
 
 After getting past this problem, I would like to add points to my graph with simulated 
 data for only a random subset of survey months, but I need the full range 
 of surveys to be kept on the X-axis. Just to simulate a sample I am using:
 
 simulation.sample <- simulation[sample(1:length(year.month),8, replace=F),]
 simulation.sample$freq <- sample(1:40,8)
 
 Any ideas?
 
 Kind regards
 
 Miltinho
 

Re: [R] data file import - numbers and letters in a matrix(!)

2007-04-12 Thread Adaikalavan Ramasamy

Here are the contents of my testdata.txt:

-
START OF HEIGHT DATA
S= 0y=0.0 x=0.
S= 0 y=0.1 x=0.00055643
  S= 9 y=4.9 x=1.67278117
   S= 9 y=5.0 x=1.74873257
S=10   y=0.0   x=0.
 S=10y=0.1 x=0.00075557
S=99 y=5.3x=1.94719490
END OF HEIGHT DATA
-

If you have access to a shell, you can try preprocessing the input 
file for read.delim with

cat testdata.txt | grep -v ^START | grep -v ^END | sed 's/ //g' | 
sed 's/S=//' | sed 's/y=/\t/' | sed 's/x=/\t/'

or here is my ugly fix in R

  my.read.file <- function(file){

   v1 <- readLines( con=file, n=-1)
   v2 <- v1[ -grep( "^START|^END", v1 ) ]  # drop the START/END marker lines
   v3 <- gsub( " ", "", v2)                # remove all spaces
   v4 <- gsub( "S=|y=|x=", " ", v3 )       # replace each label by one space
   v5 <- gsub("^ ", "", v4)                # drop the leading space

   m  <- t( sapply( strsplit(v5, split=" "), as.numeric ) )
   colnames(m) <- c("S", "y", "x" )
   return(m)
  }

  my.read.file( "testdata.txt" )
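Your real file has several START/END blocks plus header lines such as FILEDATE. The sketch below is a variant that returns one numeric matrix per block. The function name read.height.blocks() is hypothetical. It assumes that every START OF HEIGHT DATA line has a matching END OF HEIGHT DATA line. Rather than deleting the labels, it pulls the numbers out with a regular expression.

```r
read.height.blocks <- function(file) {
  lines  <- readLines(file)
  starts <- grep("^START OF HEIGHT DATA", lines)
  ends   <- grep("^END OF HEIGHT DATA", lines)
  stopifnot(length(starts) == length(ends))   # assumption: blocks are paired

  lapply(seq_along(starts), function(k) {
    # lines strictly between this START and its END
    block <- lines[seq_len(ends[k] - starts[k] - 1) + starts[k]]
    block <- block[grepl("S=", block)]        # keep only data lines
    if (length(block) == 0)
      return(matrix(numeric(0), 0, 3, dimnames = list(NULL, c("S", "y", "x"))))
    # extract the three numbers (S, y, x) from each line
    nums <- regmatches(block, gregexpr("-?[0-9]+\\.?[0-9]*", block))
    m <- do.call(rbind, lapply(nums, as.numeric))
    colnames(m) <- c("S", "y", "x")
    m
  })
}

# Usage (hypothetical): blocks <- read.height.blocks("data.dat"); Measure1 <- blocks[[1]]
```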

Regards, Adai




Felix Wave wrote:
 Hello,
 I have a problem importing a data file. It seems very tricky.
 I have a text file (end of the mail). Every file has a different number of 
 measurements, each of which starts with START OF HEIGHT DATA and ends with END OF HEIGHT DATA.
 
 I imported the file into a matrix, but the letters before the numbers are my 
 problem (S= ,S=,x=,y=).
 Because of the letters and the space after S=, I get a different number
 of columns in my matrix, and with letters in my matrix I can't do calculations.
 
 
 My question: is it possible to import the file so that I get 3 columns with only 
 numbers and no labels like x=, y=?
 
 Thanks a lot
 Felix
 
 
 
 
 My R Code:
 --
 
 # na.strings = "S="
 
 Measure1 <- matrix(scan("data.dat", n= 5063 * 4, skip =   20, what = character() ), 5063, 3, byrow = TRUE)
 Measure2 <- matrix(scan("data.dat", n= 5063 * 4, skip = 5220, what = character() ), 5063, 3, byrow = TRUE)
 
 
 
 My data file:
 ---
 
 FILEDATE:02.02.2007
 ...
 
 START OF HEIGHT DATA
 S= 0 y=0.0 x=0.
 S= 0 y=0.1 x=0.00055643
 ...
 S= 9 y=4.9 x=1.67278117
 S= 9 y=5.0 x=1.74873257
 S=10 y=0.0 x=0.
 S=10 y=0.1 x=0.00075557
 ...
 S=99 y=5.3 x=1.94719490
 END OF HEIGHT DATA
 ...
 
 START OF HEIGHT DATA
 S= 0 y=0.0 x=0.
 S= 0 y=0.1 x=0.00055643
 
 
 
 The imported matrix: 
   [,1]   [,2]   [,3]   [,4]  
  [6,] S=   9y=4.9x=1.67278117
  [7,] S=   9y=5.0x=1.74873257
  [8,] S=10 y=0.0x=0. S=10
  [9,] y=0.1x=0.00075557 S=10 y=0.2   
 [10,] x=0.00277444 S=10 y=0.3x=0.00605958
 

Re: [R] Wikibooks

2007-03-30 Thread Adaikalavan Ramasamy

On a related note, one might be interested in checking out Citizendium, 
which is a spin-off of Wikipedia but 1) has more stringent identity 
verification and 2) uses a two-tier system of editors and authors. See 
http://www.citizendium.org/cfa.html.



Deepayan Sarkar wrote:
 On 3/30/07, Sarah Goslee [EMAIL PROTECTED] wrote:
 On 3/30/07, Alberto Monteiro [EMAIL PROTECTED] wrote:
 Deepayan Sarkar wrote:
 I was just looking at this page, and it makes me curious: what gives
 anyone the right to take someone else's mailing list post and include
 that in a Wiki?

 Things that are posted to public mailing lists are freely
 copied and distributed. It's a scary thought; I may have posted
 things 10 or 12 years ago that might cause me problems today,
 but I was pretty aware that I was posting to the whole world.
 
 There's a difference between public archiving and copying.
 
 It's not that simple, and with international contributors it's even worse.
 Under US law (the only one I'm familiar with), the author of a mailing list
 post or any other written work _automatically holds copyright_ to that
 post (although not to the ideas contained therein, but to that particular
 description of the ideas). (Of course, if the ideas are original to the author,
 it's good form to acknowledge that regardless of whether the exact words
 are used).
 
 I believe this is true for all countries that are signatory to the
 Berne convention (which is pretty much all countries [1]). The US in
 fact was one of the later ones to get into it, before which you had to
 explicitly copyright things if you wanted copyright.
 
 -Deepayan
 
 [1] http://upload.wikimedia.org/wikipedia/commons/6/6c/Berne_Convention.png
 

Re: [R] Wikibooks

2007-03-29 Thread Adaikalavan Ramasamy

I think some time ago someone suggested that we append a 
comments/discussion/wiki section, editable by everyday users, to the end 
of every R function's help page.

In other words, every R function help page would have a fixed component that 
has met R-core's approval and a clearly marked, more flexible component 
written by everyday users.

The comments section for each function could contain suggestions, 
warnings (e.g. the use of c versus as.vector thread that was discussed 
today), examples, do's and don'ts, and suggestions for clarifying the 
documentation.

I think starting from function-level is an interesting idea to 
complement Paul Johnson's R tips.

These comments could perhaps be cleaned up and integrated into future 
releases if R-core agrees on their usefulness. Think of it as a Bayesian 
approach to maintaining information.

Regards, Adai



Frank E Harrell Jr wrote:
 Ben Bolker wrote:
 Alberto Monteiro albmont at centroin.com.br writes:

 As a big fan of Wikipedia, it's frustrating to see how little there is about 
 R in the related project, Wikibooks:

 http://en.wikibooks.org/wiki/R_Programming

 Alberto Monteiro

   Well, we do have an R wiki -- http://wiki.r-project.org/rwiki/doku.php --
 although it is not as active as I'd like.  (We got stuck halfway through
 porting Paul Johnson's R Tips to it ...)   Please contribute!
   Most of the (considerable) effort people expend in answering
 questions about R goes to the mailing lists -- I personally would like it if some
 tiny fraction of that energy could be redirected toward the wiki, where
 information can be presented in a nicer format and (ideally) polished
 over time -- rather than having to dig back through multiple threads on the
 mailing lists to get answers.  (After that we have to get people
 to look for the answers on the wiki.)
 
 I would like to strongly second Ben.  In some ways, R experts are too 
 nice.  Continuing to answer the same questions over and over does not 
 lead to better use of the R wiki.  I would rather see the work go into 
 enhancing the wiki and refactoring information, and have responses to many 
 r-help pleas for help be "see wiki topic x".  While doing this let's 
 consider putting a little more burden on new users to look for good 
 answers already provided.
 
 Frank
 
   Just my two cents -- and I've been delinquent in my 
 wiki'ing recently too ...

   Ben Bolker


Re: [R] Vector indexing question

2007-03-29 Thread Adaikalavan Ramasamy

Sounds like you have two different tables and are trying to mine one 
based on the other. Try

ref <- data.frame( levels  = 1:25,
                   ratings = rep(letters[1:5], times=5) )

db <- data.frame( vals=101:175, levels=c(1:25, 1:25, 1:25) )

levels.of.interest <- ref$levels[ ref$ratings=="a" ]
db$vals[ which(db$levels %in% levels.of.interest) ]

  [1] 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171


Or, more intuitively, merge both tables and proceed as follows:

out <- merge( db, ref, by="levels", all.x=TRUE )
out <- out[ order(out$vals), ] # little cleanup
subset( out, ratings=="a" )    # ignore the rownames

levels vals ratings
1   1  101   a
16  6  106   a
31 11  111   a
46 16  116   a
61 21  121   a
3   1  126   a
17  6  131   a
32 11  136   a
47 16  141   a
62 21  146   a
2   1  151   a
18  6  156   a
33 11  161   a
48 16  166   a
63 21  171   a

Then you can do cool things using the apply() family, like
   tapply( out$vals, out$ratings, mean )
     a   b   c   d   e
   136 137 138 139 140

Check out %in%, merge and apply.
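Here is a sketch that works directly with your four original vectors. match() looks up the position of each element of a.id in a.id.levels, which builds the "ratings of the same length as a.id" vector without a loop. (rating.per.id is a hypothetical name.)

```r
a.id         <- c(1:25, 1:25, 1:25)
a.vals       <- 101:175
a.id.levels  <- 1:25
a.id.ratings <- rep(letters[1:5], times = 5)

# match() returns, for each element of a.id, its position in a.id.levels
rating.per.id <- a.id.ratings[match(a.id, a.id.levels)]

a.vals[rating.per.id == "e"]
# [1] 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175

tapply(a.vals, rating.per.id, mean)   # mean value per rating: 136 ... 140
```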

Regards, Adai



Paul Lynch wrote:
 Suppose you have 4 related vectors:
 
 a.id <- c(1:25, 1:25, 1:25)
 a.vals <- c(101:175)    # same length as a.id (the values for those IDs)
 a.id.levels <- c(1:25)
 a.id.ratings <- rep(letters[1:5], times=5)    # same length as a.id.levels
 
 What I would like to do is specify a rating from a.id.ratings (e.g. "e"),
 get the vector of corresponding IDs from a.id.levels (via
 a.id.levels[a.id.ratings=='e']) and then somehow use those IDs in a.id
 to get the corresponding values from a.vals.
 
 I think I can probably write a loop to construct a vector of
 ratings of the same length as a.id so that the ratings match the IDs,
 and then go from there.  Is there a better way?  Perhaps using factors
 or levels or something?
 
 Thanks,
   --Paul



Re: [R] Help for looping

2007-03-27 Thread Adaikalavan Ramasamy

Please try to give a simple reproducible example and simplify your code 
a bit if you want to get useful responses.

For example, you say your data is a 1000*30 matrix, which I presume 
has 1000 rows and 30 columns. If so, EMP <- data[,378:392] does not make 
sense, because the matrix has no columns 378 to 392.

Perhaps you might be interested in knn() in the class package.

Regards, Adai




[EMAIL PROTECTED] wrote:
 Rusers:
 
 I have tried to minimize computing times by taking advantage of 
 lapply(). My data is a 1000*30 matrix and the distance matrix was 
 created with dist(). What I am trying to do is to compute the standard 
 distances using the frequencies attached to the nearest neighbors of n 
 reference zones. So I will have 1000 standard distances, and would like 
 to see the frequency distribution of the standard distances.
 
 # Convert decimal degrees into UTM miles
 x <- (data[,1]-58277.194363)*0.000621
 y <- (data[,2]-4414486.03135)*0.000621
 
 # Combine x y for computing distances
 coords <- cbind(x,y)
 pts <- length(data)
 
 # Subset housing data and employment data
 RES <- data[,3:17]
 EMP <- data[,378:392]
 
 # Combine all the subdata as D
 D <- cbind(coords,RES,EMP)
 
 cases <- ncol(D)-ncol(coords)
 
 # Create a threshold bandwidth for defining the nearest neighbors
 thrs <- seq(0,35,by=1)
 
 SDTAZ <- rep(list(matrix(,nrow(D),length(thrs))),cases)
 
 # dis: the distance matrix made with dist() (not shown; presumably as.matrix(dist(coords)))
 for (j in 1:nrow(D))
   for (k in 1:length(thrs))
     for (l in 1:cases)
     {
       idx <- which(dis[j,] <= thrs[k])   # zones within thrs[k] of zone j
       w   <- D[idx,l+2] - D[j,l+2] - min(D[idx,l+2] - D[j,l+2]) + 1
       SDTAZ[[l]][j,k] <- sqrt( sum( w * dis[j,idx]^2 ) / sum(w) )
     }
 
 I think that within this nested loop I should use lapply(), but I ended 
 up getting different values. I would appreciate it if someone could kindly 
 help me.
 
 Thank you very much.
 
 Takatsugu Kobayashi
 PhD Candidate
 Indiana University, Dept. Geography
 

Re: [R] Listing function displayed as a table

2007-03-26 Thread Adaikalavan Ramasamy

Something ugly like this?

Lst <- list()
Lst[[1]] <- list(name="Fred", wife="Mary", no.children=3, 
                 child.ages=c(4,7,9))
Lst[[2]] <- list(name="Barney", wife="Liz", no.children=2, 
                 child.ages=c(3,5))

cbind( do.call(rbind, as.list(Lst))[ ,-4],
       child.ages=sapply( Lst, function(myli)
         paste(myli$child.ages, collapse=",") ))
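Here is a slightly tidier variant of the same idea, as a sketch. It pulls each field out of the list of lists with sapply() and `[[`, which also answers Corinna's question below about extracting one entry for every element of Lst. The stringsAsFactors = FALSE setting is my own choice, to keep the columns as plain character.

```r
Lst <- list()
Lst[[1]] <- list(name = "Fred",   wife = "Mary", no.children = 3, child.ages = c(4, 7, 9))
Lst[[2]] <- list(name = "Barney", wife = "Liz",  no.children = 2, child.ages = c(3, 5))

# sapply(Lst, `[[`, "name") extracts the "name" element of every record
tab <- data.frame(
  name        = sapply(Lst, `[[`, "name"),
  wife        = sapply(Lst, `[[`, "wife"),
  no.children = sapply(Lst, `[[`, "no.children"),
  child.ages  = sapply(Lst, function(p) paste(p$child.ages, collapse = ",")),
  stringsAsFactors = FALSE
)
print(tab, row.names = FALSE)
```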


Why don't you just save the data in a data frame instead of a list to 
begin with? The only variable I can see that has multiple values is 
child.ages. Alternatively, create one row per record (here, one per child), 
as in most databases. The choice depends on your input.

  df <- rbind( c("Fred", "Mary", 4), c("Fred", "Mary", 7),
               c("Fred", "Mary", 9), c("Barney", "Liz", 3),
               c("Barney", "Liz", 5) )
  df <- data.frame(df)
  colnames(df) <- c("Father", "Mother", "Child.Age")
  df$Child.Age <- as.numeric(as.character(df$Child.Age))

  parents <- paste( df$Father, df$Mother, sep="+" )

  getstats <- function(x) c( values=paste(x, collapse=","),
      mean=round(mean(x),2), youngest=min(x), oldest=max(x) )

  do.call( rbind, tapply( df$Child.Age, parents, getstats ) )

             values mean youngest oldest
  Barney+Liz 3,5    4    3        5
  Fred+Mary  4,7,9  6.67 4        9


Regards, Adai



Schmitt, Corinna wrote:
 Hallo,
 Good idea, it is working. A new question: how can I display the 
 entries in a table like
 
  name   wife  no.children  child.ages
 FredMary3   4,7,9
 Barney  Liz 2   3,5
 
 Thanks, Corinna
 
 
 -Original Message-
 From: Michael T. Mader [mailto:[EMAIL PROTECTED] 
 Sent: Monday, 26 March 2007 15:32
 To: Schmitt, Corinna; r-help@stat.math.ethz.ch
 Subject: Re: [R] Listing function
 
 Lst <- list()
 Lst[[1]] <- list(name="Fred", wife="Mary", no.children=3, cild.ages=c(4,7,9))
 Lst[[2]] <- list(name="Barney", wife="Liz", no.children=2, cild.ages=c(3,5))
 
 I.e. a list of lists
 
 Regards
 
 Michael
 
 Schmitt, Corinna wrote:
 Hallo,

 I build a list by the following way:

 Lst = list(name="Fred", wife="Mary", no.children=3, cild.ages=c(4,7,9))

 I know how I can extract the information one by one. But now I want to
 add a new entry which looks like

 name="Barney", wife="Liz", no.children=2, cild.ages=c(3,5)

 How can I add this information to Lst without overwriting the first
 entry?
 How can I then extract the corresponding information if I have both
 entries in Lst?

 Thanks for helping,

 Corinna


Re: [R] Select the last two rows by id group

2007-03-20 Thread Adaikalavan Ramasamy

Here is yet another solution. This one uses by(), which generates nice 
visual output.

score <- data.frame(
  id      = c('001','001','001','002','003','003'),
  math    = c(80,75,70,65,65,70),
  reading = c(65,70,88,NA,90,NA)
)

out <- by( score, score$id, tail, n=2 )
# score$id: 001
#id math reading
# 2 001   75  70
# 3 001   70  88
# 
# score$id: 002
#id math reading
# 4 002   65  NA
# 
# score$id: 003
#id math reading
# 5 003   65  90
# 6 003   70  NA


And if you want to put it back into a data frame, use

do.call( rbind, as.list(out) )
#id math reading
# 001.2 001   75  70
# 001.3 001   70  88
# 002   002   65  NA
# 003.5 003   65  90
# 003.6 003   70  NA

Ignore the rownames here.
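If you would rather keep a plain data frame with the original row names, here is a base-R sketch using ave(). It numbers the rows of each id from the end and then keeps those numbered 1 or 2. Change 2 to 6 or 10 for the last six or ten rows. The name from.end is hypothetical.

```r
score <- data.frame(
  id      = c('001','001','001','002','003','003'),
  math    = c(80,75,70,65,65,70),
  reading = c(65,70,88,NA,90,NA)
)

# Within each id: 1 = last row, 2 = second-to-last row, ...
from.end <- ave(seq_len(nrow(score)), score$id,
                FUN = function(i) rev(seq_along(i)))

score[from.end <= 2, ]    # keeps rows 2, 3, 4, 5 and 6
```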

HTH, Adai


Lauri Nikkinen wrote:
 Hi R-users,
 
 Following this post http://tolstoy.newcastle.edu.au/R/help/06/06/28965.html ,
 how do I get last two rows (or six or ten) by id group out of the data
 frame? Here the example gives just the last row.
 
 Sincere thanks,
 Lauri
 

Re: [R] run a script during R CMD build

2007-03-20 Thread Adaikalavan Ramasamy

Yes, one way is to use commandArgs in the R script. So say your R script 
is as follows

  n   <- as.numeric(commandArgs()[3])    # matrix size ("100" below)
  fn  <- as.character(commandArgs()[4])  # output file name ("out.txt" below)

  mat <- matrix( rnorm( n*n ), ncol=n )
  write.table( mat, file=fn, sep="\t", quote=FALSE )

Then run the script from the command line as

   R --no-save < script 100 out.txt

This runs the R commands and writes the matrix to out.txt.
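Here is a sketch of a variant, assuming your R installation ships the Rscript front end. commandArgs(trailingOnly = TRUE) returns only the arguments that follow the script name, so the positions of n and the file name no longer depend on how R itself was started. The file name make_matrix.R and the fallback defaults are hypothetical.

```r
# make_matrix.R (hypothetical name); run as:  Rscript make_matrix.R 100 out.txt
args <- commandArgs(trailingOnly = TRUE)     # e.g. c("100", "out.txt")

# Fall back to small defaults when no arguments are given
n  <- if (length(args) >= 1) as.integer(args[1]) else 5L
fn <- if (length(args) >= 2) args[2] else tempfile(fileext = ".txt")

mat <- matrix(rnorm(n * n), ncol = n)
write.table(mat, file = fn, sep = "\t", quote = FALSE)
```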



johan Faux wrote:
 I would like R CMD build to run some R code which does some stuff and saves 
 the result as a file in the /inst/docs folder. 
 Is there any way of doing this?
 
 Thank you.
 Johan
 
 
 
 
  
 

Re: [R] run a script during R CMD build

2007-03-20 Thread Adaikalavan Ramasamy

Sorry, I did not read the question properly.

I believe your functions go in mypkg/R and your data in the 
mypkg/data subdirectory, but I am no expert in this area.

If you want to mirror your data from one folder to another, you can try 
using a symbolic (soft) link on *nix systems:
ln -s /inst/mydata.Rdata /somewhere/mypkg/data
I am not sure whether the symbolic link approach will work when you run 
R CMD build mypkg.

You might be interested in the examples in package.skeleton().

Regards, Adai



johan Faux wrote:
 Thanks for your help.
 Maybe I was not clear in my question. 
 Let's say I have an R script, myscript.R, which produces a file mydata.Rdata 
 and saves it in the /inst folder.
 My question is: where do I put my script so that it will run when I build the 
 package using R CMD build? 
 I want to include mydata.RData in my package and I want it to be updated 
 every time I build the package.
 
 I appreciate your help anyway.
 
 
 -Johan
 
 - Original Message 
 From: Adaikalavan Ramasamy [EMAIL PROTECTED]
 To: johan Faux [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch
 Sent: Tuesday, March 20, 2007 12:10:21 PM
 Subject: Re: [R] run a script during R CMD build
 
 Yes, one way is to use commandArgs in the R script. So say your R script 
 is as follows
 
   n   <- as.numeric(commandArgs()[3])
   fn  <- as.character(commandArgs()[4])
 
   mat <- matrix( rnorm( n*n ), nc=n )
   write.table( mat, file=fn, sep="\t", quote=FALSE )
 
 
 
 Then you execute the commands from command line as
 
R --no-save < script 100 out.txt
 
 
 This will run the R commands and output them to out.txt.
 
 
 
 johan Faux wrote:
 I would like R CMD build to run some R code which does some stuff and save 
 the result as a file in /inst/docs folder. 
 Is there any way of doing this.

 Thank you.
 Johan




  
 



 
 
 
 
 
 
 
 
  
 

Re: [R] change the name of file

2006-07-24 Thread Adaikalavan Ramasamy

Did you mean write.table() instead of Write()? Inside your loop, try

 fn <- paste("Data_", i, ".txt", sep="")
 write.table( t(x), file=fn, sep="\t" )
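Put together, Robert's loop might look like the sketch below. It writes into tempdir() purely so that the example does not litter the working directory; that directory choice and quote = FALSE are assumptions rather than part of the original answer.

```r
out_dir <- tempdir()   # illustrative: write the example files to a temporary directory

for (i in 1:50) {
  x  <- matrix(runif(100, min = 0, max = 1), nrow = 5, ncol = 20)
  # Build a different file name on every iteration: Data_1.txt, Data_2.txt, ...
  fn <- file.path(out_dir, paste("Data_", i, ".txt", sep = ""))
  write.table(t(x), file = fn, sep = "\t", quote = FALSE)
}
```

If zero-padded names that sort correctly are preferred, sprintf("Data_%02d.txt", i) gives Data_01.txt up to Data_50.txt.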

Regards, Adai


On Mon, 2006-07-24 at 11:06 +0200, Robert Mcfadden wrote:
 Dear R Users,
 Is it possible to make file names dependent on a changing variable?
 For instance. I generate random numbers in a loop and at each iteration I
 want data to write to file (I do not want to write everything in one file
 using 'append'):
 
 for (i in 1:50){
 x<-matrix(runif(100, min=0,max=1),nrow=5,ncol=20)
 Write(t(x),file="Data_i.txt",ncolumns=5,sep="\t") 
 }   
 
 Of course file name Data_i.txt will be the same for changing i,
 unfortunately. 
 
 Any suggestion would be appreciated
 Robert
 

Re: [R] inplace assignment

2006-06-16 Thread Adaikalavan Ramasamy

I do not fully understand your question, but how about:

 inplace <- function( df, cond1, cond2, cols, suffix ){
 
  w  <- which( cond1 & cond2 )
  df <- df[ w, cols ]
  paste(df, suffix)
  return(df)
 }


BTW, did you mean colnames(df) <- paste(colnames(df), suffix) instead
of paste(df, suffix)?
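For what David actually asked (write the target once and assign the result back to it), a small macro-like helper is one option. The sketch below is hypothetical: inplace() is not an existing R function. It rewrites inplace(f(target, ...)) into target <- f(target, ...) with substitute() and evaluates that in the caller's environment with eval.parent(). Because paste() does not work element-wise on a data frame, the example uses a plain character vector.

```r
# Hypothetical helper: inplace(f(target, ...)) evaluates target <- f(target, ...)
# in the caller's environment, so the target expression is written only once.
inplace <- function(expr) {
  cl     <- substitute(expr)   # the unevaluated call, e.g. paste(x[2:3], "foobar")
  target <- cl[[2]]            # its first argument, e.g. x[2:3]
  invisible(eval.parent(call("<-", target, cl)))
}

x <- c("a", "b", "c")
inplace(paste(x[2:3], "foobar"))
x   # now c("a", "b foobar", "c foobar")
```

The target must be the first argument of the outer call and must be something that can appear on the left of <-.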

Regards, Adai



On Fri, 2006-06-16 at 10:23 +0100, David Hugh-Jones wrote:
 I get tired of writing, e.g.
 
 
 data.frame[some.condition & another.condition, big.list.of.columns] <-
 paste(data.frame[some.condition & another.condition,
 big.list.of.columns], "foobar")
 
 
 I would like a function like:
 
 inplace(paste(data.frame[some.condition & another.condition,
 big.list.of.columns], "foobar"))
 
 which would take the first argument of the inner function and assign
 the function's result to it.
 
 Has anyone done something like this? Are there simple alternative
 solutions that I'm missing?
 
 Cheers
 David
 

Re: [R] data managment

2006-06-15 Thread Adaikalavan Ramasamy

If your df contains your data, try

 tmp <- cbind( paste(df[ ,1], df[ ,2], sep=":"), 
   paste(df[ ,3], df[ ,4], sep=":") )
 tmp <- t( apply(tmp, 1, sort) )

 out <- data.frame( do.call(rbind, strsplit( tmp[,1], split=":" )), 
do.call(rbind, strsplit( tmp[,2], split=":" )) )
 colnames(out) <- colnames(df)
 out
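An alternative sketch avoids the paste()/strsplit() round trip, so the value columns stay numeric instead of becoming text. It computes a logical swap flag per row and rebuilds the columns with ifelse(). It assumes the four-column layout from the question (letter, value, letter, value); the function name swap_pairs() and the three-row sample are illustrative.

```r
# Swap the two letter/value pairs in each row where column 1 sorts after column 3
swap_pairs <- function(df) {
  l1 <- as.character(df[[1]]); v1 <- df[[2]]
  l2 <- as.character(df[[3]]); v2 <- df[[4]]
  swap <- l1 > l2                          # TRUE where the letters are out of order
  out <- data.frame(ifelse(swap, l2, l1), ifelse(swap, v2, v1),
                    ifelse(swap, l1, l2), ifelse(swap, v1, v2),
                    stringsAsFactors = FALSE)
  names(out) <- names(df)                  # keep the original column names
  out
}

df <- data.frame(V1 = c("G", "G", "T"), V2 = c(0.892, 0.883, 0.5),
                 V3 = c("A", "T", "C"), V4 = c(0.108, 0.117, 0.5),
                 stringsAsFactors = FALSE)
res <- swap_pairs(df)
res
```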

Regards, Adai



On Wed, 2006-06-14 at 16:35 +0100, yohannes alazar wrote:
 First, I would really like to thank the mailing list for the help I got in the
 past. As a newcomer to R, I really need some support on how to code the
 following problem.
 
 
 
 I am trying to sort some data I have in a big file. The file has 4 columns
 and 19000 rows. An example of it looks like this:-
 
 
 
 G 0.892   A 0.108
 G 0.883   T 0.117
 T 0.5     C 0.5
 A 0.617   G 0.383
 G 0.925   A 0.075
 A 0.967   G 0.033
 C 0.883   T 0.117
 C 0.633   T 0.367
 G 0.95    A 0.05
 C 0.742   G 0.258
 G 0.875   T 0.125
 T 0.167   C 0.833
 C 0.792   A 0.208
 
 
 
 Columns one and three are letters, while columns two and four are their
 corresponding values.
 
 I want to sort this data so that my first and third columns are in
 alphabetical order. For example, in the first row the order is G then A.
 This is not in alphabetical order, so we swap them along with their
 values and it becomes:
 
 A 0.108   G 0.892
 
 Row two looks fine but row three needs the same rearrangement as row one.
 And the final out put looks like:
 
 A 0.108   G 0.892
 G 0.883   T 0.117
 C 0.5     T 0.5
 A 0.617   G 0.383
 A 0.075   G 0.925
 A 0.967   G 0.033
 C 0.883   T 0.117
 C 0.633   T 0.367
 A 0.05    G 0.95
 C 0.742   G 0.258
 G 0.875   T 0.125
 C 0.833   T 0.167
 A 0.208   C 0.792
 
 Please some help with the relevant command names or a technique to code this
 task.
 
 Thank you in advance
 
 Regards Hannes
 

Re: [R] running R in batch with stdin input

2006-06-15 Thread Adaikalavan Ramasamy

?commandArgs
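Here is a minimal sketch of how that could look for Eric's script. It assumes the script is started as `Rscript r.script 1BRS`; with R CMD BATCH the equivalent is usually written `R CMD BATCH "--args 1BRS" r.script`. The helper surface_path() and the fallback name are hypothetical; only the directory layout comes from the question. R does not expand $name inside strings, so the path has to be built explicitly with paste0() or file.path().

```r
# Hypothetical helper: build the file path that Eric's read.table() call needs
surface_path <- function(name, root = "/usr/local/surface") {
  file.path(root, name, paste0(name, "_c_r"))
}

# Run as:  Rscript r.script 1BRS
args <- commandArgs(trailingOnly = TRUE)
name <- if (length(args) >= 1) args[1] else "1BRS"   # illustrative default
path <- surface_path(name)
# r0 <- read.table(path)   # left commented out: the file only exists on Eric's system
```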



On Thu, 2006-06-15 at 16:05 -0700, Eric Hu wrote:
 Hi, I have an R script that needs to run a few times for different
 systems. I use R --no-save < r.script for one system. I am trying, with
 no luck, to use R CMD BATCH to pass a stdin input variable to
 the script. I wonder if anyone can provide the correct usage to put
 the variable on the command line, like R CMD BATCH r.script name_variable.
 
 Thanks.
 
 -Eric
 
 In the r.script I have
 
 name <- readline("/dev/stdin")
 r0 <- read.table("/usr/local/surface/$name/$name_c_r")
 ...
 
 I want to get at the end:
 
 name <- "1BRS"
 r0 <- read.table("/usr/local/surface/1BRS/1BRS_c_r")
 ...
 

Re: [R] write data from function into external table

2006-06-14 Thread Adaikalavan Ramasamy

What is your desired output? Knowing that would clarify the problem greatly.

Perhaps this might be of some use:

 f <- function(v, pos, val=100){  v[pos] <- val; return(v)  }

 test <- 1:3
 test <- f(test, 1)
 test
 [1] 100   2   3
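testfct() has no visible effect because R functions modify a local copy of the objects they assign to. The sketch below contrasts the two usual ways around this (the value 200 is arbitrary). The first returns the modified copy, as f() above does. The second uses the superassignment operator <<-, which assigns in an enclosing environment. The latter works but is generally discouraged, because it makes a function silently change objects outside it.

```r
# Functions receive copies of their arguments, so return the modified copy ...
f <- function(v, pos, val = 100) { v[pos] <- val; v }
test <- 1:3
test <- f(test, 1)           # test is now c(100, 2, 3)

# ... or, less advisably, use `<<-`, which assigns in the enclosing environment
# (here the global environment, where testfct is defined).
testfct <- function() { test[1] <<- 200 }
testfct()                    # test is now c(200, 2, 3)
```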

Regards, Adai



On Wed, 2006-06-14 at 12:41 +0200, Sebastian Leuzinger wrote:
 Dear list,
 My apologies if a solution / explanation to this already exists on the list, 
 but it is difficult to assign it to a certain keyword.
 
 test<-c(1:3)
 testfct <- function(x) {test[1]<-100}
 > test
 [1] 1 2 3
 > testfct(1)
 [1] 1 2 3
 
 Basically, I would like to write data into an external table that the 
 function 
 does not know. Why is this not working / what alternatives exist?
 
 Thanks, Sebastian 
 
 
 Sebastian Leuzinger
 University of Basel, Department of Environmental Science
 Institute of Botany
 Schönbeinstr. 6 CH-4056 Basel
 ph0041 (0) 61 2673511
 fax   0041 (0) 61 2673504
 email [EMAIL PROTECTED] 
 web   http://pages.unibas.ch/botschoen/leuzinger
 

Re: [R] merge dataframes with conditions formulated as logical expressions

2006-06-14 Thread Adaikalavan Ramasamy

There are gaps between consecutive MAX.VAL and MIN.VAL values within a
group (e.g. 49 and 50 in group A). If this is true in practice, you may
want to check for, and report, any VAL that falls into such a gap.

Here is my solution, which ignores that (it only uses MIN.VAL and
completely ignores MAX.VAL). Not very elegant, but it should do the
trick.


 df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) )
 dp <- data.frame( GRP=c( "A", "A", "B", "B" ), MIN.VAL=c( 1, 50, 1, 70 ),
                   MAX.VAL=c( 49, 999, 59, 999 ), VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )

 x <- split(df, df$GRP)
 y <- split(dp, dp$GRP)

 out <- NULL
 for(g in names(x)){

   xx <- x[[g]]
   yy <- y[[g]]

   w   <- cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=FALSE)
   tmp <- cbind(xx, yy[w, "VAL2"])
   colnames(tmp) <- c("GRP", "VAL", "VAL2")
   out <- rbind(out, tmp)
 } 
 out
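A more direct alternative, sketched below, merges on GRP and then keeps only the rows whose VAL lies inside the interval. Unlike the loop, this also respects MAX.VAL. It assumes the interval rule MIN.VAL < VAL <= MAX.VAL from Wolfram's description. A VAL that falls into a gap is simply dropped rather than reported, and overlapping intervals would produce duplicate rows. Because merge() pairs every row of df with every row of dp in the same group, memory use grows with the product of the group sizes.

```r
df <- data.frame(GRP = c("A", "A", "B"), VAL = c(10, 100, 200))
dp <- data.frame(GRP = c("A", "A", "B", "B"),
                 MIN.VAL = c(1, 50, 1, 70), MAX.VAL = c(49, 999, 59, 999),
                 VAL2 = c(1.1, 2.2, 3.3, 4.4))

m <- merge(df, dp, by = "GRP")                  # all df/dp row pairs within each group
keep <- m$VAL > m$MIN.VAL & m$VAL <= m$MAX.VAL  # assumed rule: MIN.VAL < VAL <= MAX.VAL
res <- m[keep, c("GRP", "VAL", "VAL2")]
res <- res[order(res$GRP, res$VAL), ]           # restore a predictable row order
res
```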

Regards, Adai



On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote:
 I have a data.frame df containing two variables:
 GRP: Factor
 VAL: num
 
 I have a data.frame dp containing:
 GRP: Factor
 MIN.VAL: num
 MAX.VAL: num
 VAL2: num
 with several rows per GRP
 where dp[i-1, MAX.VAL] < dp[i, MIN.VAL]
 within the same GRP.
 
 I want to create df[i, VAL2] <- dpp[z, VAL2] 
 with i along df 
 and dpp <- subset( dp, GRP == df[i, GRP] )
 so that it is true for each i:
 df[i, VAL] > dpp[z, MIN.VAL]
and  df[i, VAL] <= dpp[z, MAX.VAL]
 
 Is there an easy/efficient way to do that?
 
 Example:
 df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) )
 dp <- data.frame( GRP=c( "A", "A", "B", "B" ),
 MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), 
 VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
 
 The result should be:
 df$VAL2 <- c( 1.1, 2.2, 4.4 )
 
 Thanks - Wolfram
 

Re: [R] www.r-project.org

2006-04-26 Thread Adaikalavan Ramasamy

I am coming late into the discussion, so apologies if the following
points are redundant.


1) IMHO, the most important feature that would make life a lot easier
for everyone is having a search box on the main webpage. I know you
can click on Search in the left-hand pane, but putting it on the main
webpage would be much more useful.

We could also have a targets section for the search (cf.
http://finzi.psych.upenn.edu/nmz.html) where one can restrict the search
to the mailing list archives, the HTML manuals, the FAQ, a user-specified
package name, etc.


2) Regarding displaying the explicit URL, may I suggest the approach used
by http://maps.google.com, i.e. a "Link to this page" link (top right-hand
side of the page)?


3) I understand that R is constrained in terms of priorities and human
resources. But given that Asia (e.g. India, Singapore, China) has low
labour costs and abundant computing personnel, would it not make sense
for some Asian research group to offer to spearhead and maintain the
website?

From a marketing point of view, some nice graphics, search functions,
navigation, etc. would be useful to attract newcomers. There could be a
simple alternative version (as it is now) for those who prefer it or
those who have trouble accessing the site.

Just my £0.02.

Regards, Adai



On Tue, 2006-04-25 at 12:33 -0700, Spencer Graves wrote:
 Hi, Gabor:  inline
 Gabor Grothendieck wrote:
 
  On Windows, right click the web page, choose Properties and
  copy the url there.
 
 That works, and I will use it in the future.  Thanks.
 
 However, if the subject is not educating Spencer Graves but how to 
 make www.r-project.org more user friendly, then it still might help to 
 display as Address the actual web address of the archive page rather 
 than www.r-project.org.  It may not look as pretty, but I'm for 
 function first and cosmetics only if they don't interfere with 
 functionality.
 
 Best Wishes,
 spencer graves
  
  On 4/25/06, Spencer Graves [EMAIL PROTECTED] wrote:


Re: [R] choosing a particular object

2006-03-31 Thread Adaikalavan Ramasamy

Try

 test.fn <- function(obj.name, var.name="q2"){

  stopifnot( is.character(obj.name) && is.character(var.name) )
  x <- subset( get(obj.name), select=var.name )
  table(x)
 }
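A slightly more defensive variant is sketched below. It looks the object up in the caller's environment via an envir argument (an addition not in the original), so it also works when called from inside another function. It also uses [[ ]] to extract the column directly. The name test.fn2 and the toy object bb are illustrative.

```r
test.fn2 <- function(obj.name, var.name = "q2", envir = parent.frame()) {
  stopifnot(is.character(obj.name), is.character(var.name))
  obj <- get(obj.name, envir = envir)   # fetch the object whose name is given as a string
  table(obj[[var.name]])                # same as table(bb$q2) when obj.name = "bb"
}

bb  <- data.frame(q2 = c("yes", "no", "yes"))
tab <- test.fn2("bb")
tab
```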



On Fri, 2006-03-31 at 12:44 +0300, Adrian DUSA wrote:
 Hello all,
 
 I'd like to create a function which would do some analysis on a particular 
 object, which should be specified in advance. Something like:
 
 > ls()
 [1] "aa" "bb" "cc"
 
 Object <- "bb"
 var.name <- "q2"
 testfunction <- function(obj.name, var.name) {
   temp <- give.me.the.object.called(Object)
   table(temp[, var.name])
 }
 
 This should perform the same thing as:
 table(bb$q2)
 
 Is this possible?
 TIA,
 Adrian



Re: [R] ROC optimal threshold

2006-03-31 Thread Adaikalavan Ramasamy

If you define a cost function for a given threshold k as

   cost(k) = FP(k) + lambda * FN(k)

then choose k that minimises cost. FP and FN are false positives and
false negatives at threshold k. 

Set lambda to a value greater than 1 if you want to penalise FN more
than FP. There are many situations where this is desirable, for example
when the class sizes are highly unbalanced. Consider a problem where you
want to predict rare events and you are penalised much more heavily for
missing an event than for wrongly flagging a non-event.


I believe the ROC was designed to compare two methods over a range of
thresholds and not for choosing the threshold itself.
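Here is a minimal sketch of that search in R. It assumes the classes are coded 0/1 and that a case is predicted positive when its score is at or above the threshold; both are assumptions, not taken from the ROC package. It evaluates cost(k) at every observed score and takes the smallest.

```r
# Choose the threshold k that minimises cost(k) = FP(k) + lambda * FN(k).
# score: predicted scores; label: true classes coded 0/1 (assumed coding).
choose_threshold <- function(score, label, lambda = 1) {
  ks <- sort(unique(score))                  # candidate thresholds
  cost <- sapply(ks, function(k) {
    pred <- score >= k                       # predict "event" when score >= k
    fp <- sum(pred & label == 0)             # false positives
    fn <- sum(!pred & label == 1)            # false negatives
    fp + lambda * fn
  })
  ks[which.min(cost)]                        # ties go to the lowest threshold
}

score <- c(0.1, 0.2, 0.3, 0.4, 0.8, 0.9)
label <- c(0,   0,   1,   0,   1,   1)
choose_threshold(score, label, lambda = 1)     # 0.3
choose_threshold(score, label, lambda = 0.5)   # 0.8
```

Lowering lambda below 1 makes false negatives cheaper, so the chosen threshold moves up, as the second call shows.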

Regards, Adai



On Fri, 2006-03-31 at 08:01 -0500, Tim Howard wrote:
 Jose - 
 
 I've struggled a bit with the same question, said another way: how do you 
 find the value in a ROC curve that minimizes false positives while maximizing 
 true positives?
 
 Here's something I've come up with. I'd be curious to hear from the list 
 whether anyone thinks this code might get stuck in local minima, or if it 
 does find the global minimum each time. (I think it's ok).
 
 From your ROC object you need to grab the sensitivity (=true positive rate) 
 and specificity (= 1- false positive rate) and the cutoff levels.  Then find 
 the value that minimizes abs(sensitivity-specificity), or
 sqrt((1-sens)^2+(1-spec)^2), as follows:
 
 absMin <- extract[which.min(abs(extract$sens-extract$spec)),];
 sqrtMin <- extract[which.min(sqrt((1-extract$sens)^2+(1-extract$spec)^2)),];
 
 In this example, 'extract' is a dataframe containing three columns: 
 extract$sens = sensitivity values, extract$spec = specificity values, 
 extract$votes = cutoff values. The command subsets the dataframe to a single 
 row containing the desired cutoff and the sens and spec values that are 
 associated with it.
 
 Most of the time these two answers (abs or sqrt) are the same, sometimes they 
 differ quite a bit. 
 
 I do not see this application of ROC curves very often. A question for those 
 much more knowledgeable than I: is there a problem with using ROC curves 
 in this manner?
 
 Tim Howard
 
 
 
 
 Date: Fri, 31 Mar 2006 11:58:14 +0200
 From: Anadon Herrera, Jose Daniel [EMAIL PROTECTED]
 Subject: [R] ROC optimal threshold
 To: 'r-help@stat.math.ethz.ch' r-help@stat.math.ethz.ch
 Message-ID:
   [EMAIL PROTECTED]
 Content-Type: text/plain; charset=iso-8859-1
 
 hello,
 
 I am using the ROC package to evaluate predictive models.
 I have successfully plotted the ROC curve; however,
 
 is there any way to obtain the value of the operating point (the optimal
 threshold value, i.e. the point of the curve nearest to the top-left corner
 of the axes)?
 
 thank you very much,
 
 
 jose daniel anadon
 area de ecologia
 universidad miguel hernandez
 
 España
 

Re: [R] Plotting a segmented function

2006-03-30 Thread Adaikalavan Ramasamy

Try 
 
   f <- function(x){
     if(x <= 0) return(0)
     if( 0 < x && x <= 1 ) return( 0.5*x^2 )
     if( 1 < x && x <= 2 ) return( -0.5*x^2 + 2*x - 1 )
     return(1)
   }

   xx <- seq(-1, 3, 0.1)
   yy <- sapply(xx, f)
   plot(xx, yy, type="l")
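Because f() above handles one value at a time, it needs sapply(). A vectorised variant (the name Fx is illustrative) can fill in each piece through logical indexing. This also lets it be passed directly to curve().

```r
# Vectorised version of the piecewise distribution function F(x)
Fx <- function(x) {
  y <- numeric(length(x))                 # F(x) = 0 for x <= 0
  i1 <- x > 0 & x <= 1
  i2 <- x > 1 & x <= 2
  y[i1] <- x[i1]^2 / 2
  y[i2] <- 2 * x[i2] - x[i2]^2 / 2 - 1
  y[x > 2] <- 1
  y
}

Fx(c(-1, 0.5, 1.5, 3))   # c(0, 0.125, 0.875, 1)
```

For example, curve(Fx, from = -1, to = 3) draws the whole function, and the same pattern extends to any number of pieces.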

Regards, Adai


On Thu, 2006-03-30 at 09:25 -0200, Ken Knoblauch wrote:
 You could try nested ifelse statements,
 
 something like (untested)
 
 x <- seq(-1, 3, 0.1)
 y <- ifelse( x <= 2,
   ifelse( x <= 1,
     ifelse( x <= 0, 0, x^2/2 ), 2 * x - (x^2/2) - 1 ),  1 )
 plot(x, y)
 
 **
 This might be a trivial question, but I would appreciate if anybody
 could suggest an elegant way of plotting a function such as the
 following (a simple distribution function):
 F(x) = 0                    if x <= 0
      = (x^2)/2              if 0 < x <= 1
      = 2x - ((x^2)/2) - 1   if 1 < x <= 2
      = 1                    if x > 2
 This is just an example. In this case it is a continuous function. But
 how to do it in general in an elegant way.
 I've done the following:
 x1 <- seq(-1,0,.01)
 f1 <- rep(0,101)
 x2 <- seq(0,1,.01)
 f2 <- 0.5*(x2^2)
 x3 <- seq(1,2,.01)
 f3 <- (2*x3)-(0.5*(x3^2))-1
 x4 <- seq(2,3,.01)
 f4 <- rep(1,101)
 x <- c(x1,x2,x3,x4)
 F <- c(f1,f2,f3,f4)
 plot(x,F,type='l')
 
 But this seems very cumbersome.
 Any help is much appreciated.
 
 Thanks
 Jacob
 



Re: [R] which function to use to do classification

2006-03-30 Thread Adaikalavan Ramasamy

I find it helpful to explain to colleagues from a non-mathematical
background that in classification the classes are predefined, whereas in
clustering the classes (and sometimes the number of classes) are not.

I prefer the use of the term class discovery over clustering when
people try to cluster samples in order to derive meaningful classes.

Regards, Adai



On Wed, 2006-03-29 at 18:52 -0500, Liaw, Andy wrote:
 In addition to Brian's comment, Gordon's book, already in 2nd edition, is
 all about clustering, but the title is simply `Classification'.
 
 Andy
 
 From: Sean Davis
  
  We have to be careful here.  Classification (which is the 
  terminology that the original poster used) is NOT the same as 
  clustering, although the two are often confused.  If the 
  original poster wants to do clustering and examine the 
  results for the presence of three clusters, that is fine and 
  there are many methods for clustering that could be used.  
  However, classification will require a different set of 
  tools.  If the clustering tools already pointed out are not 
  doing what is needed (that is, that Cao actually is 
  interested in clustering and not classification), then 
  perhaps a further explanation of what the problem is would help clarify.
  
  Sean
  
  
  On 3/29/06 1:46 AM, Jacques VESLOT [EMAIL PROTECTED] wrote:
  
   try this (suppose mat is your matrix):
   
   hc <- hclust(dist(mat,"manhattan"), "ward")   # "ward.D" in R >= 3.1.0
   plot(hc, hang=-1)
   (x <- identify(hc)) # rightclick to stop
   cutree(hc, 3)
   
   km <- kmeans(mat, 3)
   km$cluster
   km$centers
   
   pam(daisy(mat, metric = "manhattan"), k=3, diss=T)$clust
   
   
   
   Baoqiang Cao a écrit :
   
   Thanks!
   I tried kmeans; the results are not very encouraging. Anyway, thanks 
   Jacques! Please let me know if you have any other thoughts!
   
   Best regards, 
  Baoqiang Cao
   
   === At 2006-03-29, 00:08:44 you wrote: ===
   

   
   if you want to classify rows or columns, read:
   ?hclust
   ?kmeans
   library(cluster)
   ?pam
   
   
   Baoqiang Cao a écrit :
   
  
   
   Dear All,
   
   I have some data; suppose it is an N*M matrix. All I want is to
   classify it into, let's say, 3 classes. Which method(s) do you think
   is (are) appropriate for this purpose? Any reference will be
   welcome! Thanks!
   
   Best,
   Baoqiang Cao
   
   
   
   
   

   
   .
  
   
   
   = = = = = = = = = = = = = = = = = = = =
   
   Baoqiang Cao
   [EMAIL PROTECTED]
   2006-03-29
   
   

   
   

Re: [R] Binning question (binning rows of a data.frame according to a variable)

2006-03-20 Thread Adaikalavan Ramasamy

Are you saying that your data might look like this ?

 set.seed(1)  # For reproducibility only - remove this
 mydf <- data.frame( age=round(runif(100, min=5, max=65), digits=1),
 nred=rpois(100, lambda=10), 
 nblue=rpois(100, lambda=5), 
 ngreen=rpois(100, lambda=15) )
 mydf$total <- rowSums( mydf[ , c("nred", "nblue", "ngreen")] )

 head(mydf)
    age nred nblue ngreen total
 1 20.9   11     7     15    33
 2 27.3    8     2     18    28
 3 39.4   11     4      8    23
 4 59.5    6     5      8    19
 5 17.1   10     3     16    29
 6 58.9   11     5     14    30


If so, then try this :

 mydf          <- mydf[order(mydf$age), ]  ## re-order by age
 mydf$cumtotal <- cumsum(mydf$total)       ## cumulative total

 brk.pts   <- seq(from=0, to=sum(mydf$total), len=9)
 mydf$grp  <- cut( mydf$cumtotal , brk.pts, labels=F )

     age nred nblue ngreen total cumtotal grp
 27  5.8    9     5      8    22       22   1
 47  6.4    6     5     13    24       46   1
 92  8.5    8     4     18    30       76   1
 10  8.7   12     5      8    25      101   1
 55  9.2   10     7     13    30      131   1
 69 10.1    9     3     18    30      161   1


So your 'grp' column is what you really want. Just to check:

 tapply( mydf$total, mydf$grp, sum )
   1   2   3   4   5   6   7   8 
 352 363 372 387 358 377 377 370 

 sapply( tapply( mydf$age, mydf$grp, range ), c )
         1    2    3    4    5    6    7    8
 [1,]  5.8 17.1 24.5 29.0 34.6 44.6 51.2 56.7
 [2,] 16.2 24.0 28.4 33.9 44.1 51.0 55.4 64.5

The last command says that your youngest student in group 1 is aged 5.8
and oldest is aged 16.2.


Taking this one step further, you can calculate the proportions of red,
green and blue marbles for each of the 8 groups.

 props <- mydf[ , c("nred", "nblue", "ngreen")]/mydf$total # proportions
 apply( props, 2, function(v) tapply( v, mydf$grp, mean ) )
        nred     nblue    ngreen
 1 0.3459898 0.1776441 0.4763661
 2 0.3280712 0.1730796 0.4988492
 3 0.3061429 0.1748149 0.5190422
 4 0.3759380 0.2084694 0.4155926
 5 0.3548805 0.1587353 0.4863842
 6 0.3106835 0.1829349 0.5063816
 7 0.3525933 0.1599737 0.4874330
 8 0.3133796 0.1795567 0.5070637
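If what is wanted is the pooled proportion for each group (total reds divided by total marbles), rather than the mean of the per-student proportions, rowsum() can compute it for all groups at once. This is a minimal sketch; the function name group_proportions() and the toy data are hypothetical.

```r
# Pooled proportions per group: sum the counts within each group first,
# then divide each group's column totals by that group's grand total.
group_proportions <- function(counts, grp) {
  sums <- rowsum(as.matrix(counts), grp)   # one row of column totals per group
  sums / rowSums(sums)                     # divide each row by its own total
}

counts <- data.frame(nred = c(1, 3, 2), nblue = c(1, 1, 2))
grp    <- c(1, 1, 2)
group_proportions(counts, grp)
```

With the data above this would be group_proportions(mydf[ , c("nred", "nblue", "ngreen")], mydf$grp).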

Hope this is of some use.

Regards, Adai



On Sun, 2006-03-19 at 18:58 +, Dan Bolser wrote:
 Adaikalavan Ramasamy wrote:
  Do you by any chance want to sample from each group equally to get an
  equal representation matrix ? 
 
 No.
 
 I want to make groups of equal sizes, where size isn't simply number of 
 rows (allowing a simple 'gl'), but a sum of the variable.
 
 Thanks for the code though, it looks useful.
 
 
 
 Here is an analogy for what I want to do (in case it helps).
 
 A group of students have some bags of marbles - The marbles have 
 different colours. Each student has one bag, but can have between 5 and 
 50 marbles per bag with any given strange distribution you like. I line 
 the students up by age, and want to see if there is any systematic 
 difference between the number of each color of marble by age (older 
 students may find primary colours less 'cool').
 
 Because the statistics of each individual student are bad (like the 
 proportion of each color per student -- has a high variance) I first put 
 all the students into 8 groups (for example).
 
 Thing is, for one reason or another, the number of marbles per bag may 
 systematically vary with age too. However, I am not interested in the 
 number of marbles per bag, so I would like to group the students into 8 
 groups such that each group has the same total number of marbles (each 
 group having a different-sized age range, but nonetheless ordered by age).
 
 Then I can look at the proportion (or count) of colours in each group, 
 and I can compare the groups or look for any trend across the groups.
 
 Does that make sense?
 
 Cheers,
 Dan.
 
 
 
 
 
 
  Here is an example of the input :
  
   mydf <- data.frame( value=1:100, value2=rnorm(100),
   grp=rep( LETTERS[1:4], c(35, 15, 30, 20) ) )
  
  which has 35 observations from A, 15 from B, 30 from C and 20 from D.
  
  
  And here is a function that I wrote:
  
   sample.by.group <- function(df, grp, k, replace=FALSE){
  
 if(length(k)==1){ k <- rep(k, length(unique(grp))) }
  
 if(!replace && any(k > table(grp)))
   stop( paste("Cannot take a sample larger than the population when",
   "'replace = FALSE'.\n", "Please specify a value no greater than",
   min(table(grp)), "or use 'replace = TRUE'.\n") )
  

 ind   <- model.matrix( ~ -1 + grp )
 w.mat <- list(NULL)
 
 for(i in 1:ncol(ind)){
   w.mat[[i]] <- sample( which( ind[,i]==1 ), k[i], replace=replace )
 }

 out <- df[ unlist(w.mat), ]
 return(out)
   }
  
  
  And here are some examples of how to use it :
   
  mydf <- mydf[ sample(1:nrow(mydf)), ]   # scramble it for fun
  
  
  out1 <- sample.by.group(mydf, mydf$grp, k=10 )
  table( out1$grp )
  
   out2 <- sample.by.group(mydf, mydf$grp, k=50, replace

Re: [R] hist-data without plot

2006-03-20 Thread Adaikalavan Ramasamy

hist(data, plot=FALSE)$counts
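The histogram object returned with plot = FALSE still carries the breaks, counts, density and mids components. Here is a tiny example with explicitly chosen breaks (the data are made up):

```r
x <- c(1, 2, 2, 3, 3, 3)
h <- hist(x, breaks = 0:3, plot = FALSE)   # compute the histogram without drawing it
h$counts    # counts in the intervals (0,1], (1,2], (2,3]
h$breaks
```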


On Mon, 2006-03-20 at 14:23 +0100, Gottfried Gruber wrote:
 hello,
 
 i need the data from hist() but i do not want the plot.
 e.g.
 z=hist(data)$counts  #returns absolute  frequency
 
 but when i execute this command the plot occurs also. is it possible to 
 suppress the plot?
 
 many thanks,
 best regards gg


Re: [R] Binning question (binning rows of a data.frame according to a variable)

2006-03-20 Thread Adaikalavan Ramasamy

Let's say there are 10 students in the first group, and denote by x1 (say)
the number of red balls for student 1 and by s1 their total number of balls.
Then I was calculating the average of the proportions,
( x1/s1 + x2/s2 + ... + x10/s10 )/10, and you were calculating the ratio
of the totals, (x1 + x2 + ... + x10)/(s1 + s2 + ... + s10).

On second thoughts, I think it is much better to calculate a weighted
average of the proportions. The weights should reflect the variance of
the estimate of each proportion.



( w1*x1/s1 + w2*x2/s2 + ... + w10*x10/s10 )



On Mon, 2006-03-20 at 15:11 +, Dan Bolser wrote:
 Adaikalavan Ramasamy wrote:
 
 Yes, this is very useful! I have just one remaining question, above you 
 take the mean of the group proportion...
 
 apply( props, 2, function(v) tapply( v, mydf$grp, mean ) )
 
 
 instead of explicitly recalculating the proportion for the group (what I 
 couldn't script real good) ...
 
 rbind(
colSums(mydf[ mydf$grp==1, c("nred", "nblue", "ngreen")])/
   sum (mydf[ mydf$grp==1, c("nred", "nblue", "ngreen")]),
...
colSums(mydf[ mydf$grp==8, c("nred", "nblue", "ngreen")])/
   sum (mydf[ mydf$grp==8, c("nred", "nblue", "ngreen")])
   )
 
 
 Giving (from the same seed)...
 
           nred     nblue    ngreen
 [1,] 0.3465909 0.1704545 0.4829545
 [2,] 0.3250689 0.1735537 0.5013774
 [3,] 0.3064516 0.1774194 0.5161290
 [4,] 0.3746770 0.2067183 0.4186047
 [5,] 0.3519553 0.1564246 0.4916201
 [6,] 0.3103448 0.1830239 0.5066313
 [7,] 0.3501326 0.1644562 0.4854111
 [8,] 0.3081081 0.1837838 0.5081081
 
 
 Which is *slightly* different from the 'mean' approach.
 
   round(former-latter,4)
   nred   nblue  ngreen
 1 -0.0006  0.0072 -0.0066
 2  0.0030 -0.0005 -0.0025
 3 -0.0003 -0.0026  0.0029
 4  0.0013  0.0018 -0.0030
 5  0.0029  0.0023 -0.0052
 6  0.0003 -0.0001 -0.0002
 7  0.0025 -0.0045  0.0020
 8  0.0053 -0.0042 -0.0010
 
 
 I know this less a question about R, and more a question about general 
 stats, but why did you choose the former and not the latter method? Is 
 one wrong and one right? Or did the former better fit the situation as 
 described?
 
 Thanks for any insight into your decision, as this is something that has 
 always puzzled me.
 
 Thanks for the beautifully clear examples!
 
 
 Dan.
 
  

Re: [R] Binning question (binning rows of a data.frame according to a variable)

2006-03-20 Thread Adaikalavan Ramasamy

[[ Please ignore the last email which was sent incomplete ]]

Let's say there are 10 students in the first group, and denote by x1
(say) the number of red balls for student 1 and by s1 their total number
of balls. Then I was calculating the average of the proportions,
( x1/s1 + x2/s2 + ... + x10/s10 ) / 10, whereas you were calculating the
pooled proportion, (x1 + x2 + ... + x10)/(s1 + s2 + ... + s10).

It is just by chance that your calculation and mine nearly agree. When
the totals s1, ..., s10 are highly unbalanced, you may get very different
results.



On second thoughts, I think it is much better to calculate a weighted
average of the proportions. The weights should reflect the variance of
each estimated proportion. Assuming that your outcome of interest
is a proportion, the summary effect size might look something like

  p_hat = ( w1*p1 + w2*p2 + ... + w10*p10 ) / ( w1 + w2 + ... + w10 )

  where p1 = x1/s1 and w1 = 1/var(p1).

You should be able to obtain the standard error of this estimate.
Using it you can build a confidence interval and see if it overlaps
with the intervals for the proportion of reds in other groups.
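The three summaries discussed above can be compared directly in R. This is a minimal sketch on made-up counts for one group of four students (hypothetical data, not from this thread). It assumes the binomial variance var(p1) = p1*(1 - p1)/s1 for the inverse-variance weights, which is undefined when a proportion is exactly 0 or 1.

```r
# Made-up counts for one group of students (illustration only)
x <- c(3, 10, 1, 45)      # red balls per student
s <- c(10, 20, 5, 100)    # total balls per student
p <- x / s                # per-student proportions

p.mean   <- mean(p)             # average of the proportions
p.pooled <- sum(x) / sum(s)     # pooled proportion: total reds / total balls

# Inverse-variance weights, assuming var(p_i) = p_i * (1 - p_i) / s_i
# (this breaks down when a proportion is exactly 0 or 1)
w     <- 1 / (p * (1 - p) / s)
p.hat <- sum(w * p) / sum(w)    # weighted average of the proportions
se    <- sqrt(1 / sum(w))       # standard error of p.hat

round(c(mean = p.mean, pooled = p.pooled, weighted = p.hat), 4)
p.hat + c(-1.96, 1.96) * se     # approximate 95% confidence interval
```

With these unbalanced totals, the simple mean (0.3625) and the pooled proportion (59/135, about 0.437) differ noticeably.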



There is a big field called meta-analysis that deals with this kind of
issue. You might want to read up more about this area. However, I am not
too familiar with the meta-analysis of proportions.

Perhaps someone on the mailing list can advise you if this approach is
appropriate for your situation and perhaps even some references.


Regards, Adai

SNIP

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] removing NA from a data frame

2006-03-18 Thread Adaikalavan Ramasamy

You might find the 2nd part of the following response useful
https://stat.ethz.ch/pipermail/r-help/2006-March/090611.html

And if you want to RTFM, I guess sections 2.5, 2.7, 5.1, 5.2 of
http://cran.r-project.org/doc/manuals/R-intro.html might be useful.
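For quick reference, here is a minimal sketch of the common base-R idioms for dropping rows with missing values. It uses a made-up data frame. These are standard functions, not necessarily what the linked response uses.

```r
# Made-up data frame with some missing values
dat <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, "z"))

dat1 <- na.omit(dat)                 # drop every row that contains an NA
dat2 <- dat[complete.cases(dat), ]   # same rows, via a logical index
dat3 <- dat[!is.na(dat$a), ]         # drop rows with an NA in column 'a' only
```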


PS: 

1) R-help is designed for and by unpaid volunteers. Therefore sometimes
RTFM without a page reference is quite acceptable.

2) Similar questions often get repeated over and over on the list. It might
be useful to search http://finzi.psych.upenn.edu/nmz.html first.



On Fri, 2006-03-17 at 16:17 -0500, Sam Steingold wrote:
  * Francisco J. Zagmutt [EMAIL PROTECTED] [2006-03-17 21:09:48 +]:
 
  Go to the help menu- manuals in pdf and select An Introduction to
  R.  After you read that document you will be able to answer your
  questions :-)
 
 I did.  I still need help.
 
 The matter is not so much with getting things done (I can probably
 write the code - although I would rather not) as with not reinventing
 the wheel.
 
 PS. next time you decide to answer my question with RTFM, please also
 include the number of the page that answers my specific question.



Re: [R] Binning question (binning rows of a data.frame according to a variable)

2006-03-18 Thread Adaikalavan Ramasamy

Do you by any chance want to sample from each group equally to get an
equal representation matrix? Here is an example of the input:

 mydf <- data.frame( value=1:100, value2=rnorm(100),
                     grp=rep( LETTERS[1:4], c(35, 15, 30, 20) ) )

which has 35 observations from A, 15 from B, 30 from C and 20 from D.


And here is a function that I wrote:

 sample.by.group <- function(df, grp, k, replace=FALSE){

   ## recycle a single k so that every group gets the same sample size
   if(length(k)==1){ k <- rep(k, length(unique(grp))) }

   if(!replace && any(k > table(grp)))
     stop( paste("Cannot take a sample larger than the population when",
                 "'replace = FALSE'.\n", "Please specify a value no greater than",
                 min(table(grp)), "or use 'replace = TRUE'.\n") )

   ## one indicator column per group, in sorted level order
   ind   <- model.matrix( ~ -1 + grp )
   w.mat <- list(NULL)

   for(i in 1:ncol(ind)){
     w.mat[[i]] <- sample( which( ind[,i]==1 ), k[i], replace=replace )
   }

   out <- df[ unlist(w.mat), ]
   return(out)
 }


And here are some examples of how to use it :
 
mydf <- mydf[ sample(1:nrow(mydf)), ]   # scramble the row order for fun


out1 <- sample.by.group(mydf, mydf$grp, k=10 )
table( out1$grp )

 out2 <- sample.by.group(mydf, mydf$grp, k=50, replace=TRUE) # ie bootstrap
 table( out2$grp )

and you can even do bootstrapping or sampling with weights via:

 out3 <- sample.by.group(mydf, mydf$grp, k=c(20, 20, 30, 30), replace=TRUE)
 table( out3$grp )


Regards, Adai



On Fri, 2006-03-17 at 16:01 +, Dan Bolser wrote:
 Hi,
 
 I have tuples of data in rows of a data.frame, each column is a variable 
 for the 'items' (one per row).
 
 One of the variables is the 'size' of the item (row).
 
 I would like to cut my data.frame into groups such that each group has 
 the same *total size*. So, assuming that we order by size, some groups 
 should have several small items while other groups have a few large 
 items. All the groups should have approximately the same total size.
 
 I have tried various combinations of cut, quantile, and ecdf, and I just 
 can't work out how to do this!
 
 Any help is greatly appreciated!
 
 All the best,
 Dan.
 

Re: [R] removing ROWS with missing values

2006-03-16 Thread Adaikalavan Ramasamy

My answer is very similar, with minor cosmetic changes that will
hopefully make it a bit clearer.


1) How do you read in the data? If you are using read.table (or
read.csv, read.delim, etc.) you can set na.strings="-999" to take
advantage of R's missing-value features.


2) First count the number of non-missing values in each row. Then subset
to the rows with at least 6 numerical values:
 
  number.present <- rowSums( myMatrix != -999 )
  good.rows      <- which( number.present >= 6 )
  myMatrix.sub   <- myMatrix[ good.rows, , drop=FALSE ]

Note: change the first line to rowSums( !is.na( myMatrix ) ) if you
have coded missing values properly as in comment 1).
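Here is a minimal sketch of both steps on a tiny made-up data set. The data are read from a text string rather than a file. The threshold is lowered from 6 to 3 because the toy data have only 4 columns.

```r
# Made-up whitespace-delimited data in which -999 marks a missing value
txt <- "a b c d
1 2 3 4
-999 -999 -999 4
1 -999 3 4"

# na.strings turns every -999 into NA while reading
d <- read.table(text = txt, header = TRUE, na.strings = "-999")

number.present <- rowSums(!is.na(d))              # non-missing values per row
d.sub <- d[number.present >= 3, , drop = FALSE]   # keep rows with >= 3 values
```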


Regards, Adai



On Thu, 2006-03-16 at 21:45 +0100, [EMAIL PROTECTED] wrote:
 Quoting mark salsburg [EMAIL PROTECTED]:
 
  I am trying to find out if R can recognize specific criteria for removing
  rows (i.e. a prexisting function)
 
  I have a matrix myMatrix that is 12000 by 20
 
  I would like to remove rows from myMatrix that have:
 
  -999 across all columns
  -999 across all columns but one
  -999 across all columns but two
  -999 across all columns but three
  -999 across all columns but four
  -999 across all columns but five
 
  (-999 here is my missing value)
 
  Does R have a function for this, I've explored subset() so far
 
 
 You can create a vector that records the number of non-missing values
 in each row
 
 n.notmissing <- apply(myMatrix != -999, 1, sum)
 
 then use row subsetting to remove the ones you don't want
 
 myMatrix[n.notmissing == n, ]
 
 for n = 0, 1, ... 5, etc.
 
 (As an aside, R functions will work better with your data if you use NA
 instead of a numeric code to represent missing data.)
 
 Martyn

Re: [R] appending objects to a file created by save()

2006-03-10 Thread Adaikalavan Ramasamy

Another flexible approach is to zip/tar all the required individual .rda
files together. There are two advantages that I see:
 1) You can extract a single file from the collection if you want.
 2) You can easily list what objects are in the zipped/tarred file. 

In R you have to load all the objects from a single .rda file if you want
to extract a single object or even just to list what objects are stored.

In terms of size, the zipped/tarred file is comparable to, if not smaller
than, the file produced by R's save() with the compress=TRUE option. But I
have not tested this in depth.
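Here is a minimal sketch of the zip/tar approach done from within R. It assumes a later R release than this message, where utils::tar() and untar() are available. File names are placeholders. Setting tar = "internal" means no external tar program is needed.

```r
# Work in a temporary directory so nothing in the current folder is touched
old <- setwd(tempdir())

x <- 1:10
y <- letters[1:10]
save(x, file = "x.rda")   # one object per .rda file
save(y, file = "y.rda")

# Bundle the individual .rda files into one gzipped tar archive
tar("project.tar.gz", files = c("x.rda", "y.rda"),
    compression = "gzip", tar = "internal")

# List the archive contents without loading any R objects
contents <- untar("project.tar.gz", list = TRUE, tar = "internal")
print(contents)

# Extract just one file and load only that object into a fresh environment
dir.create("extract", showWarnings = FALSE)
untar("project.tar.gz", files = "y.rda", exdir = "extract", tar = "internal")
e <- new.env()
load(file.path("extract", "y.rda"), envir = e)
ls(e)

setwd(old)
```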

Regards, Adai



On Fri, 2006-03-10 at 09:49 +, David Whiting wrote:
 On Fri, 2006-03-10 at 03:46 -0500, Rajarshi Guha wrote:
  Hi,
I've been slowly transitioning to saving sets of objects for a project
  using save() rather than cluttering my workspace and then doing
  save.image()
  
  However, sometimes after I have done say:
  
  save(x,y,z, file='work.Rda')
  
  and I reload it a little later and  I see that I also want to save
  object p. Currently I need to do:
  
  save(x,y,z,p, file='work.Rda')
  
  Is there any way to instruct save to append an object to a previously
  created binary data file?
 
 I use this approach. One potential problem with this approach is that if
 you have large saved objects you could get into problems because you
 need to load them before saving them. 
 
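 ## Function to append an object to an R data file.
 append.Rda <- function(x, file) {
   e <- new.env()
   old.objects <- load(file, envir = e)   # existing objects go into e
   x.name <- deparse(substitute(x))
   assign(x.name, x, envir = e)           # add the new object to e
   save(list = unique(c(old.objects, x.name)), file = file, envir = e)
 }
 
 
 ## Example:
 x <- 1:10
 y <- letters[1:10]
 save(list = c("x", "y"), file = "temp.Rda")
 z <- "fred"
 append.Rda(z, "temp.Rda")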
 
 
 Dave
 
  
  Thanks,
  
  ---
  Rajarshi Guha [EMAIL PROTECTED] http://jijo.cjb.net
  GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
  ---
  CChheecckk yyoouurr dduupplleexx sswwiittcchh..
  

Re: [R] To improve my understanding of workspaces

2006-03-10 Thread Adaikalavan Ramasamy

I use emacs and ESS to develop the scripts. Newer releases of R have a
script editor built in.

Typically I keep all the data and scripts related to a project in its
own folder, so I have minimal worry about paths.

To save large and associated objects, I use 
   save(x, y, z, file="lala.rda", compress=TRUE) 
and then to load x, y, z in another session or workspace I use
   load("lala.rda") 

To save small dataframes and matrices, I use 
   write.table(mat, file="lala.txt", sep="\t") 
and to read it back I use
   mat <- read.delim(file="lala.txt", row.names=1)
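Here is a self-contained round trip of both recipes. Temporary files stand in for lala.rda and lala.txt, and the objects are placeholders.

```r
# Placeholder objects; temporary files stand in for lala.rda and lala.txt
x <- rnorm(5)
y <- letters[1:3]
mat <- matrix(1:6, nrow = 2,
              dimnames = list(c("r1", "r2"), c("a", "b", "c")))

rda <- tempfile(fileext = ".rda")
save(x, y, file = rda, compress = TRUE)
rm(x, y)
restored <- load(rda)    # puts x and y back; returns their names

txt <- tempfile(fileext = ".txt")
write.table(mat, file = txt, sep = "\t")   # row names are written too
mat2 <- read.delim(file = txt, row.names = 1)
```

Note that read.delim() returns a data frame. Wrap it in as.matrix() if a matrix is needed.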


The problem with .RData (via q() or save.image()) is that it keeps all
intermediate objects, which can make it unnecessarily bloated and
confusing. Further, you will have difficulty telling one .RData file from
another by looking at the filename alone.

Regards, Adai



On Fri, 2006-03-10 at 06:58 -0500, Kevin E. Thorpe wrote:
 Hello.
 
 I have grown accustomed to the .Data directory in S-Plus and so when
 I came to R I continued that behaviour by saving my workspaces at
 the end of each R session.  So, I have saved workspaces in various
 directories where I have used R just as I would have had various
 .Data directories where I had used S-Plus.
 
 I have seen comments on the list, most recently from Prof. Ripley
 that they don't routinely save their workspaces in this way.
 So my questions are:
 
1. What do people do instead to manage projects?
2. Is there an official recommendation?
 
  From my reading I have learned that you can save data frames
 (and other objects?) to disk and then attach them.  Does this
 save memory?  If I have read correctly, I understand that
 everything in the workspace is in memory, but haven't been able
 to determine if objects in the search path are as well.
 
 Kind Regards,
 
 Kevin



Re: [R] To improve my understanding of workspaces

2006-03-10 Thread Adaikalavan Ramasamy

Much of programming style is personal choice and as such varies from
individual to individual. See my comments below.

On Fri, 2006-03-10 at 09:01 -0500, Kevin E. Thorpe wrote:
 Thanks Adai.  A couple questions/comments about this.
 
 Adaikalavan Ramasamy wrote:
  I use emacs and ESS to develop the scripts. The new releases of R has
  the script function already in built.
 
 I use emacs and ESS too (in Linux).  I do not know about the script
 function you mention.  It's not in my version (2.1.1) and I couldn't
 find it in an RSiteSearch either.

I meant to say that newer releases of R _for Windows only_ have a script
editor. Look under File -> New script (untested). However, it does not
appear to have the syntax highlighting or auto-indenting that emacs has.


  Typically I keep all the data and scripts related to a project in its
  own folder, so I have minimal worry about paths.
 
 I do the same.
 
  To save large and associated objects, I use 
 save(x, y, z, file="lala.rda", compress=TRUE) 
  and then to load x, y, z in another session or workspace I use
 load("lala.rda") 
  
  To save small dataframes and matrices, I use 
 write.table(mat, file="lala.txt", sep="\t") 
  and to read it back I use
 mat <- read.delim(file="lala.txt", row.names=1)
 
 Am I correct that load() or read.whatever() or even data() will
 bring the objects into the current workspace while attach() can
 attach a save() data frame to the search path?  Is one approach
 better than the other in general?

I think you are correct.

The attach function appears to have two uses now:
 a) attach("lala.rda") puts the objects from lala.rda on the search path
 b) attach(obj) makes the named columns of a dataframe or list available
on the search path. Therefore you only need to type 'aaa' instead of 
obj$aaa or obj[ , "aaa"]

The second is the more popular form of usage. 

Personally I would rather not use attach(). I prefer to type obj$aaa or
to pass a data argument, as in lm( aaa ~ ., data=obj ).
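Here is a minimal sketch of both forms of attach() and of the alternatives. The data frame obj is hypothetical, and a temporary file stands in for lala.rda.

```r
# Hypothetical data frame and a temporary file standing in for lala.rda
obj <- data.frame(aaa = c(2.1, 3.9, 6.2, 7.8), bbb = 1:4)
rda <- tempfile(fileext = ".rda")
save(obj, file = rda)

# (a) attach() an .rda file: its objects go onto the search path,
#     not into the workspace
attach(rda, name = "lala", warn.conflicts = FALSE)
attached.objs <- ls("lala")     # "obj"
detach("lala")                  # remove it from the search path again

# (b) alternatives to attach(obj) that leave the search path alone
m1  <- mean(obj$aaa)              # explicit $ access
m2  <- with(obj, mean(aaa))       # evaluate inside the data frame
fit <- lm(aaa ~ ., data = obj)    # modelling functions take a data argument
```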



  The problem with .RData (via quit or save.image), is that it keeps all
  intermediate objects which can be unnecessarily bloated and confusing.
  Further you will have difficulty distinguishing one .RData from the
  other by looking at the filename alone.
 
 If you don't save the workspace on q(), do you also lose the history for
 that session (although when working in emacs, this is rarely a problem)?

I would argue that a script file is better than a history file
because I can clean up any test or incorrect code in the script.


However, if you prefer to save the history, you can use
savehistory(file="history.txt") at any point.

Regards, Adai

SNIP


Re: [R] One way ANOVA with NO model

2006-03-10 Thread Adaikalavan Ramasamy

Suppose you have 6 groups (A, B, C, D, E, F) and you measured the weight
of 5 individuals from each group. Therefore you have 30 weight
observations in total.

You wish to test whether the mean of the response variable differs
between the groups.
[ i.e. the null hypothesis is that all 6 group means are the same. ]


Let's simulate some data first:

 grp <- gl(6, k=5, labels=LETTERS[1:6])
 grp
  [1] A A A A A B B B B B C C C C C D D D D D E E E E E F F F F F
 Levels: A B C D E F

 set.seed(1)                    # for reproducibility only
 w <- runif(30, min=40, max=75) # weights 
 w <- round(w, digits=1)


Let us first calculate the group means:

   tapply(w, grp, mean)
       A     B     C     D     E     F
   56.24 62.36 55.54 63.54 55.34 53.94

The group means are close, except possibly for groups B and D.


You can do a formal test by regressing the response (weight) on its
predictor (group). You will need to use the lm() function in R.

   fit <- lm( w ~ grp )

 
You can get a summary of the fit by 

   summary(fit)
   ...
   Coefficients:
               Estimate Std. Error t value Pr(>|t|)
   (Intercept)   56.240      4.725  11.903 1.48e-11 ***
   grpB           6.120      6.682   0.916    0.369
   grpC          -0.700      6.682  -0.105    0.917
   grpD           7.300      6.682   1.093    0.285
   grpE          -0.900      6.682  -0.135    0.894
   grpF          -2.300      6.682  -0.344    0.734
   ...

The intercept is the mean of group A, and it is clearly NOT zero. Each
grp coefficient is the difference between that group's mean and the mean
of group A. Based on their p-values, none of the other groups appears to
differ from group A.


Another useful tool is the ANOVA F-test, which tests whether the
between-group variation is larger than the within-group variation.

   anova(fit)
   Analysis of Variance Table

   Response: w
             Df  Sum Sq Mean Sq F value Pr(>F)
   grp        5  411.15   82.23  0.7367 0.6033
   Residuals 24 2678.79  111.62

This says that there is no significant variation between the group
means (F = 0.74 on 5 and 24 degrees of freedom, p = 0.60).
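Since the question asked for an ANOVA without writing a model, base R's oneway.test() may help. It runs the same one-way F-test directly when var.equal = TRUE. The default, var.equal = FALSE, gives Welch's test instead. This is a minimal sketch that re-creates the simulated data above and checks that the F statistics agree:

```r
# Re-create the simulated weights from above
set.seed(1)
grp <- gl(6, k = 5, labels = LETTERS[1:6])
w   <- round(runif(30, min = 40, max = 75), digits = 1)

# Classical one-way ANOVA F-test without an explicit lm() call
ow <- oneway.test(w ~ grp, var.equal = TRUE)
ow

# The same test via aov() and via anova() on the lm() fit
summary(aov(w ~ grp))
tab <- anova(lm(w ~ grp))
```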

Hope this helps.

Regards, Adai



On Fri, 2006-03-10 at 11:24 -0500, Jason Horn wrote:
 I'd like to do a simple one-way ANOVA comparing the means of 6  
 groups.  But it seems like the only way to do an ANOVA in R is to  
 specify some sort of model, where there is an outcome or dependent  
 variable that is a function of independent variables (linear model).   
 But I don't have a linear model, I just want to do a simple ANOVA  
 (and f-test) to compare the means.  How do I do this?  My stats  
 skills are basic, so please bear with me.
 
 Thanks for any ideas...
 

Re: [R] Ranking within factor subgroups

2006-02-24 Thread Adaikalavan Ramasamy

Thank you! I did not know about the split and unsplit functions. It
looks like a very powerful and useful combination to master.
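Here is a minimal sketch of that combination applied to the decile problem in this thread. It uses hypothetical data whose rows are deliberately not ordered by date, the case that the apply/tapply version cannot handle. The decile() helper is an illustrative stand-in for the bucket() functions in the thread.

```r
# Hypothetical data: 3 dates x 20 symbols, shuffled so rows are NOT in date order
set.seed(1)
df <- data.frame(date = rep(1:3, each = 20), A = rnorm(60))
df <- df[sample(nrow(df)), ]

# Decile rank within a vector (continuous data, so the breaks are unique)
decile <- function(v, n = 10) {
  br <- quantile(v, seq(0, 1, length.out = n + 1), na.rm = TRUE)
  cut(v, br, include.lowest = TRUE, labels = FALSE)
}

# split by date, rank within each date, then unsplit back into the original row order
df$bucketA <- unsplit(lapply(split(df$A, df$date), decile), df$date)
table(df$date, df$bucketA)
```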

Regards, Adai



On Thu, 2006-02-23 at 07:28 +0100, Peter Dalgaard wrote:
 maneesh deshpande [EMAIL PROTECTED] writes:
 
  Hi Adai,
  
  I think your solution only works if the rows of the data frame are ordered 
  by date and
  the ordering function is the same used to order the levels of 
  factor(df$date) ?
  It turns out (as I implied in my question) my data is indeed organized in 
  this manner, so my
  current problem is solved.
  In the general case, I suppose, one could always order the data frame by 
  date before proceeding ?
  
  Thanks,
  
  Maneesh
 
 You might prefer to look at split/unsplit/split<-, i.e. the "z-scores
 by group" line:
 
  z <- unsplit(lapply(split(x, g), scale), g)
 
 with scale suitably replaced. Presumably (meaning: I didn't quite
 read your code closely enough)
 
 z <- unsplit(lapply(split(x, g), bucket, 10), g)
 
 could do it.
  
  
  From: Adaikalavan Ramasamy [EMAIL PROTECTED]
  Reply-To: [EMAIL PROTECTED]
  To: maneesh deshpande [EMAIL PROTECTED]
  CC: r-help@stat.math.ethz.ch
  Subject: Re: [R]  Ranking within factor subgroups
  Date: Wed, 22 Feb 2006 03:44:45 +
  
  It might help to give a simple reproducible example in the future. For
  example
  
   df <- cbind.data.frame( date=rep( 1:5, each=100 ), A=rpois(500, 100),
     B=rpois(500, 50), C=rpois(500, 30) )
  
  might generate something like
  
      date   A  B  C
  1      1  93 51 32
  2      1  95 51 30
  3      1 102 59 28
  4      1 105 52 32
  5      1 105 53 26
  6      1  99 59 37
  ...  ... ... .. ..
  495    5 100 57 19
  496    5  96 47 44
  497    5 111 56 35
  498    5 105 49 23
  499    5 105 61 30
  500    5  92 53 32
  
  Here is my proposed solution. Can you double check with your existing
  functions to see if they are correct.
  
  decile.fn <- function(x, nbreaks=10){
    br <- quantile( x, seq(0, 1, len=nbreaks+1), na.rm=T )
    br[1] <- -Inf
    return( cut(x, br, labels=F) )
  }
  
  out <- apply( df[ ,c("A", "B", "C")], 2,
    function(v) unlist( tapply( v, df$date, decile.fn ) ) )
  
  rownames(out) <- rownames(df)
  out <- cbind(df$date, out)
  
  Regards, Adai
  
  
  
  On Tue, 2006-02-21 at 21:44 -0500, maneesh deshpande wrote:
Hi,
   
I have a dataframe, x of the following form:
   
Date     Symbol   A   B   C
20041201 ABC  10  12 15
20041201 DEF   95   4
...
20050101 ABC 5  3   1
20050101 GHM   12 42

   
here A, B, C are properties of a set of symbols recorded for a given date.
I want to decile the symbols for each date and property and
create another set of columns bucketA, bucketB, bucketC containing the
decile rank for each symbol. The following non-vectorized code does what
I want:
   
bucket <- function(data, nBuckets) {
 q <- quantile(data, seq(0, 1, len=nBuckets+1), na.rm=T)
 q[1] <- q[1] - 0.1 # need to do this to ensure there are no extra NAs
 cut(data, q, include.lowest=T, labels=F)
}
   
calcDeciles <- function(x, colNames) {
nBuckets <- 10
dates <- unique(x$Date)
for (date in dates) {
  iVec <- x$Date == date
  xx <- x[iVec, ]
  for (colName in colNames) {
 data <- xx[, colName]
 bColName <- paste("bucket", colName, sep="")
 x[iVec, bColName] <- bucket(data, nBuckets)
  }
}
x
}
   
x <- calcDeciles(x, c("A", "B", "C"))
   
   
I was wondering if it is possible to vectorize the above function to make
it more efficient.
I tried
rlist <- tapply(x$A, x$Date, bucket)
but I am not sure how to assign the contents of rlist to their appropriate
slots in the original dataframe.
   
Thanks,
   
Maneesh
   

Re: [R] (Newbie) Aggregate for NA values

2006-02-24 Thread Adaikalavan Ramasamy

I think it makes perfect sense for R to drop them, since 'NA' carries no
information about the group. I do not know if there is an elegant
solution, but I would suggest that you turn these 'NA' into an
informative value.

Here is one possibility:

 df <- data.frame( AA=1:10, BB=rep(1:5,2), CC=rep(1:2,5), DD=rnorm(10) )
 df[ 9:10, "CC" ] <- NA

 df[is.na(df)] <- "lala"   ## change NA's into an informative category ##


 aggregate( df$DD, by=list( df$CC ), mean  )
   Group.1          x
 1       1  1.1533763
 2       2  0.6427338
 3    lala -0.2745249

 aggregate( df$DD, by=list( df$BB, df$CC ), mean  )
    Group.1 Group.2           x
 1        1       1  0.47264081
 2        2       1  0.63795211
 3        3       1  1.66756015
 4        5       1  1.83535232
 5        1       2  0.89914287
 6        2       2  1.11102134
 7        3       2  0.22268699
 8        4       2  0.33808394
 9        4    lala -0.60154608
 10       5    lala  0.05249622
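Here is a variant of the same idea, sketched on Vivek's toy data from the question below. It recodes only the grouping variable, here to the hypothetical label "missing", so the numeric columns stay numeric. By contrast, df[is.na(df)] <- "lala" would also turn any NA in a numeric column into text.

```r
# Vivek's toy data from the question below
tmp_c <- rep(1:2, 5)
tmp_d <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4)
tmp_c[9:10] <- NA

# Recode only the grouping variable, so numeric data stay numeric
grp.c <- ifelse(is.na(tmp_c), "missing", as.character(tmp_c))

res <- aggregate(tmp_d, by = list(c = grp.c), FUN = mean)
res
```

The NA group now gets its own mean (3.5) alongside the means for groups 1 and 2 (1.75 and 2).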

Regards, Adai



On Fri, 2006-02-24 at 10:16 -0500, Vivek Satsangi wrote:
 Folks,
 
 Sorry if this question has been answered before or is obvious (or
 worse, statistically bad). I don't understand what was said in one
 of the search results that seems somewhat related.
 
 I use aggregate to get a quick summary of the data. Part of what I am
 looking for in the summary is, how much influence might the NA's have
 had, if they were included, and is excluding them from the means
 causing some sort of bias. So I want the summary stat for the NA's
 also.
 
 Here is a simple example session (edited to remove the typos I made,
 comments added later):
 
  tmp_a <- 1:10
  tmp_b <- rep(1:5,2)
  tmp_c <- rep(1:2,5)
  tmp_d <- c(1,1,1,2,2,2,3,3,3,4)
  tmp_df <- data.frame(tmp_a,tmp_b,tmp_c,tmp_d);
  tmp_df$tmp_c[9:10] <- NA ;
  tmp_df
    tmp_a tmp_b tmp_c tmp_d
 1      1     1     1     1
 2      2     2     2     1
 3      3     3     1     1
 4      4     4     2     2
 5      5     5     1     2
 6      6     1     2     2
 7      7     2     1     3
 8      8     3     2     3
 9      9     4    NA     3
 10    10     5    NA     4
  aggregate(tmp_df$tmp_d,by=list(tmp_df$tmp_b,tmp_df$tmp_c),mean);
   Group.1 Group.2 x
 1       1       1 1
 2       2       1 3
 3       3       1 1
 4       5       1 2
 5       1       2 2
 6       2       2 1
 7       3       2 3
 8       4       2 2
 # Only one row for each (tmp_b, tmp_c) combination, NA's getting dropped.
 
  aggregate(tmp_df$tmp_d,by=list(tmp_df$tmp_c),mean);
   Group.1    x
 1       1 1.75
 2       2 2.00
 
 What I want in this last aggregate is, a mean for the values in tmp_d
 that correspond to the tmp_c values of NA. Similarly, perhaps there is
 a way to make the second last call to aggregate return the values of
 tmp_d for the NA values of tmp_c also.
 
 How can I achieve this?
 
 --
 -- Vivek Satsangi
 Student, Rochester, NY USA
 

Re: [R] How do I tell it which directory to use?

2006-02-22 Thread Adaikalavan Ramasamy

I think the idea of defining dir1 and dir2 is a good one. If you want to
simplify life even further, you can put these definitions in a file that
R runs when it starts (e.g. an .Rprofile file). See help(Startup) for
details.
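Here is a minimal sketch of what such startup lines could look like. The paths are placeholders taken from this thread. Forward slashes avoid escaping backslashes in R strings on Windows. file.path() then builds full file names without changing the working directory.

```r
## Possible lines for an .Rprofile file (placeholder paths from this thread)
dir1 <- "C:/Documents and Settings/Tom/My Documents/qpaper7"
dir2 <- "C:/Documents and Settings/Tom/My Documents"

## Later, in the session: build a full path instead of calling setwd()
f <- file.path(dir1, "myFile.txt")
## source(f)   # commented out: the placeholder file need not exist here
```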

Regards, Adai


On Wed, 2006-02-22 at 16:54 +1100, [EMAIL PROTECTED] wrote:
 Tom,
 
 You can define your working directory by using:
 
 setwd("C:/Documents and Settings/Tom/My Documents/qpaper7/R Project Started 19 Dec 05")
 
 check that your file is there:
 list.files()
 
 and then use:
 
 source("myFile.txt")
 
 the machine should load myFile
 
 You can go to another directory:
 setwd("anotherdir")
 
 and repeat the procedure.
 
 Or even better if you define a number of directories in an external file:
 
 dir1 <- "C:/Documents and Settings/Tom/My Documents/qpaper7/"
 dir2 <- "C:/Documents and Settings/Tom/My Documents/"
 
 and after loading the file at the beginning of the session you can use:
 
 setwd(dir1)  etc.
 
 Is it of any help to you?
 
 Cheers,
 
 Augusto
 
 
 
 Augusto Sanabria. MSc, PhD.
 Mathematical Modeller
 Risk Research Group
 Geospatial  Earth Monitoring Division
 Geoscience Australia (www.ga.gov.au)
 Cnr. Jerrabomberra Av.  Hindmarsh Dr.
 Symonston ACT 2609
 Ph. (02) 6249-9155
  
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Thomas L Jones
 Sent: Wednesday, 22 February 2006 4:31 PM
 To: R-project help
 Subject: [R] How do I tell it which directory to use?
 
 
 From Tom:
 
 In R 2.2.0 under Windows, I want to be able to give it a filename such 
 as myFile.txt without the quotes. But actually I mean:
 
 C:\Documents and Settings\Tom\My Documents\qpaper7\R Project Started 
 19 Dec 05\myFile.txt
 
 If I were to repeat this each time, my computer would get all bored 
 and cranky and start to drop bits (only a joke, of course). I think I 
 want to set the Home directory or the working directory or some 
 directory or other to the above directory. I may or may not want to 
 set some environmental variables.
 
 R 2.2.0; working directly from the console and copying and pasting 
 code which I want to test into the console. Windows XP Home Edition. 
 Administrator privileges are enabled. A curve ball: There are two 
 accounts, Tom and Jones; the data are stored under Tom, whereas 
 the computation is being done under the Jones account. I won't bore 
 you with the details of why I am doing this.
 
 I was able to call Sys.getenv("R_USER") and get the home directory.
 
 I am a newbie to R and not familiar with the terminology.
 
 Tom
 Thomas L. Jones, Ph.D., Computer Science
 

Re: [R] elements that appear only once

2006-02-22 Thread Adaikalavan Ramasamy

A slight variation on your solution but hopefully more readable:

names( which( table(a) == 1 ) )
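Here is a quick runnable check of the one-liner on Robin's toy factor. The expression a[a %in% once] additionally returns the elements themselves rather than just their names.

```r
# Robin's toy factor from the question
a <- factor(c(rep("oak", 5), "ash", "elm", rep("beech", 4)))

once <- names(which(table(a) == 1))   # levels that occur exactly once
once                                  # [1] "ash" "elm"
a[a %in% once]                        # the matching elements themselves
```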

Regards, Adai



On Wed, 2006-02-22 at 09:11 +, Robin Hankin wrote:
 Hi.
 
 I have a factor and I want to extract just those elements that appear  
 exactly once.
 How to do this?
 
 Toy example follows.
 
   a <- as.factor(c(rep("oak",5), rep("ash",1), rep("elm",1), rep("beech",4)))
   a
 [1] oak   oak   oak   oak   oak   ash   elm   beech beech beech beech
 Levels: ash beech elm oak
   table(a)
 a
   ash beech   elm   oak
     1     4     1     5
  
 
 So I would want ash and elm, because there is only one ash and
 only one elm in my wood.
 
 My Best Effort:
 
 
   names(table(a)[table(a)==1])
 [1] "ash" "elm"
  
 
 This doesn't seem particularly elegant to me; there must be a better  
 way!
 
 anyone?
 
 
 
 
 --
 Robin Hankin
 Uncertainty Analyst
 National Oceanography Centre, Southampton
 European Way, Southampton SO14 3ZH, UK
   tel  023-8059-7743
 

Re: [R] call row names

2006-02-21 Thread Adaikalavan Ramasamy

1) It is not good practice to name your objects after existing R
functions (e.g. table).

2) I think you are getting rows and columns confused. If you want to
extract rows or columns of a matrix or dataframe, then try subsetting it
by mat["A1", ] or mat[ , "v4"]. See help(subset) for more information.

3) It looks to me as if your object is a list. Try doing class(table).
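Here is a minimal sketch using the first four rows of the posted table. It stores them in a hypothetical data frame named tab, so the table() function is not masked. The var column becomes the row names, which lets a row be extracted by name.

```r
# First four rows of the posted table, in a hypothetical data frame
tab <- data.frame(var = c("v1", "v2", "v3", "v4"),
                  A1  = c(41203, 20577, 20625, 6115),
                  A2  = c(3.69, 4.51, 2.87, 8.92),
                  A3  = c(2.31, 8.60, 3.50, 2.97),
                  stringsAsFactors = FALSE)

rownames(tab) <- tab$var   # use the 'var' column as row names
tab <- tab[, -1]           # drop the now-redundant column

tab["v4", ]                # one row, returned as a data frame
tab[, "A1"]                # one column, returned as a vector
```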

Regards, Adai


On Tue, 2006-02-21 at 11:56 +, Ana Quitério wrote:
 Hi R users.
 
  
 
 I have a table like that:
 
 table
 
 var      A1    A2    A3
 v1    41203  3.69  2.31
 v2    20577  4.51  8.60
 v3    20625  2.87  3.50
 v4     6115  8.92  2.97
 v5     3160  1.49  2.21
 v6     2954  2.62  5.98
 v7     4731  1.83  7.53
 v8     2435  7.68  3.50
 v9     2296  3.03  4.84
 v10    6153  1.06  4.28
 v11    3157  1.07  1.15
 v12    2996  1.06  1.01
 v13    6084  2.65  2.63
 v14    3115  2.42  5.70
 v15    2969  2.92  7.53
 
 * If I want column A1 I do this: table$A1
 * And if I want row v4, how can I do it? (Probably the problem happens
 because the column var is not treated as row names, although that is
 the purpose I created it for.)
 
  
 
 Thanks in advance
 
  
 
 Ana Quiterio
 

[R] visualise classification by factors (was Re: R-help Digest, Vol 36, Issue 21)

2006-02-21 Thread Adaikalavan Ramasamy

1) Please use a meaningful subject line. Start a new thread instead of
replying to another thread. 

2) Please give a simple example (if possible reproducible) to help
explain the problem.

3) Please read the posting guide.



On Tue, 2006-02-21 at 15:12 +0300, Evgeniy Kachalin wrote:
 Hello, dear R users.
 
 I've already sent a question here, but I'm not sure that it has been read.
 
 I need to visualize a classification of my numerical data based on 2-3 
 factors. I suppose the best way is a tree,
 with an arbitrary function at the ends (leaves), or at least with the means 
 of my data at the ends.
 
 What is the way to do it? As far as I found, ctree offers binary 
 classification, but is that the only way? Of course, a tree is not the only 
 way; maybe you could suggest other ways.
 
 Thank you.
 

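In the spirit of the "simple example" request above, here is one base-R way to summarise numeric data by two factors before reaching for a tree. A table of cell means corresponds to the leaves of a tree that splits on every factor. The data frame d and its columns are hypothetical. For an actual regression tree with means at the leaves, the rpart package (distributed with R) is one option.

```r
# Hypothetical data: numeric response y and two factors f1, f2
d <- data.frame(
  y  = c(1, 2, 3, 4, 5, 6, 7, 8),
  f1 = factor(c("a", "a", "a", "a", "b", "b", "b", "b")),
  f2 = factor(c("x", "x", "y", "y", "x", "x", "y", "y"))
)

# Mean of y in every f1 x f2 cell (rows = levels of f1, columns = levels of f2)
cell.means <- tapply(d$y, list(d$f1, d$f2), mean)
cell.means
#     x   y
# a 1.5 3.5
# b 5.5 7.5
```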
Re: [R] How to Import Data

2006-02-21 Thread Adaikalavan Ramasamy

1) You need to use sep=",", which is appropriate for a CSV file.

2) You need to specify the FULL path to the file. See 
http://cran.r-project.org/bin/windows/base/rw-FAQ.html#R-can_0027t-find-my-file

3) You can use read.csv, which is the read.table variant for CSV files.


For example

  a <- read.csv( file="c:/Progra~1/Docume~1/ramasamy/x111.csv" )

might work if you replace the path with your own full path. If you have
_unique_ row names in the first column, you can add the argument
row.names=1 to the call.

Regards, Adai



On Tue, 2006-02-21 at 08:52 -0500, Carl Klarner wrote:
 Hello,
 I am a very new user of R.  I've spent several hours trying to import
 data, so I feel okay asking the list for help.  I had an Excel file,
 then I turned it into a csv file, as instructed by directions.  My
 filename is x111.csv.  I then used the following commands to read this
 (fairly small) dataset in.  
 
 x111 -read.table(file='x111.csv',
 sep=,header=T,
 quote=,comment.char=,as.is=T)
 
 I then get the following error message.
 
 Error in file(file, r) : unable to open connection
 In addition: Warning message:
 cannot open file 'x111.csv', reason 'No such file or directory'
 
 I would imagine I'm not putting my csv file in the right location for R
 to be able to read it.  If that's the case, where should I put it?  Or
 is there something else I need to do to it first?
 Thanks for your help,
 Carl
 

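A self-contained sketch of the steps above. The CSV file is first written to R's temporary directory so that the example runs anywhere. With real data, csv.path would be the full path to x111.csv; the path and columns here are hypothetical. getwd() shows the directory R uses for relative names such as 'x111.csv', and file.exists() tells you whether R can see the file at all.

```r
# Stand-in CSV file, written only so the example is self-contained
csv.path <- file.path(tempdir(), "x111.csv")
write.csv(data.frame(id = c("s1", "s2"), a = 1:2, b = c(0.5, 1.5)),
          csv.path, row.names = FALSE)

getwd()                  # where relative file names are looked up
file.exists(csv.path)    # TRUE if R can see the file

# read.csv() is read.table() with sep = "," and header = TRUE preset;
# row.names = 1 uses the (unique) first column as row names
x111 <- read.csv(csv.path, row.names = 1)
x111["s2", "b"]          # 1.5
```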
Re: [R] Ranking within factor subgroups

2006-02-21 Thread Adaikalavan Ramasamy

It might help to give a simple reproducible example in the future. For
example

 df <- cbind.data.frame( date=rep( 1:5, each=100 ), A=rpois(500, 100),
                         B=rpois(500, 50), C=rpois(500, 30) )

might generate something like

    date   A  B  C
1      1  93 51 32
2      1  95 51 30
3      1 102 59 28
4      1 105 52 32
5      1 105 53 26
6      1  99 59 37
...  ... ... .. ..
495    5 100 57 19
496    5  96 47 44
497    5 111 56 35
498    5 105 49 23
499    5 105 61 30
500    5  92 53 32

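Here is my proposed solution. Can you double-check it against your
existing function to see whether the results agree?

   decile.fn <- function(x, nbreaks=10){
     br <- quantile( x, seq(0, 1, len=nbreaks+1), na.rm=TRUE )
     br[1] <- -Inf    ## so that the minimum falls into the first bucket
     return( cut(x, br, labels=FALSE) )
   }

   ## unlist(tapply(...)) returns the results grouped by date, so this
   ## assumes that df is already sorted by date (as in the example above)
   out <- apply( df[ ,c("A", "B", "C")], 2,
                 function(v) unlist( tapply( v, df$date, decile.fn ) ) )

   rownames(out) <- rownames(df)
   out <- cbind(date=df$date, out)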

Regards, Adai



On Tue, 2006-02-21 at 21:44 -0500, maneesh deshpande wrote:
 Hi,
 
 I have a dataframe, x of the following form:
 
 DateSymbol   AB  C
 20041201 ABC  10  12 15
 20041201 DEF   95   4
 ...
 20050101 ABC 5  3   1
 20050101 GHM   12 42
 
 
 Here A, B, C are properties of a set of symbols recorded for a given date.
 I want to decile the symbols for each date and property and
 create another set of columns bucketA, bucketB, bucketC containing the 
 decile rank
 for each symbol. The following non-vectorized code does what I want:
 
 bucket <- function(data,nBuckets) {
  q <- quantile(data,seq(0,1,len=nBuckets+1),na.rm=T)
  q[1] <- q[1] - 0.1 # need to do this to ensure there are no extra NAs
  cut(data,q,include.lowest=T,labels=F)
 }
 
 calcDeciles <- function(x,colNames) {
 nBuckets <- 10
 dates <- unique(x$Date)
 for ( date in dates) {
   iVec <- x$Date == date
   xx <- x[iVec,]
   for (colName in colNames) {
  data <- xx[,colName]
  bColName <- paste("bucket",colName,sep="")
  x[iVec,bColName] <- bucket(data,nBuckets)
   }
 }
 x
 }
 
 x <- calcDeciles(x,c("A","B","C"))
 
 
 I was wondering if it is possible to vectorize the above function to make it 
 more efficient.
 I tried
 rlist <- tapply(x$A,x$Date,bucket)
 but I am not sure how to assign the contents of rlist to their appropriate 
 slots in the original
 dataframe.
 
 Thanks,
 
 Maneesh
 

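A vectorised sketch for the per-date deciles. It assumes a data frame x with columns Date, A and B as in the question; the values below are made up. ave() applies a function within each group and returns the results aligned with the original rows, so, unlike unlist(tapply(...)), it does not need the data to be sorted by date. The helper decile() is hypothetical. Like Maneesh's bucket(), it relies on unique quantile breaks, so heavily tied data would still make cut() fail.

```r
# Made-up data: two dates interleaved, so the rows are NOT sorted by date
x <- data.frame(Date = rep(c(20041201, 20050101), times = 20),
                A    = sin(1:40),
                B    = cos(1:40))

# Hypothetical helper: decile rank (1-10) of each value within v
decile <- function(v, nbreaks = 10) {
  br <- quantile(v, seq(0, 1, length.out = nbreaks + 1), na.rm = TRUE)
  cut(v, br, include.lowest = TRUE, labels = FALSE)
}

# ave() runs decile() separately for each Date and puts every result
# back into the row it came from
for (col in c("A", "B")) {
  x[[paste0("bucket", col)]] <- ave(x[[col]], x$Date, FUN = decile)
}

head(x)
```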
Re: [R] writing a file using both cat() and paste()

2006-02-09 Thread Adaikalavan Ramasamy

With regard to the saving part, you might want to try dput() or save()
as well.

On Thu, 2006-02-09 at 19:29 -0500, Jim Lemon wrote:
 Taka Matzmoto wrote:
  Hi R users
  
  I would like to create an ASCII file using cat() and paste()
  
  x <- round(runif(30),3)
  cat("vector =( ", paste(x,sep=""), " )\n", file = "vector.dat",sep=",")
  
  When I opened vector.dat, it was a long, ugly file
  
  vector =( 
  ,0.463,0.515,0.202,0.232,0.852,0.367,0.432,0.74,0.413,0.022,0.302,0.114,0.583,0.002,0.919,0.066,0.829,0.405,0.363,0.665,0.109,0.38,0.187,0.322,0.582,0.011,0.586,0.112,0.873,0.671,
   
  )
  
  There were also problems right after the opening parenthesis and before the 
  closing parenthesis: two commas were there.
  
  I would like to have a nicely formatted one like below, that is, 5 random 
  values per line
  
  vector =( 0.463,0.515,0.202,0.232,0.852,
  0.367,0.432,0.74,0.413,0.022,
  0.302,0.114,0.583,0.002,0.919,
  0.066,0.829,0.405,0.363,0.665,
  0.109,0.38,0.187,0.322,0.582,
  0.011,0.586,0.112,0.873,0.671)
  
 First, you might want to avoid using vector, as that is the name of an 
 R function. Say you have a 30 element data vector as above. If you 
 wanted to write a fairly general function to do this, here is a start:
 
 vector2file<-function(x,file="",values.per.line=5) {
   if(nchar(file)) sink(file)
   cat(deparse(substitute(x)),"<-c(\n")
   xlen<-length(x)
   for(i in 1:xlen) {
     cat(x[i])
     if(i<xlen) cat(",")
     if(i%%values.per.line == 0) cat("\n")
   }
   cat(")")
   if(i%%values.per.line) cat("\n")
   if(nchar(file))sink()
 }
 
 Jim
 

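As an alternative to writing element by element, here is a minimal sketch that builds the requested layout in one go: five values per line, commas between values, and no stray commas next to the parentheses. It uses split() and paste(..., collapse=) and writes the result with a single cat() call. The output goes to a temporary file here; replace out.file with "vector.dat" or similar.

```r
x <- round(runif(30), 3)

# Group the values five at a time: elements 1-5 -> group 1, 6-10 -> group 2, ...
groups <- split(x, ceiling(seq_along(x) / 5))

# One comma-separated line per group
lines <- sapply(groups, paste, collapse = ",")

# Join the lines with ",\n" so every line but the last ends in a comma;
# the label and parentheses are added once, outside the values
txt <- paste0("vector =( ", paste(lines, collapse = ",\n"), ")\n")

out.file <- file.path(tempdir(), "vector.dat")
cat(txt, file = out.file)
cat(txt)   # show the result on screen as well
```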
Re: [R] Tranferring R results to word prosessors

2006-02-09 Thread Adaikalavan Ramasamy

I agree that this is the best way. 

I often use the Courier font at size 10, which gives very good results.


On Thu, 2006-02-09 at 09:47 -0500, Gabor Grothendieck wrote:
 In Word use a fixed font such as Courier rather than a proportional
 font and it will look ok.
 
 On 2/9/06, Tom Backer Johnsen [EMAIL PROTECTED] wrote:
  I have just started looking at R, and am getting more and more irritated
  at myself for not having done that before.
 
  However, one of the things I have not found in the documentation is some
  way of preparing output from R for convenient formatting into something
  like MS Word.  An example:  If you use summary(lm()) you get nice
  output.  However, if you try to paste that output into the word processor,
  all the text elements are separated by blanks, and that is not optimal for
  the creation of a table (in the word processing sense).
 
  Is there an option to generate tab-separated output in R ? That would solve
  the problem.
 
  Tom
 
  ++
  | Tom Backer Johnsen, Psychometrics Unit,  Faculty of Psychology |
  | University of Bergen, Christies gt. 12, N-5015 Bergen,  NORWAY |
  | Tel : +47-5558-9185Fax : +47-5558-9879 |
  | Email : [EMAIL PROTECTED]URL : http://www.galton.uib.no/ |
  ++
 

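To answer the tab-separated question directly: write.table() accepts sep="\t". Here is a minimal sketch that uses the built-in mtcars data as a stand-in model. The coefficient table from summary(lm()) is written as tab-separated text, which a word processor can then turn into a table (in Word, via its convert-text-to-table command). The file name is a placeholder.

```r
fit   <- lm(mpg ~ wt + hp, data = mtcars)   # example model on built-in data
coefs <- round(coef(summary(fit)), 4)       # Estimate, Std. Error, t value, Pr(>|t|)

# Tab-separated, unquoted; col.names = NA adds an empty header cell above
# the row names so that the columns line up
out.file <- file.path(tempdir(), "coefs.txt")
write.table(coefs, file = out.file, sep = "\t", quote = FALSE, col.names = NA)

cat(readLines(out.file), sep = "\n")
```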
Re: [R] Tranferring R results to word prosessors

2006-02-09 Thread Adaikalavan Ramasamy

As much as I love LaTeX, I would be cautious about recommending it to
someone with a short-term objective or who does not really need to write
equations etc. 

Part of the reason is that the initial step of getting the different
pieces of software required to make LaTeX work properly can be difficult.
However, I think this webpage does a good job of explaining it:
http://www.math.aau.dk/~dethlef/Tips/introduction.html

WinEdt (http://www.winedt.com/) might also be worth checking out.

Regards, Adai


On Thu, 2006-02-09 at 14:11 -0500, Peter Flom wrote:
  roger bos [EMAIL PROTECTED] 2/9/2006 12:33 pm  wrote
 
 Yeah, but I don't understand LaTeX at all.  Can you point me to a good
 beginners guide?
 
 
 I like Math into LaTeX, by Gratzer.  
 For a real beginner's guide, there's one called "First Steps in LaTeX".
 You might also want to look at issues of the PracTEX journal, many of which 
 are for beginners (It's an online journal)
 
 Peter
 
 Peter L. Flom, PhD
 Assistant Director, Statistics and Data Analysis Core
 Center for Drug Use and HIV Research
 National Development and Research Institutes
 71 W. 23rd St
 http://cduhr.ndri.org
 www.peterflom.com
 New York, NY 10010
 (212) 845-4485 (voice)
 (917) 438-0894 (fax)
 

Re: [R] Plotting 27 line plots in one page

2006-02-09 Thread Adaikalavan Ramasamy

Try 

 par( mfrow=c(9,3) )
 for(i in 1:27) plot( lls[[i]] )

but I think it might be a little crowded to put 9 rows on a page. 

Also check out the lattice package, which is a bit more complicated to
learn but gives prettier output.

Regards, Adai


On Thu, 2006-02-09 at 11:52 -0800, Srinivas Iyyer wrote:
 Dear group, 
  I am a novice programmer in R.  I have a list that
 has a length of 27 elements. Each element is derived
 from the table function. 
 
 lls <- table(drres)
 
 length(lls)
 27
 
 I want to plot all these elements in a 9x3 plot (9 rows
 and 3 columns)
 par(9,3)
  mypltfunc <- function(mydata){
 + for (i in 1:27){
 + plot(unlist(mydata[i]))
 + }
 + }
 
  mypltfunc(lls)
  
 
 In the graphics window, all 27 figures are drawn in a
 fraction of a second, one by one, and I only get to see the last
 graph.  It is not drawing into this 9X3 grid. 
 
 Could anyone help me, please? 
 
 Thanks
 sri
 

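A sketch of the lattice route mentioned above. It assumes the 27 series can be stacked into one long data frame with a grouping factor; the data are simulated and the column names panel, x and y are hypothetical. layout=c(3, 9) asks for 3 columns by 9 rows, and lattice arranges the panels itself rather than through par(mfrow) and par(mar).

```r
library(lattice)   # recommended package, distributed with R

# Simulated stand-in: 27 series of 20 points each, in long format
d <- data.frame(
  panel = factor(rep(sprintf("series %02d", 1:27), each = 20)),
  x     = rep(1:20, times = 27),
  y     = rnorm(27 * 20)
)

# One line plot per series, 3 columns x 9 rows on a single page
p <- xyplot(y ~ x | panel, data = d, type = "l", layout = c(3, 9))
print(p)   # lattice objects must be printed explicitly inside functions/scripts
```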
Re: [R] Plotting 27 line plots in one page

2006-02-09 Thread Adaikalavan Ramasamy

This works:

 # simulate some data
 mylist <- list(NULL)
 for(i in 1:27) mylist[[i]] <- rnorm( rpois( 1, lambda=20 ) )

 # execute
 par( mfrow=c(9,3) )
 par(mar = c(1,1,1,1), oma = c(1,1,1,1))
 for(i in 1:27) plot( mylist[[i]] )

Also, if you just want to compare the distributions of the values, you
can try other possibilities such as
 
 boxplot( mylist )

Regards, Adai




On Thu, 2006-02-09 at 14:05 -0800, Srinivas Iyyer wrote:
 hi sarah, 
  thanks for your mail. 
 
 #
  par(mfrow=c(9,3))
  mypltfunc(lls)
 Error in plot.new() : figure margins too large
  par(mfcol=c(9, 3))
  mypltfunc(lls)
 Error in plot.new() : figure margins too large
 
 ##
 
 Unfortunately I had this problem before. That's the
 reason I went on using, more simply, par(9,3).
 
 I tried the following too, although truly I did not
 understand much after reading ?par:
 
  mar = c(1,1,1,1)
  oma = c(1,1,1,1)
  par(mar,oma)
 [[1]]
 NULL
 
 [[2]]
 NULL
 
  mypltfunc(lls)
  
 
 By doing this, it printed
 all 27 figures, one after the other, in a fraction of a second,
 and I see only the last figure.
 
 
 
 Given my background (molecular biology), it is sometimes
 very, very difficult to understand the documentation
 due to terminology problems.
 
 thanks
 sri
 
 
 
 --- Sarah Goslee [EMAIL PROTECTED] wrote:
 
  
   I want to plot all these elements in 9x3 plot (9
  rows
   and 3 columns)
   par(9,3)
  
  
  You need to specify what par you want - see ?par for
  details.
  In this case, either
  
  par(mfrow=c(9,3))
  or
  par(mfcol=c(9, 3))
  
  will do what you want.
  
  Sarah
  --
  Sarah Goslee
  http://www.stringpage.com
 
 

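The "figure margins too large" error means that the device is too small for 9 rows of panels with the default margins. Here is a sketch of one workaround, assuming a PDF file is acceptable: open a tall pdf() device, shrink the margins, draw, and close the device. The page size and margins are only a starting point, and mylist is simulated.

```r
# Simulated stand-in for the 27 elements of lls
mylist <- lapply(1:27, function(i) rnorm(20))

pdf.file <- file.path(tempdir(), "panels.pdf")
pdf(pdf.file, width = 8.5, height = 11)          # page size in inches
op <- par(mfrow = c(9, 3), mar = c(2, 2, 1, 1), oma = c(1, 1, 1, 1))
for (i in seq_along(mylist)) plot(mylist[[i]], type = "l")
par(op)                                          # restore the old settings
invisible(dev.off())                             # close the device, writing the file
```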
Re: [R] (second round) creating a certain type of matrix

2006-02-08 Thread Adaikalavan Ramasamy

I cleaned up your function a bit, but please double-check it:

 generate.matrix <- function(nr, runs=5){

   h   <- nr/2                 ## half of nr
   nc  <- nr/(2*runs) + 1

   mat <- matrix(0, nr, nc)    ## initialize
   
   mat[ ,1] <- c( rep(1, h), rnorm(h) )          ## 1st column
   mat[ (h+1):(h+runs), 2] <- rnorm(runs)        ## 2nd column
  
   if( nc >= 3 ){
     for (i in 3:nc){          ## column 3 to the end

       start <- h + runs*(i-2) + 1
       end   <- start + runs - 1

       mat[ start:end, i] <- rnorm( runs )
     }
   }
   return(mat)
 } 


However, you can simplify this greatly. If you ignore the first column
(which looks like an initialisation column in a simulation process),
then the bottom half is a matrix with nr/2 rows and nr/10 columns whose
diagonal blocks of 5 (runs) rows are filled with rnorm values. Here is
what I propose:


 gen.mat <- function(x, runs=5){

   ## %% binds more tightly than *, so the parentheses are needed
   if( (x %% (2*runs)) != 0 ) stop(x, " is not a multiple of ", 2*runs)

   nr  <- x/2   
   nc  <- x/(2*runs)

   mat <- matrix(0, nr, nc)  
   for (i in 1:nc) mat[ ((i-1)*runs + 1) : (i*runs), i ] <- rnorm(runs)
  
   down <- cbind( rnorm(nr), mat )
   top  <- cbind( 1, matrix( 0, nrow=nr, ncol=nc ) )
   out  <- rbind( top, down )
  
   return(out)
 }

# Examples 
 gen.mat(50)
 gen.mat(55) ## should generate an error
 gen.mat(24, runs=6)


Does this function do what you want?

Regards, Adai





On Tue, 2006-02-07 at 11:03 -0600, Taka Matzmoto wrote:
 Hi R users
 Here is what I got with help from Petr Pikal (thanks, Petr Pikal). I modified 
 Petr Pikal's code a little
 to meet my purpose.
 
 I created a function to generate a matrix
 
 generate.matrix<-function(n.variable)
 {
 mat<-matrix(0,n.variable,(n.variable/2)/5+1) #matrix of zeroes
 dd<-dim(mat) # actual dimensions
 mat[1:(dd[1]/2),1]<-1 #put 1 in first half of first column
 mat[((dd[1]/2)+1):dd[1],1]<-rnorm(dd[1]/2,0,1) #put random numbers in 
 #following part of the matrix column 1
 mat[((dd[1]/2)+1):((dd[1]/2)+5),2]<-rnorm(5,0,1) #put random numbers in 
 #column2
 for (i in 3:(dd[2]))
 {
 length.of.rand.numbers <- 5
 my.rand.num<- rnorm(length.of.rand.numbers, 0,1)
 start <- dd[1]/2+5*(i-2)+1
 end <- start + length.of.rand.numbers-1
 mat[((start):end), i]<- my.rand.num
 }
 mat
 }
 
 Do you (any R users) have any suggestions for this function to make it 
 work better or more efficiently?
 
 Taka
 It works but I
 
 From: Petr Pikal [EMAIL PROTECTED]
 To: Taka Matzmoto [EMAIL PROTECTED],r-help@stat.math.ethz.ch
 Subject: Re: [R] creating a certain type of matrix
 Date: Tue, 07 Feb 2006 08:58:59 +0100
 
 Hi
 
 as only you know exactly which halves and other portions of your
 matrices contain zeroes and which contain random numbers, you have to
 finalize the function yourself.
 Here are a few ideas.
 
 n<-20
 mat<-matrix(0,n,(n/2)/5+1) #matrix of zeroes
 dd<-dim(mat) # actual dimensions
 mat[1:(dd[1]/2),1]<-1 #put 1 in first half of first column
 mat[((dd[1]/2)+1):dd[1],1]<-rnorm(dd[1]/2,0,1) #put random numbers in
 #following part of the matrix column 1
 mat[((dd[1]/2)+1):((dd[1]/2)+dd[1]/4),2]<-rnorm(dd[1]/4,0,1) #put
 #random numbers in column2
 
 Then, according to the n and dd values, you can put any numbers anywhere in
 your matrix, e.g. in a for loop (not tested :-)
 
 for (i in 3:dd[2]) {
 
 arrange everything into following desired columns
 e.g.
 
 length.of.rand.numbers - (i-2)*5
 my.rand.num-

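The for loop in gen.mat() above can also be dropped. Here is a sketch under the same assumptions as gen.mat() (x a multiple of 2*runs); the name gen.mat2 is hypothetical. Row i of the lower block belongs to column ceiling(i/runs), so a two-column (row, column) index matrix fills all the diagonal blocks in one assignment.

```r
gen.mat2 <- function(x, runs = 5) {
  if (x %% (2 * runs) != 0) stop(x, " is not a multiple of ", 2 * runs)
  nr <- x / 2
  nc <- x / (2 * runs)

  # Row i of the lower block belongs to column ceiling(i / runs), so a
  # (row, column) index matrix addresses every diagonal block at once
  rows <- seq_len(nr)
  idx  <- cbind(rows, ceiling(rows / runs))
  mat  <- matrix(0, nrow = nr, ncol = nc)
  mat[idx] <- rnorm(nr)

  top  <- cbind(1, matrix(0, nrow = nr, ncol = nc))
  down <- cbind(rnorm(nr), mat)
  rbind(top, down)
}

gen.mat2(20)
```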
Re: [R] large lines of data

2006-02-08 Thread Adaikalavan Ramasamy

What does the data look like, and how are you storing it in R (e.g.
matrix, list)?

I think this is an issue related to Word, which may be using either
unequal spaces or different carriage returns. I would not recommend
storing data, especially numerical data in the form of a matrix, in Word
files. 

I would recommend that you try to copy-and-paste into Excel first and
clean it up there. Next, save the file as tab-delimited text and use
read.delim() in R. 

In my experience, Excel seems to understand the oddities of Word
better than R does.

Regards, Adai


On Wed, 2006-02-08 at 11:55 +, Sara Mouro wrote:
 Dear All,
 
 I have to enter many lines of data into the same object.
 
 I usually use copy-paste to transfer data from a Word file to R.
 
 But for long lines of data, R gets confused and gives an error message,
 i.e. it breaks a line somewhere, and the lines no longer make sense.
 
 Sometimes I solve that problem by adding line breaks and making each line
 shorter before I copy-paste. Sometimes I add spaces in the Word
 document until R breaks each line (automatically adds a +) at a point
 where it is still correct.
 
 But it is still too subjective for me!   :o\
 
 What is the best way to do that?
 
 Regards,
 
 Sara Mouro
 

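A minimal sketch of the read.delim() route suggested above. The file is written from R only so that the example is self-contained. In practice it would be the tab-delimited text file saved from Excel; the file and column names here are hypothetical.

```r
# Stand-in for the tab-delimited file saved from Excel
tsv.path <- file.path(tempdir(), "mydata.txt")
writeLines(c("site\tx1\tx2",
             "a\t1.5\t2",
             "b\t3.25\t4"), tsv.path)

# read.delim() is read.table() with header = TRUE and sep = "\t" preset,
# so each line of the file becomes one row, however long it is
dat <- read.delim(tsv.path)
str(dat)
```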
Re: [R] lme help

2006-02-08 Thread Adaikalavan Ramasamy

Please read the posting guide.
1) I think the BioConductor mailing list might be better, as some of
this could be implemented via LIMMA (I believe).
2) Provide sufficient information and perhaps a simple example.

Regards, Adai



On Wed, 2006-02-08 at 10:42 +0100, Mahdi Osman wrote:
 Hi list,
 
 
 I am fitting a model to microarray data (intensity) using lme in the R
 environment. I have 5 fixed variables in the model. One of the fixed
 variables is genes. I am trying to get p-values for the different genes, but I
 am getting only one p-value for all genes together. I can get a list of
 p-values when I run lm. Why doesn't this work in lme?
 
 My aim is to do multiple comparisons of all the genes that I have, and I can
 only do this if I have a list of their p-values.
 
 
 I was wondering if you can help me solve this problem, that is, getting
 a list of p-values for each gene in the model using lme.
 
 
 Thanks in advance for your help
 
 
 Regards
 
 
 
 Mahdi 



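On the lme question, assuming the model is fitted with lme() from the nlme package: anova() on the fit gives one F-test per model term, so a factor such as gene gets a single p-value. summary(fit)$tTable, by contrast, lists one t-test per coefficient, i.e. per factor level compared with the reference level. The model below uses the Orthodont data shipped with nlme purely for illustration; it is not Mahdi's model.

```r
library(nlme)   # lme() is in the nlme package

# Illustrative model only (Orthodont data from nlme)
fit <- lme(distance ~ age + Sex, random = ~ 1 | Subject, data = Orthodont)

anova(fit)                   # one F-test (one p-value) per model term
tt <- summary(fit)$tTable    # one row, with a p-value, per coefficient
tt[, "p-value"]

# Adjust the per-coefficient p-values for multiple comparisons
p.adjust(tt[, "p-value"], method = "BH")
```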
Re: [R] Application of R

2006-02-08 Thread Adaikalavan Ramasamy

No Excel attachment came through. 

Just taking a guess here, but there seems to be very little variation in
columns V10 to V23.

BTW, can you not issue the following call:

 mydata[ , 1:7] ~ mydata[ , 8] + mydata[ ,9]

instead of creating y1, y2, ... separately and then cbind-ing them?

Regards, Adai




On Tue, 2006-02-07 at 21:52 +0800, Andy Wong wrote:
 I have applied R and MNP to carry out the data analysis.  However, there
 is an error: SWP : singular matrix.  Can someone tell me what the
 problem is with my formula or with the file mydata?
 
 I have attached the data file mydata in Excel format and the result
 printed in pdf format for your information.
 
 Thanks for your advice.

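"Singular matrix" errors often mean that some covariates are constant or linearly dependent, though whether that is the cause of this MNP error is only a guess. Since the data did not come through, here is a sketch of two quick checks on a hypothetical covariate matrix X. Column variances spot (nearly) constant columns, and the rank from qr() spots exact linear dependence.

```r
# Hypothetical covariates: c is an exact copy of a, d is almost constant
X <- cbind(a = c(1, 2, 3, 4, 5),
           b = c(2, 1, 4, 3, 5),
           c = c(1, 2, 3, 4, 5),
           d = c(1, 1, 1, 1, 1.001))

# Variance of each column; values at or near zero flag (nearly) constant
# columns, which are hard to separate from an intercept
apply(X, 2, var)

# Numerical rank of X; a rank below ncol(X) means some columns are
# linear combinations of others
qr(X)$rank   # 3, because c duplicates a
ncol(X)      # 4
```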
Re: [R] dataframe subset

2006-02-08 Thread Adaikalavan Ramasamy

Sounds like you may need to use match() (or %in%).

On Wed, 2006-02-08 at 15:21 +0100, Bernhard Baumgartner wrote:
 I have a dataframe with a column, say x, consisting of values, each 
 value appearing a different number of times, e.g.
 x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
 and a vector, including e.g.:
 y: 2,9,10,...
 I need a subset of the dataframe: all rows where x is equal to one of 
 the values in y. Currently I use a loop for this, but because x and y 
 are large this is very slow. 
 Is there any idea how to solve this problem faster?
 Thank you,
 Bernhard
 

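A sketch of the match()-based subset using %in%, which R defines as match(x, table, nomatch = 0) > 0. The data frame below is a made-up example in the shape Bernhard describes. The whole selection is one vectorised call, so no loop is needed.

```r
# Made-up data: x has repeated values, id just numbers the rows
df <- data.frame(x  = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10),
                 id = 1:15)
y  <- c(2, 9, 10)

# TRUE for every row whose x occurs anywhere in y
sub <- df[df$x %in% y, ]
sub$id   # 5 6 10 11 12 13 14 15
```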
Re: [R] for-loop with multiple variables changing

2006-02-06 Thread Adaikalavan Ramasamy

If you want a one-to-one action between corresponding pairs of a and
b, then how about simply:

 for( i in seq_along(a) ){
  print( a[i] )
  print( b[i] )
 }

If you want the first element of a to work with all elements of b,
the second element of a to work with all elements of b, ... then you
may find functions such as outer, sapply and mapply helpful.

Regards, Adai



On Mon, 2006-02-06 at 11:53 +0100, Piet van Remortel wrote:
 Hi all,
 
 Never really managed to build a for-loop with multiple running  
 variables in an elegant way.
 
 Can anybody hint ?
 
 See below for an example of what I would like.
 
 EXAMPLE
 a<-c(1,2,3)
 b<-c("name1","name2","name3")
 
 for( number in a, name in b ) {
   print( number ) ##take a value
   print( name ) ##and have its name available from a second list
 }
 
 Does R support this natively ?
 
 thanks !
 
 Piet
 (Univ. of Antwerp - Belgium)
 

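A sketch of the vectorised alternatives mentioned above, using Piet's a and b. mapply() pairs the i-th elements of both vectors, while outer() applies a vectorised function such as paste() to every combination.

```r
a <- c(1, 2, 3)
b <- c("name1", "name2", "name3")

# Parallel pairs: (1, "name1"), (2, "name2"), (3, "name3")
pairs <- mapply(function(number, name) paste(name, number, sep = " = "), a, b)
print(pairs)

# All combinations: row i uses a[i], column j uses b[j]
grid <- outer(a, b, paste)
print(grid)
```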
Re: [R] R is GNU S, not C.... [was how to get or store .....]

2005-12-06 Thread Adaikalavan Ramasamy



On Tue, 2005-12-06 at 13:43 +0100, Martin Maechler wrote:
  vincent == vincent  [EMAIL PROTECTED]
  on Tue, 06 Dec 2005 11:09:36 +0100 writes:
 
 vincent shanmuha boopathy a écrit :
  a<-function(a,b,c,d)
  {
  k=a+b
  l=c+d
  m=k+l
  }
  
  in this example the function will return only the value of m
  ...But I would like to extract the values of l and k too.
  Which command should I use to store or extract those intermediate 
 values...
 
 vincent may I suggest, inside your function
 
 vincent res = c(k, l, m);
 vincent return(res);
 
 please, please,  these trailing ;  are  *so* ugly.
 This is GNU S, not C (or matlab) !
 
 {and I have another chain of arguments why   <- is so much more
 expressive than =  but I'll be happy already if you could
 drop these ugly empty statements at the end of your lines...
 
 vincent # also ... read some intro docs !
 

Re: [R] R is GNU S, not C.... [was how to get or store .....]

2005-12-06 Thread Adaikalavan Ramasamy

Yes, it drives me mad too when people use = instead of <- for
assignment and suppress spaces in a naive attempt to save space. 

As an example, compare 

o=fn(x=1,y=10,z=1)

with

o <- fn( x=1, y=10, z=1 )

Regards, Adai




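For the original question in this thread (getting k and l out of the function as well as m), a common idiom is to return a named list. Here is a minimal sketch. The function is called f here (a hypothetical name) so that it does not share its name with its argument a, and it uses <- for assignment as recommended above.

```r
# Return all intermediate results as a named list instead of only m
f <- function(a, b, c, d) {
  k <- a + b
  l <- c + d
  m <- k + l
  list(k = k, l = l, m = m)   # the last expression is the return value
}

res <- f(1, 2, 3, 4)
res$k   # 3
res$l   # 7
res$m   # 10
```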
Re: [R] merging with aggregating

2005-12-06 Thread Adaikalavan Ramasamy


m1 <- cbind(  n=c(1,2,3,4,6,7,8,9,10,13), v1=c(12,10,3,8,7,12,1,18,1,2),
              v2=c(0,8,8,4,3,0,0,0,0,0) )

m2 <- cbind(  n=c(1,2,3,4,5,6,8,10,11,12), v1=c(0,0,1,12,2,2,2,4,7,0),
              v2=c(2,3,9,8,9,9,0,1,1,1) )

m.all <- merge(m1, m2, by="n", all=TRUE)

    n v1.x v2.x v1.y v2.y
1   1   12    0    0    2
2   2   10    8    0    3
3   3    3    8    1    9
4   4    8    4   12    8
5   5   NA   NA    2    9
6   6    7    3    2    9
7   7   12    0   NA   NA
8   8    1    0    2    0
9   9   18    0   NA   NA
10 10    1    0    4    1
11 11   NA   NA    7    1
12 12   NA   NA    0    1
13 13    2    0   NA   NA

Then depending on how many such columns there are, you have a number of
ways of aggregating this dataset. One such way is

cbind( n=m.all[ , "n"], 
       v1=rowSums( m.all[ , grep( "^v1", colnames(m.all) ) ], na.rm=TRUE ),
       v2=rowSums( m.all[ , grep( "^v2", colnames(m.all) ) ], na.rm=TRUE ) )

n v1 v2
1   1 12  2
2   2 10 11
3   3  4 17
4   4 20 12
5   5  2  9
6   6  9 12
7   7 12  0
8   8  3  0
9   9 18  0
10 10  5  1
11 11  7  1
12 12  0  1
13 13  2  0

Regards, Adai


On Tue, 2005-12-06 at 14:22 +0100, Dubravko Dolic wrote:
 Dear List,
 
 I have two data frames of the following form:
 
 A:
 
 n  V1 V2
 1  12  0 
 2  10  8
 3   3  8 
 4   8  4
 6   7  3  
 7  12  0 
 8   1  0 
 9  18  0 
 10  1  0
 13  2  0
 
 B:
 
 n  V1 V2
 1   0  2
 2   0  3
 3   1  9
 4  12  8 
 5   2  9
 6   2  9
 8   2  0
 10  4  1
 11  7  1
 12  0  1
 
 
 Now I want to merge those frames into one data.frame, summing up the
 columns V1 and V2 but not the column n. So the result in this example
 would be:
 
 AB:
 
 n  V1 V2
 1  12  2
 2  10 11 
 3   4 17
 4  20 12
 5   2  9
 6   9 12
 7  12  0
 8   3  0
 9  18  0
 10  5  1
 11  7  1
 12  0  1
 13  2  0 
 
 
 So columns V1 and V2 are the sums of A and B, while n keeps its old value.
 Notice that A and B have different values of n.
 
 I don't have a clue how to start here. Any hint is welcome.
 
 Thanks
 
 Dubravko Dolic
 Munich
 Germany
 

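Another route, sketched under the assumption that both inputs have the same column names (as m1 and m2 above do): stack the rows with rbind() and sum by n with aggregate(). Values of n that occur in only one input simply contribute one row to their group, so no NA handling is needed, and the result should match the table above.

```r
m1 <- cbind(n  = c(1, 2, 3, 4, 6, 7, 8, 9, 10, 13),
            v1 = c(12, 10, 3, 8, 7, 12, 1, 18, 1, 2),
            v2 = c(0, 8, 8, 4, 3, 0, 0, 0, 0, 0))
m2 <- cbind(n  = c(1, 2, 3, 4, 5, 6, 8, 10, 11, 12),
            v1 = c(0, 0, 1, 12, 2, 2, 2, 4, 7, 0),
            v2 = c(2, 3, 9, 8, 9, 9, 0, 1, 1, 1))

# Stack both inputs, then sum v1 and v2 within each value of n
both <- as.data.frame(rbind(m1, m2))
ab   <- aggregate(cbind(v1, v2) ~ n, data = both, FUN = sum)
ab
```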
Re: [R] urgent

2005-12-06 Thread Adaikalavan Ramasamy

1) The R-help mailing list is run entirely by volunteers, so requests
marked "urgent" may sound rude.

2) Use an informative subject line, please!

3) Please state which package multhist comes from.

4) Please show your call to multhist.

5) multhist does _histograms_ by aggregating points within certain
intervals. In your case, you simply want a plot of the counts of your
raw values. You can use barplot directly via


 multi.barplot <- function( mylist, ... ){ 
   u   <-  sort( unique( unlist( mylist ) ) )   ## every distinct value, in order
   tb  <-  t(sapply( mylist, function(v) table(factor(v, levels=u)) ) ) 
   barplot( tb, beside=TRUE, ... )
   return(tb)
 }
 

 x <- c(7, 7 , 8, 9, 15, 17, 18)
 y <- c(7, 8, 9, 15, 17, 19, 20, 20, 25, 23, 22)
 z <- c(8, 9, 9, 9, 31)
 multi.barplot( list(x, y, z), col=1:3 )
 legend( "topright", legend=c("one", "two", "three"), fill=1:3 )


Regards, Adai



On Tue, 2005-12-06 at 15:32 +0530, Subhabrata wrote:
 Hello R Users,
 
 I have two sets of values
 
 x - c(7, 7 , 8, 9, 15, 17, 18)
 
 y - c(7, 8, 9, 15, 17, 19, 20, 20, 25, 23, 22)
 
 I am able to create a multiple histogram using
 multhist(), but I am not able to control the 'xlim',
 i.e. the x-axis is showing 7.5, 13, 18, 23.
 
 1st, on what basis is it calculated?
 
 2nd, I want it to be like 7 8 9 15 17 and so on.
 
 
 Can anyone help me?
 
 
 With Regards
 Subhabrata Pal

Re: [R] saving AIC of intermediate models in step

2005-11-30 Thread Adaikalavan Ramasamy

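library(MASS)   # stepAIC() is in the MASS package

df   <- data.frame( matrix( rnorm(1000), nc=10 ) )  # random data: your numbers will differ
colnames(df) <- c("y", paste("x", 1:9, sep=""))
ifit <- glm( y ~ ., data=df ) # initial fit

a <- stepAIC( ifit, keep=extractAIC )
a$keep
        [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]
[1,]  10.000   9.0000   8.0000   7.0000   6.0000   5.0000   4.0000
[2,] 319.356 317.3819 315.4327 314.3526 313.2192 312.3311 311.1450
         [,8]     [,9]    [,10]
[1,]   3.0000   2.0000   1.0000
[2,] 310.2517 309.1266 308.1171

Row 1 of a$keep holds the equivalent degrees of freedom and row 2 the AIC
of each successive model on the stepwise path.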

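The same idea works with step() from the base stats package, which also accepts a keep function and calls it with each accepted model and that model's AIC. In this sketch the filter `keep.aic` and its column names are illustrative choices, and set.seed() is added only to make the run repeatable.

```r
set.seed(1)
df <- data.frame(matrix(rnorm(1000), ncol = 10))
colnames(df) <- c("y", paste("x", 1:9, sep = ""))
ifit <- glm(y ~ ., data = df)   # initial fit

# Hypothetical filter: keep the number of coefficients and the AIC of each model
keep.aic <- function(model, aic) {
  c(n.coef = length(coef(model)), AIC = aic)
}

a <- step(ifit, keep = keep.aic, trace = 0)

# a$keep has one column per model on the path; transpose to one row per model
aic.path <- as.data.frame(t(a$keep))
aic.path
```

The first row corresponds to the initial fit, and each later row to a model that step() accepted because it lowered the AIC.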

On Tue, 2005-11-29 at 19:01 +0100, [EMAIL PROTECTED] wrote:
 Hi all,
   I'm fitting GLMs using the step or stepAIC procedures and I would
 like to save the AIC of the intermediate models. I would very much
 appreciate information about how to do this.
   Best wishes
   Germán López
 

Re: [R] symmetric matrix

2005-11-29 Thread Adaikalavan Ramasamy

Use as.matrix():

 m <- round( as.dist( cor( matrix( rnorm(600), nc=6 ) ) ), 2 )
 m
      1     2     3     4     5
2 -0.05
3  0.01  0.03
4  0.00  0.05  0.00
5  0.20  0.07  0.09 -0.07
6  0.03  0.02  0.11 -0.15 -0.11

 as.matrix( m )
      1     2    3     4     5     6
1  0.00 -0.05 0.01  0.00  0.20  0.03
2 -0.05  0.00 0.03  0.05  0.07  0.02
3  0.01  0.03 0.00  0.00  0.09  0.11
4  0.00  0.05 0.00  0.00 -0.07 -0.15
5  0.20  0.07 0.09 -0.07  0.00 -0.11
6  0.03  0.02 0.11 -0.15 -0.11  0.00

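If the values are only available as plain numbers (for example, copied from the printed lower triangle in the question below) rather than as a "dist" object, the full matrix can be rebuilt by hand. A minimal sketch using the values from the question: filling column by column with lower.tri() matches the order in which a "dist" object stores its entries, so as.matrix() on such an object gives the same result.

```r
# Lower-triangle values from the question, listed column by column
lower <- c(0.7760856, 2.016, 0.6148687, 3.0227028, 3.2104434,  # column 1
           1.6907899, 0.2424415, 2.3636083, 2.5334957,         # column 2
           1.593916, 1.512634, 1.730422,                       # column 3
           2.426591, 2.608584,                                 # column 4
           0.2184739)                                          # column 5
n <- 6

full <- matrix(0, n, n)
full[lower.tri(full)] <- lower   # fill below the diagonal, column-wise
full <- full + t(full)           # mirror into the upper triangle; diagonal stays 0

# The same result via a "dist" object built from the same vector
full2 <- as.matrix(structure(lower, Size = n, class = "dist"))
full
```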



On Tue, 2005-11-29 at 03:04 -0800, Robert wrote:
 I have the following matrix:
   1         2         3        4        5
 2 0.7760856
 3 2.016     1.6907899
 4 0.6148687 0.2424415 1.593916
 5 3.0227028 2.3636083 1.512634 2.426591
 6 3.2104434 2.5334957 1.730422 2.608584 0.2184739
   The diagonal is 0 and it is a symmetric matrix.
   Is there a function to convert it back to the full matrix,
   that is, the 6 by 6 one?

 
   

Re: [R] Legend

2005-11-13 Thread Adaikalavan Ramasamy

If you want differently coloured lines but black text, try

 legend(x = 5, y = 0.2, legend = c("Data Set", "Fitted PDF"),
        col = c("black", "red"), lty = 1)

The advantage of this approach is that you can also distinguish the lines
by type, e.g. dotted (lty option), or by width (lwd option).

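The two suggestions can also be combined, giving coloured line samples (col, lty, lwd) and coloured labels (text.col) in the same legend. A sketch reusing Sundar's simulated data below; the dashed line, the line widths and the null graphics device (so it runs without a screen) are illustrative choices.

```r
pdf(NULL)   # draw to a null device; use a screen device interactively
set.seed(1)
z <- rexp(30, 0.415694806)
x <- seq(0, 30, 0.1)
plot(ecdf(z), do.points = FALSE, main = "Data and fitted exponential CDF")
lines(x, pexp(x, 0.415694806), col = "red", lty = 2, lwd = 2)

# col/lty/lwd style the line samples; text.col colours the labels
leg <- legend(x = 5, y = 0.2, legend = c("Data Set", "Fitted PDF"),
              col = c("black", "red"), lty = c(1, 2), lwd = c(1, 2),
              text.col = c("black", "red"))
invisible(dev.off())
```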
Regards, Adai



On Sun, 2005-11-13 at 06:46 -0600, Sundar Dorai-Raj wrote:
 
 Mark Miller wrote:
  I use the following to plot two graphs over each other and then insert a 
  legend, but the two items in the legend both come up the same colour
  
  x = seq(0,30,0.01)
  plot(ecdf(complete), do.point=FALSE, main = 'Cumulative Plot of Monday IATs for Data and\n Fitted PDF over Entire 15 Weeks')
  lines(x, pexp(x,0.415694806),col="red")
  legend(x=5,y=0.2 , legend=c("Data Set","Fitted PDF"),col=c("black","red"))
  
  Many thanks
  Mark Miller
  
 
 Hi, Mark,
 
 You want to use text.col in legend instead of col:
 
 set.seed(1)
 z <- rexp(30, 0.415694806)
 x <- seq(0, 30, 0.1)
 plot(ecdf(z), do.point = FALSE)
 lines(x, pexp(x, 0.415694806), col="red")
 legend(x = 5, y = 0.2, legend = c("Data Set", "Fitted PDF"),
 text.col = c("black", "red"))
 
 --sundar
 

Re: [R] selection of missing data

2005-11-13 Thread Adaikalavan Ramasamy

I do not quite follow your post but here are some suggestions. 


1) You can use the na.strings argument to simplify things:

   df <- read.delim(file="lala.txt", na.strings="-")


2) Count the number of metastases per row first, then keep the rows
with a zero count.

   met.cols      <- c(11:12, 14:21, 23, 24) # metastasis columns
   number.of.met <- rowSums( mela[ , met.cols ] == "-" )
   have.no.met   <- which( number.of.met == 0 )
   mela.no.met   <- mela[ have.no.met , ]

If you had coded "-" as NA when reading the file in, the second line
needs to be changed to

   number.of.met <- rowSums( is.na( mela[ , met.cols ] ) )

or simply use complete.cases():

   met.cols      <- c(11:12, 14:21, 23, 24) # metastasis columns
   mela.no.met   <- mela[ which( complete.cases(mela[ , met.cols]) ) , ]


3) If you name your columns in a systematic fashion, then you can easily
extract and specify those columns. For example, if your columns were
named

   cn <- c("age", "colon.met", "PSA.level", "prostate.met", "gender",
           "hospitalisation.days", "status", "liver.met", "ethnicity")

Then you can find the positions of the names ending in ".met" with

   met.cols <- grep( "\\.met$", cn )
   met.cols
   [1] 2 4 8

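Suggestions 1) to 3) can be tried together on a small example. The data below are hypothetical: "-" marks a metastasis at that site and a non-blank "no" means none, which keeps the example simple.

```r
# Hypothetical toy data: "-" marks a metastasis at that site, "no" means none
txt <- "id\tliver.met\tlung.met\n1\t-\tno\n2\tno\tno\n3\tno\t-\n4\t-\t-\n"

# 1) read "-" in as NA
mela <- read.delim(text = txt, na.strings = "-")

# 3) find the metastasis columns by their ".met" suffix
met.cols <- grep("\\.met$", names(mela))

# 2) number of metastases per row = number of NAs in those columns
number.of.met <- rowSums(is.na(mela[, met.cols]))

# rows without any metastasis, selected in two equivalent ways
mela.no.met  <- mela[number.of.met == 0, ]
mela.no.met2 <- mela[complete.cases(mela[, met.cols]), ]
mela.no.met
```

Only patient 2 has no metastasis in either column, so both selections return that single row.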

Regards, Adai



On Sun, 2005-11-13 at 18:40 +0100, [EMAIL PROTECTED] wrote:
 Hi, I'm a French medical student.
 I have some data that I imported from Excel. The columns of my data frame
 are the locations of metastases. If there is a metastasis, there is
 the symbol "_". I want to exclude the rows without metastasis, which
 represent the NA data.
 
 So I wrote this
 
 (mela is the data frame)
 
 mela1=ifelse(mela[,c(11:12,14:21,23,24)]=="_",1,0) # selection of the metastasis localisation columns
 
 mela4=subset(mela3,Skin ==0 & s.c == 0 & Mucosa ==0 & Soft.ti ==0 &
 Ln.peri==0 & Ln.med==0 & Ln.abdo==0 & Lung==0 & Liver==0 &
 Other.Visc==0 & Bone==0 & Marrow==0 & Brain==0 & Other==0) ## selection
 ## of the rows with no metastasis localisation
 nrow(mela4)
 
 but I don't know if it is possible to do the same thing as
 ifelse(mela3,Skin & s.c== 0, 0,NA) with more than one column, and then to
 exclude the NAs from my data with na.omit.
 
 The last question is: how can I omit only the rows which have NA values in
 the metastasis columns c(11:12,14:21,23,24)?
 
 Thank you for your help
 
 
 
 Bertrand billemont

Re: [R] sibling list element reference during list definition

2005-11-12 Thread Adaikalavan Ramasamy

It is more interesting to ask why this does not work:

   mylist <- list( value=5, plusplus = mylist$value + 1 )

I think this is because plusplus cannot be evaluated while mylist does
not exist yet, and mylist cannot be created until plusplus has been
evaluated: the arguments to list() are evaluated before the result is
assigned to mylist.

There are people on this list who can explain this in more technical
terms, but reading this page might help:
http://cran.r-project.org/doc/manuals/R-lang.html#index-evaluation_002c-symbol-166


Here is one option:

 mylist <- eval( expression( list( value=x, plusplus=x+1) ), list(x=5) )
 mylist
 $value
 [1] 5
 $plusplus
 [1] 6


Or, a bit easier to read:

 myfun  <- function(x) list( value=x, plusplus=x+1 )
 mylist <- myfun(5)

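Two further base-R options, sketched here as alternatives rather than anything from the original reply: local() keeps the helper variable out of the workspace, and within() can add an element computed from an existing one.

```r
# local(): 'value' exists only while the block is being evaluated
mylist <- local({
  value <- 5
  list(value = value, plusplus = value + 1)
})

# within(): start from a list and add plusplus computed from value
mylist2 <- within(list(value = 5), plusplus <- value + 1)

str(mylist)
str(mylist2)
```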

Regards, Adai


On Sat, 2005-11-12 at 01:03 -0600, Paul Roebuck wrote:
 Can the value of a list element be referenced from a
 sibling list element during list creation without the use
 of a temporary variable?
 
 The following doesn't work but it's the general idea.
 
  list(value = 2, plusplus = $value+1)
 
 such that the following would be the output from str()
 
 List of 2
  $ value   : num 2
  $ plusplus: num 3
 
 --
 SIGSIG -- signature too long (core dumped)
 

Re: [R] How to find statistics like that.

2005-11-10 Thread Adaikalavan Ramasamy

If my usage is wrong, please correct me. Thank you.

Here are my reasons:

1. A p-value is a (cumulative) probability and always lies between 0 and 1.
A test statistic, depending on its definition, can take a wider range of
values.

2. A test statistic is calculated from the data without the need to
assume a null distribution, whereas to calculate a p-value you need to
assume a null distribution or estimate it empirically using permutation
techniques.

3. The direction of a test statistic may be ignored. For example,
t-statistics of -5 and 5 are equally interesting in a two-sided test.
For a p-value, however, the smaller it is, the more evidence there is
against the null hypothesis.

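Points 1 and 3 can be checked numerically. A sketch, assuming a t-statistic with 10 degrees of freedom (an arbitrary choice): the two-sided p-value is the same for t = -5 and t = 5 and, unlike the statistic, stays between 0 and 1.

```r
df.t   <- 10                 # degrees of freedom (arbitrary, for illustration)
t.stat <- c(-5, 5, 0.5)

# two-sided p-value: probability of a statistic at least as extreme as |t|
p.two.sided <- 2 * pt(-abs(t.stat), df = df.t)
p.two.sided
```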
Regards, Adai



On Thu, 2005-11-10 at 06:05 -0500, Duncan Murdoch wrote:
 On 11/9/2005 10:01 PM, Adaikalavan Ramasamy wrote:
  I think an alternative is to use a p-value from the F distribution. Even
  though it is not a statistic, it is much easier to explain and more popular
  than 1/F. Better yet, report the confidence intervals.
 
 Just curious about your usage:  why do you say a p-value is not a statistic?
 
 Duncan Murdoch
 
  
  Regards, Adai
  
  
  
  On Wed, 2005-11-09 at 17:09 -0600, Mike Miller wrote:
  
 On Wed, 9 Nov 2005, Gao Fay wrote:
 
 
 Hi there,
 
 Suppose mu is constant, and the error is normally distributed with mean 0 and
 fixed variance s. For the model
 Y_i = mu + beta1*I1_i + beta2*I2_i + beta3*I1_i*I2_i + error_i, where I_i is
 1 if Y_i is from group A, and 0 if Y_i is from group B,
 I need to find a statistic that:
 
 It is large when  beta1=beta2=0
 It is small when beta1 and/or beta2 is not equal to 0
 
 How can I find it by R? Thank you very much for your time.
 
 
 That's a funny question.  Usually we want a statistic that is small when 
 beta1=beta2=0 and large otherwise.
 
 Why not compute the usual F statistic for the null beta1=beta2=0 and then 
 use 1/F as your statistic?
 
 Mike
 

Re: [R] paste argument of a function as a file name

2005-11-10 Thread Adaikalavan Ramasamy

my.write <- function( obj, name ){

  filename <- paste( name, ".txt", sep="" )
  write.table( obj, file=filename, sep="\t", quote=FALSE )

}

my.write( df, "output" )   # writes df to output.txt

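If, as in the question below, the file name should come from the name of the object passed to the function, deparse(substitute()) recovers that name as a string. A sketch: `my.write2` and its `dir` argument are hypothetical additions, and the example writes to tempdir() so it does not clutter the working directory.

```r
my.write2 <- function(obj, dir = ".") {
  # the expression passed as 'obj', turned into text (e.g. "scores")
  name <- deparse(substitute(obj))
  filename <- file.path(dir, paste(name, ".txt", sep = ""))
  write.table(obj, file = filename, sep = "\t", quote = FALSE)
  invisible(filename)
}

scores <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
f <- my.write2(scores, dir = tempdir())   # creates scores.txt in tempdir()
```

This only gives a sensible name when a plain variable is passed; for an expression such as my.write2(scores[1:2, ]) the deparsed text would be used as the file name.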
Regards, Adai


On Thu, 2005-11-10 at 13:28 +, Luis Ridao Cruz wrote:
 R-help,
 
 I have a function which is exporting the output to a file via
 write.table(df, file = "file name.xls")
 
 What I want is to paste the file name (above) by taking the argument to
 the function as a file name 
 
 something like this:
 
 MY.function- function(df)
 {
 ...
 ...
 write.table(df, "argument.xls")
 }
 MY.function(argument)
 
 
 Thank you
 

Re: [R] Help regarding mas5 normalization

2005-11-10 Thread Adaikalavan Ramasamy

Please do not post to both BioConductor and R.



On Thu, 2005-11-10 at 09:51 -0700, Nayeem Quayum wrote:
 Hello everybody,
 I am trying to normalize some array data using mas5 and
 mas5calls, but I received these warning messages. If anybody can explain the
 problem I would really appreciate it. Thanks in advance.
 background correction: mas
 PM/MM correction : mas
 expression values: mas
 background correcting...Warning message:
 'loadURL' is deprecated.
 Use 'load(url())' instead.
 See help(Deprecated)
 Warning message:
 'loadURL' is deprecated.
 Use 'load(url())' instead.
 See help(Deprecated)
 Warning message:
 'loadURL' is deprecated.
 Use 'load(url())' instead.
 See help(Deprecated)
 There were 14 warnings (use warnings() to see them)
 Note: http://www.bioconductor.org/repository/devel/package/Win32 does not
 seem to have a valid repository, skipping
 Note: You did not specify a download type. Using a default value of: Source
 This will be fine for almost all users
 
 Error in FUN(X[[1]], ...) : no slot of name "Uses" for this object of class
 "localPkg"
 

Re: [R] how to convert strings back to values?

2005-11-09 Thread Adaikalavan Ramasamy


Problems like these could be caused by improperly spaced columns. Try
table(tdf1). If you see only "0" and "1", then you should be fine.
However, I suspect that you might see things like " 0", "0", " 1", "1",
which means that there is an extra space between the delimiters.

Report back what you get and we can work around a solution if need be.

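Following that diagnosis, a character matrix with stray spaces can be turned back into numbers after the fact. A sketch with hypothetical values; trimws() needs R 3.2.0 or later, and at read time the strip.white = TRUE argument of read.csv()/read.table() removes such spaces instead.

```r
# Hypothetical transposed data that came back as character, with stray spaces
tdf1 <- matrix(c("0", " 1", "1", " 0", "0", "1"), nrow = 2)
table(tdf1)   # shows " 0" and " 1" as entries separate from "0" and "1"

# Remove the spaces, then convert back to a numeric matrix of the same shape
num <- matrix(as.numeric(trimws(tdf1)), nrow = nrow(tdf1),
              dimnames = dimnames(tdf1))
table(num)
```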
Regards, Adai



On Wed, 2005-11-09 at 21:55 +0100, Illyes Eszter wrote:
 Dear All, 
 
 It's Eszter from Hungary, a total beginner with R. My problem is the 
 following: 
 
 I have a dataset with binary values as a comma-separated text file. The
 samples are in the columns and the species are in the rows.
 
 I have to transpose it for the further PCoA analysis. There is no
 problem with reading the dataset.
 
 When I transpose the dataset, the original values become strings
 (instead of 0,1,0,0,1 I have "0","1","0","0","1"). The distance matrix
 cannot be computed from the transposed dataset; I get 2 error
 messages:
 
 Warning in vegdist(tdf1, method = "jaccard", binary = FALSE, diag =
 FALSE,  : results may be meaningless because input data have
 negative entries
 
 Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric
 
 I do not understand the first, since I have only 1 and 0 in the dataset. I 
 guess I have the second because of the strings instead of values in the 
 dataset. 
 
 Could you please help me solving these problems? I could not find 
 anything about these in the manuals. 
 
 Thank you, cheers:
 
 Eszter
 
 p.s. This is a new problem, last week I worked with a similar dataset 
 and I did not get any error message like these. 
  
 
 

Re: [R] How to find statistics like that.

2005-11-09 Thread Adaikalavan Ramasamy

I think an alternative is to use a p-value from the F distribution. Even
though it is not a statistic, it is much easier to explain and more popular
than 1/F. Better yet, report the confidence intervals.

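For the model in the question below, the F test of beta1 = beta2 = 0 compares the full model with one that keeps only the interaction term. A sketch with simulated data (the design, effect sizes and seed are arbitrary assumptions); it shows the F statistic, Mike's 1/F, and the p-value, which can also be computed directly with pf().

```r
set.seed(42)
I1 <- rep(c(0, 1), each = 20)     # indicator for the first grouping
I2 <- rep(c(0, 1), times = 20)    # indicator for the second grouping
y  <- 1 + 0.8 * I1 + rnorm(40)    # simulated: beta1 = 0.8, beta2 = beta3 = 0

full    <- lm(y ~ I1 + I2 + I1:I2)
reduced <- lm(y ~ I(I1 * I2))     # null model: beta1 = beta2 = 0

tab    <- anova(reduced, full)
F.stat <- tab$F[2]
p.val  <- tab[["Pr(>F)"]][2]
c(F = F.stat, inv.F = 1 / F.stat, p.value = p.val)
```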
Regards, Adai



On Wed, 2005-11-09 at 17:09 -0600, Mike Miller wrote:
 On Wed, 9 Nov 2005, Gao Fay wrote:
 
  Hi there,
 
  Suppose mu is constant, and the error is normally distributed with mean 0 and
  fixed variance s. For the model
  Y_i = mu + beta1*I1_i + beta2*I2_i + beta3*I1_i*I2_i + error_i, where I_i is
  1 if Y_i is from group A, and 0 if Y_i is from group B,
  I need to find a statistic that:
 
  It is large when  beta1=beta2=0
  It is small when beta1 and/or beta2 is not equal to 0
 
  How can I find it by R? Thank you very much for your time.
 
 
 That's a funny question.  Usually we want a statistic that is small when 
 beta1=beta2=0 and large otherwise.
 
 Why not compute the usual F statistic for the null beta1=beta2=0 and then 
 use 1/F as your statistic?
 
 Mike
 

Re: [R] accident modified dataset. How can I recovery it?!

2005-11-09 Thread Adaikalavan Ramasamy

Please do not post three times, especially within 23 minutes of the first post.

Your problem is that cuckoos is located in the DAAG package, not the lattice
package. I am guessing that at some point you loaded DAAG in the initial
session but did not realise this in subsequent sessions.

Next time, search http://finzi.psych.upenn.edu/nmz.html first.

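Neither data() nor strsplit() writes to a package's files, so the data set itself was not modified; with DAAG installed, data(cuckoos, package = "DAAG") loads it explicitly. The comparison the user set out to make is between a regular-expression split and a literal one. A sketch with illustrative level names: in strsplit() the pattern "." matches any character, whereas "\\." or fixed = TRUE matches only a literal dot.

```r
lev <- c("hedge.sparrow", "meadow.pipit", "robin")   # illustrative level names

strsplit(lev, "\\.")              # split at a literal dot (escaped regex)
strsplit(lev, ".", fixed = TRUE)  # same result, no regular expression involved
strsplit(lev, ".")                # "." matches every character: only empty strings remain
```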
Regards, Adai



On Wed, 2005-11-09 at 22:46 +0100, jia ding wrote:
 I tried to reinstall the package. but my R version is too old.
 
 [EMAIL PROTECTED]:~$ sudo R CMD INSTALL -l /usr/lib/R/library
 /home/dj/Desktop/lattice_0.12-11.tar.gz
 Password:
 ERROR: This R is version 2.1.1
 package 'lattice' needs R >= 2.2.0
 
 So, my question is: how do I upgrade from R 2.1.1 to R >= 2.2.0 and keep
 all of my libraries intact?
 
 
 On 11/9/05, jia ding [EMAIL PROTECTED] wrote:
 
  I first try these command, it works quite well.
  library(lattice)
  data(cuckoos)
  levnam <- strsplit(levels(cuckoos$species), "\\.")
 
  BUT, I want to try:
  levnam <- strsplit(levels(cuckoos$species), ".")
 
  to see the difference.
 
  Then maybe I modified the data file, because when I try again, it says:
   data(cuckoos)
  Warning message:
  data set 'cuckoos' not found in: data(cuckoos)
 
  would you please tell me how to deal with this problem?
 
  I have already tried update.packages()
  it doesn't help.
 
  Thanks.
  DJ
 
 
 
