Re: [R] Merging two files together in R
Not necessary to do this, as you can specify which column in each of the two datasets to use as the common key with the arguments by.x and by.y in merge().

Morassa Mohseni wrote:

Thanks! I'll give this a try. I forgot to mention that the SNP.ID is not named the same in both files, even though they contain the same information. I'll just go ahead and open one of the files in a text editor and rename the columns so they match.

-Morassa
PhD Student, Johns Hopkins Human Genetics

---
Try looking at ?merge

If your data is in two dataframes df1 and df2:

  merge(df1, df2)

(This will merge on SNPID because that column is common to both dataframes.)
---

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Morassa Mohseni
Sent: 24 August 2007 15:41
To: r-help@stat.math.ethz.ch
Subject: [R] Merging two files together in R

Hi,

Thanks in advance for reading this post. I received some Affymetrix genotyping data back recently (250K, Nsp array). However, in order for me to do any analysis on this data set, I need to append the annotation file to it. Basically I want to do something that looks like this:

Snpfile (tab delimited):

  SNPID  Genotype  X     Y
  123    AA        13.4  1.2
  456    AB        10.1  12.2
  789    BB        2.7   14.4

Annotation file (csv file):

  rs#, SNPID, Chromosome
  rs23525, 456, 12
  rs78423, 123, 4
  rs82342, 789, 9

What I am trying to get is an output file that looks like this:

  SNPID  rs#      Chromosome  Genotype  X     Y
  123    rs78423  4           AA        13.4  1.2
  456    rs23525  12          AB        10.1  12.2
  789    rs82342  9           BB        2.7   14.4

The SNPID is the same in both files, so I would like to use that to match up, but the rows are not in the same order in both files, so I want to make sure that I am appending and merging the 2 files correctly. So far all I've really been able to do is import the files into R. I've been looking through the posts, and was wondering if I could use cbind(...) to merge the files? Not sure though.

Thanks again!!

Morassa Mohseni
PhD Student
Johns Hopkins Dept. of Human Genetics
Baltimore, MD

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
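A minimal sketch of the by.x / by.y suggestion, using the three example rows from the question. The data frame names snp and annot, the column name rs (standing in for "rs#"), and the mismatched key name SNP.ID in the annotation table are assumptions made for illustration only.

```r
# Hypothetical stand-ins for the two imported files
snp <- data.frame(SNPID    = c(123, 456, 789),
                  Genotype = c("AA", "AB", "BB"),
                  X        = c(13.4, 10.1, 2.7),
                  Y        = c(1.2, 12.2, 14.4))
annot <- data.frame(rs         = c("rs23525", "rs78423", "rs82342"),
                    SNP.ID     = c(456, 123, 789),   # same information, different name
                    Chromosome = c(12, 4, 9))

# Match rows on the key even though it is named differently in each table.
# Row order in the inputs does not matter; merge() pairs rows by key value.
merged <- merge(snp, annot, by.x = "SNPID", by.y = "SNP.ID")
print(merged)
```

Unlike cbind(), which only glues columns together in their existing row order, merge() aligns the rows by the key, so the differing order of the two files is not a problem.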
Re: [R] Does anyone else think this might be worth a warning?!?
First, note that functions in R match named arguments first, followed by the position of the arguments in the call. Second, have a look at how mean and max are defined:

  mean <- function (x, trim = 0, na.rm = FALSE, ...){
  max <- function (..., na.rm = FALSE){

It's the difference in the position of the ... argument, or catch-all argument (sorry, I don't know its formal name), that determines the different behaviour. Everything matched by ... is collected together internally. So when you type in mean(1,1,2), it is treated as

  mean( x=1, trim=1, na.rm=2 )

and when you type in max(1,1,2), all three values are collected by ... and it is treated roughly as

  max( c(1,1,2), na.rm = FALSE )

However, you do raise a good point. Reading mean.default(), I do not see how and when the ... argument in mean() comes into play. Perhaps redefine mean to be

  mean <- function (..., trim = 0, na.rm = FALSE)

so that it is similar to max, sum, range etc. But there might be a philosophical counter-argument for this as well. Functions like mean() and sd() are supposed to summarise a single vector, whereas max, sum and range can work on several vectors by concatenating them into one. Consider max( c(1,2,3), c(2,3,4) ).

Regards, Adai

Matthew Walker wrote:

Hi,

I was *very* surprised by this little trick for new players: mean() only considers its first argument!

  > mean(1,1,2)
  [1] 1
  > mean(2,1,1)
  [1] 2

I found this very different behaviour to max():

  > max(1,1,2)
  [1] 2
  > max(2,1,1)
  [1] 2

Perhaps this is the wrong list to ask, but does anyone else think this a little on the interesting side? Is it not possible to detect a first argument of length one in the presence of other un-named arguments and at least produce a warning?

Cheers, Matthew
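The argument matching described above can be checked directly. This is only a small illustrative sketch; the variable names are made up here.

```r
# Positional matching: 1 -> x, 1 -> trim, 2 -> na.rm, so only x = 1 is averaged
m_positional <- mean(1, 1, 2)

# Wrapping the values in c() passes them all as the single vector x
m_vector <- mean(c(1, 1, 2))

# max() takes its data through ..., so every unnamed argument is used
mx <- max(1, 1, 2)

print(c(m_positional, m_vector, mx))
```

The habit that avoids the surprise is to always hand mean() one vector, built with c() if necessary.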
Re: [R] princomp error
You probably have some missing or undefined values. Either eyeball the data or use sum(is.na(x)), sum(is.nan(x)), sum(is.infinite(x)) to find out if you have such data. You may want to use which() to find out where they are.

Regards, Adai

Bricklemyer, Ross S wrote:

I am attempting to run principal components analysis on a dataset of spectral reflectance (6 decimal places). I imported the data using read.table and there are both column and row headers. When I run princomp I receive the following error:

  Error in cov.wt(z) : 'x' must contain finite values only

Where am I going wrong?

Ross

Ross Bricklemyer
Dept. of Crop and Soil Sciences
Washington State University
291D Johnson Hall
PO Box 646420
Pullman, WA 99164-6420
Work: 509.335.3661
Cell/Home: 406.570.8576
Fax: 509.335.8674
Email: [EMAIL PROTECTED]
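A short sketch of the checks suggested in the reply, run on a made-up 3 x 3 matrix x that contains one NA, one NaN and one Inf. The matrix and the final row-dropping step are illustrative assumptions; whether dropping rows is appropriate depends on the data.

```r
# Toy data with three problem cells (filled column by column)
x <- matrix(c(0.1, 0.2, NA,
              0.4, Inf, 0.6,
              0.7, 0.8, NaN), nrow = 3)

n_na  <- sum(is.na(x))        # counts NA and NaN
n_nan <- sum(is.nan(x))       # counts NaN only
n_inf <- sum(is.infinite(x))  # counts Inf and -Inf

# Row and column position of every non-finite cell
bad <- which(!is.finite(x), arr.ind = TRUE)
print(bad)

# One possible clean-up: keep only the rows that are entirely finite
clean <- x[apply(is.finite(x), 1, all), , drop = FALSE]
```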
Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
The names of the table give you the values. And if you have a matrix, you just need to convert it into a vector first.

  > m <- matrix( LETTERS[ c(1:3, 3:5, 2:4) ], nc=3 )
  > m
       [,1] [,2] [,3]
  [1,] "A"  "C"  "B"
  [2,] "B"  "D"  "C"
  [3,] "C"  "E"  "D"
  > tb <- table( as.vector(m) )
  > tb

  A B C D E
  1 2 3 2 1
  > paste( names(tb), ":", tb, sep="" )
  [1] "A:1" "B:2" "C:3" "D:2" "E:1"

If this is not what you want, then please give a simple example.

Regards, Adai

Allan Kamau wrote:

Hi all,

If the question below has been answered before, I apologize for the posting. I would like to get the frequencies of occurrence of all values in a given variable in a multivariate dataset. In short, for each variable (or field), a summary of the values it contains as value:frequency pairs; there can be many such pairs for a given variable. I would like to do the same for several such variables. I have used table() but am unable to extract the individual value and frequency values. Please advise.

Allan.
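To do the same for several variables at once, one option is to apply table() to each column of a data frame. The sketch below assumes a small made-up data frame df; the column names and values are hypothetical.

```r
df <- data.frame(colour = c("red", "blue", "red", "green", "red"),
                 size   = c("S", "M", "M", "S", "M"),
                 stringsAsFactors = FALSE)

# One frequency table per variable
freq <- lapply(df, table)

# value:frequency pairs per variable, built from names() and the counts
pairs_list <- lapply(freq, function(tb) paste(names(tb), tb, sep = ":"))
print(pairs_list)
```

names(tb) holds the distinct values and as.vector(tb) the matching counts, so either can also be pulled out on its own.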
Re: [R] Extracting elements from a list
Try

  sapply( Lst, function(m) m[1,1] )

Also note that to subset a list, you just need Lst[ 1:10 ] and not Lst[[ 1:10 ]] (note the double square brackets).

Regards, Adai

Forest Floor wrote:

Hi,

I would love an easy way to extract elements from a list. For example, if I want the first element from each of 10 arrays stored in a list, Lst[[1:10]][1,1] seems like a logical approach, but gives this error:

  Error: recursive indexing failed at level 3

The following workaround is functional but can get annoying/confusing.

  first.element=vector()
  for (i in 1:10){
    first.element=c(first.element, Lst[[i]][1,1])
  }

Is there a better way to do this? Thanks for any help!

Jeff
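A self-contained sketch of the sapply() approach, with a made-up list Lst of ten 2 x 2 matrices. The vapply() variant is an addition for illustration: it does the same job but checks that each result is a single number.

```r
# Ten small matrices; the [1,1] element of the i-th matrix is i
Lst <- lapply(1:10, function(i) matrix(i * c(1, 2, 3, 4), nrow = 2))

# First element of every matrix, simplified to a plain vector
first <- sapply(Lst, function(m) m[1, 1])

# Same result, with the expected type and length stated explicitly
first_v <- vapply(Lst, function(m) m[1, 1], numeric(1))

# Single brackets return a sub-list; double brackets return one element
sub <- Lst[1:3]
one <- Lst[[1]]
print(first)
```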
Re: [R] How to remove the quote in the data frame?
You can achieve this by cbind.data.frame()

Christophe Pallier wrote:

Beware: you are not working with data.frames but with a vector and a matrix (see ?cbind). Solution: convert 'res' to a data.frame.

Christophe

On 7/14/07, Zhang Jian [EMAIL PROTECTED] wrote:

If I do not add ress into the data frame res, there is no quote in the data frame. However, once I add ress, quotes appear in all columns. How to remove them? If you can delete the quotes in the file ress, that is better. Thanks.

  > ress[1:10]
   [1] "ABHO.ABNE" "ABHO.ACBA" "ABHO.ACGI" "ABHO.ACKO" "ABHO.ACMA" "ABHO.ACMO"
   [7] "ABHO.ACPS" "ABHO.ACSE" "ABHO.ACTE" "ABHO.ACTR"
  > res=cbind(obv.value,p.value,mean.sim)
  > res[1:10,]
        obv.value p.value mean.sim
   [1,]         2     1.0      6.0
   [2,]         0     1.0      0.0
   [3,]        66     0.5     49.6
   [4,]         3     1.0      3.0
   [5,]         0     1.0     64.7
   [6,]         0     1.0      0.0
   [7,]         0     1.0      0.0
   [8,]        51     0.5     39.8
   [9,]         0     1.0     47.4
  [10,]        59     0.7     72.0
  > ress=cbind(res,ress)
  > ress[1:10,]
        obv.value p.value mean.sim ress
   [1,] "2"       "1"     "6"      "ABHO.ABNE"
   [2,] "0"       "1"     "0"      "ABHO.ACBA"
   [3,] "66"      "0.5"   "49.6"   "ABHO.ACGI"
   [4,] "3"       "1"     "3"      "ABHO.ACKO"
   [5,] "0"       "1"     "64.7"   "ABHO.ACMA"
   [6,] "0"       "1"     "0"      "ABHO.ACMO"
   [7,] "0"       "1"     "0"      "ABHO.ACPS"
   [8,] "51"      "0.5"   "39.8"   "ABHO.ACSE"
   [9,] "0"       "1"     "47.4"   "ABHO.ACTE"
  [10,] "59"      "0.7"   "72"     "ABHO.ACTR"
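The quotes appear because cbind() on plain vectors builds a matrix, and a matrix can hold only one type, so adding a character column turns every number into a string. A small sketch with the first three rows of the example; pair.name is a hypothetical name for the label vector.

```r
obv.value <- c(2, 0, 66)
p.value   <- c(1, 1, 0.5)
pair.name <- c("ABHO.ABNE", "ABHO.ACBA", "ABHO.ACGI")

# Matrix route: everything is coerced to character, hence the quotes
as_matrix <- cbind(obv.value, p.value, pair.name)

# Data frame route: each column keeps its own type
as_df <- cbind.data.frame(obv.value, p.value, pair.name,
                          stringsAsFactors = FALSE)
str(as_df)
```

data.frame(obv.value, p.value, pair.name) gives the same result as cbind.data.frame() here.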
Re: [R] legend and x,y cordinate values
See help(legend) and help(identify).

Ajay Singh wrote:

Hi,

I have two problems in R.

1. I need 10 cdfs on a graph, and the graph needs to have a legend. Can you let me know how to get a legend on the graph?

2. In an ecdf plot, I need to know the x and y co-ordinates. I have to get the y coordinate value corresponding to a given x coordinate value, so that I can find the percentile for that x value. Can you let me know how to get the corresponding x and y values?

Thanking you. Looking forward to your kind response.

Sincerely, Ajay.
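A hedged sketch of both parts of the question, with three simulated samples instead of ten. The sample names, colours and the pdf(NULL) line (used only so the sketch also runs without a display) are assumptions for illustration. For the second part, ecdf() returns a function, so the y value for any x can be computed instead of read off the plot.

```r
pdf(NULL)  # null graphics device; drop this line when working at the console

set.seed(1)
samples <- list(a = rnorm(50), b = rnorm(50, mean = 1), c = rnorm(50, mean = 2))
cols <- c("black", "red", "blue")

# 1. Several empirical CDFs on one graph, with a legend
plot(ecdf(samples[[1]]), col = cols[1],
     xlim = range(unlist(samples)), main = "Empirical CDFs")
for (i in 2:length(samples)) plot(ecdf(samples[[i]]), col = cols[i], add = TRUE)
legend("bottomright", legend = names(samples), col = cols, lty = 1)

# 2. y for a given x, and x for a given y
Fn <- ecdf(samples$a)
p_at_0  <- Fn(0)                                # proportion of values <= 0
x_at_90 <- quantile(samples$a, 0.9, type = 1)   # smallest x with Fn(x) >= 0.9

dev.off()
```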
Re: [R] Algorythmic Question on Array Filtration
Sorry, this sounds like a fairly basic question that can be resolved by which() and possibly ifelse(). There are no details in your email. I am afraid you have to learn the basics of R or ask the question with more details (e.g. example data). Or ask someone locally.

Regards, Adai

Johannes Graumann wrote:

Dear All,

I have a data frame with the columns Mass and Intensity (this is mass spectrometry stuff). Each of the mass values gives rise to a mass window of 5 ppm around the individual mass (from mass - mass/1E6*5 to mass + mass/1E6*5). I need to filter the array such that in case these mass windows overlap I retain the mass/intensity pair with the highest intensity. I apologize for this question, but I have no formal IT education and would value any nudges toward favorable algorithmic solutions highly.

Thanks for any help, Joh
Re: [R] matrix of scatterplots
  m <- matrix( rnorm(300), nc=3 )
  pairs(m, pch=20)

or

  pairs(m, pch=".")

See help(par) for more details.

livia wrote:

Hi,

I would like to use the function pairs() to plot a matrix of scatterplots. For each scatterplot, the data are plotted as circles. Can I add some argument to change the circles into dots? Could anyone give me some advice? Many thanks
Re: [R] Please Help
This is the R-help mailing list. See help(BATCH). You will need to write the required R commands in a separate script, say script.R, and then execute it as

  R --no-save < script.R > logfile

You may need to augment the code above to include directory paths etc. There is other useful documentation at http://www.r-project.org/

Regards, Adai

Tanya Li wrote:

Hello,

I got this email address from http://tolstoy.newcastle.edu.au/R/e2/help/06/10/2516.html. I got started using R recently. Can I ask you a question? This is what I am using:

  platform        i686-pc-linux-gnu
  arch            i686
  os              linux-gnu
  system          i686, linux-gnu
  status
  major           2
  minor           4.0
  year            2006
  month           10
  day             03
  svn rev         39566
  language        R
  version.string  R version 2.4.0 (2006-10-03)

I want to call R in a shell (bash), write all the R commands in the shell script and make it a cron job to execute automatically. Do you know how to do this?

Looking forward to hearing from you, thanks a million.

Tanya Li
Re: [R] lead
How about

  revLag <- function(x, shift=1) rev( Lag(rev(x), shift) )
  x <- 1:5
  revLag(x, shift=2)

As a matter of fact, here is a generalized version of Lag to include negative shifts.

  myLag <- function (x, shift = 1){
    xLen <- length(x)
    ## vector of "missing" values with the same storage mode as x
    ret <- as.vector(character(xLen), mode = storage.mode(x))
    attrib <- attributes(x)
    if (!is.null(attrib$label))
      attrib$label <- paste(attrib$label, "lagged", shift, "observations")
    if (shift == 0) return(x)
    if( xLen <= abs(shift) ) return(ret)
    ## a negative shift is a positive shift on the reversed vector
    if (shift < 0) x <- rev(x)
    retrange = 1:abs(shift)
    ret[-retrange] <- x[1:(xLen - abs(shift))]
    if (shift < 0) ret <- rev(ret)
    attributes(ret) <- attrib
    return(ret)
  }

and some test examples:

  > myLag(1:5, shift=2)
  [1] NA NA  1  2  3
  > myLag(letters[1:4], shift=2)
  [1] ""  ""  "a" "b"
  > myLag(factor(letters[1:4]), shift=2)
  [1] <NA> <NA> a    b
  Levels: a b c d
  > myLag(1:5, shift=-2)
  [1]  3  4  5 NA NA
  > myLag(letters[1:4], shift=-2)
  [1] "c" "d" ""  ""
  > myLag(factor(letters[1:4]), shift=-2)
  [1] c    d    <NA> <NA>
  Levels: a b c d

Regards, Adai

Aydemir, Zava (FID) wrote:

Hi,

is there any function in R that shifts elements of a vector in the opposite direction to what Lag() of the Hmisc package does? (Something like Lag(x, shift = -1).)

Thanks, Zava
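For plain atomic vectors, a lead can also be written in a few lines of base R without Hmisc. This is a simplified sketch, not the Hmisc implementation: the function name lead is made up here, it pads with NA rather than "" for character vectors, and it does not handle factors or label attributes.

```r
# Shift the elements of an atomic vector k positions towards the front,
# padding the end with NA
lead <- function(x, k = 1) {
  stopifnot(k >= 0)
  n <- length(x)
  if (k == 0) return(x)
  if (k >= n) return(rep(x[NA_integer_], n))  # all NA, same type as x
  c(x[(k + 1):n], rep(NA, k))
}

print(lead(1:5, 2))
print(lead(c("a", "b", "c"), 1))
```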
Re: [R] speed and looping issues; calculations on big datasets
I don't fully understand what your objective is here, but I would try a combination of cut and grep in a shell to see if it works. For example, if your data was saved as a tab-delimited file and you have some predefined patterns you seek, then try the untested code below

  cut -f3-6 | gsub 's/ //g' > tmp
  grep ^00 tmp | wc > rightA
  grep ^001 tmp | wc > rightB
  grep ^010|^0011 tmp | wc > rightC

  cut -f1-3 | gsub 's/ //g'
  grep 00$ | wc > leftA
  grep 000$|001$ | wc > leftB

Then you have to write a loop and generalise the code. You can try this in bash or perl, or rewrite it in C. If you want more help, then provide more explanation of the types of pattern you are looking for. You might want to try checking the BioConductor packages as well.

Regards, Adai

martin sikora wrote:

dear r users,

i'm a little stuck with the following problem(s), hopefully somebody can offer some help:

i have data organized in a binary matrix, which can become quite big, like 60 rows x 10^5 columns (they represent SNP genotypes, for some background info). what i need to do is the following:

let's suppose i have a matrix of size n x m. for each of the m columns, i want to know the counts of unique rows extended one by one from the core column, for both values at the core separately and in both directions. maybe better explained with a little example.

data:

  00 0 010
  10 1 001
  11 1 011
  10 0 011
  10 0 010

so the extended unique rows counts taking e.g. column 3 as core are:

col 3 = 0:

  right: patterns / counts
    00 / 3
    001 / 3
    010, 0011 / 2,1
  left:
    00 / 3
    000,001 / 1,2

and that for the other subset (col3 = 1) as well, then doing the whole thing again for the next core column.

the reason i need these counts is that i want to calculate frequencies of the different extended sequences, to calculate the probability of drawing two identical sequences from the core up to an extended position from the whole set of sequences.

my main problem is speed of the calculations. i tried different ways suggested here in the list of getting the counts of the unique rows, all of them using the table function. both

  table( do.call( paste, c( as.data.frame( mymatrix ) ) ) )

and

  table( apply( mymatrix, 2, paste, collapse = "" ) )

work fine, but are too slow for the bigger matrices that i want to calculate (at least in my not very sophisticated function).

then i found a great suggestion here to do a matrix multiplication with a vector of 2^(0:ncol-1) to convert each row into a decimal number, and do table on those. this speeds up things quite nicely, although the problem is that it of course does not work as soon as i extend for more than 60 columns, because the decimal numbers get too large to accurately distinguish between a 0 and 1 at the smallest digit:

  > 2^60+2 == 2^60
  [1] TRUE

another thing is that so far i could not come up with an idea on how, or if, it is possible to do this without the loops i am using: one large loop for each column in turn as core, and then another loop within that which extends the rows by growing column numbers.

since i am not the best of programmers (and still quite new to R), i was hoping that somebody has some advice on doing these calculations in a more elegant and, more importantly, fast way. just to get the idea, the approach with the matrix multiplication takes 20s for a 60 x 220 matrix on my macbook pro, which is obviously not perfect, considering i would like to use this function for matrices of size 10^2 x 10^5 or even more.

so i would be very thankful for any ideas, suggestions etc to improve this

cheers, martin
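To make the counting in the question concrete, here is a small R sketch that reproduces the "right" counts for core column 3 on the five example rows. It uses the paste-and-table approach the poster already mentions, so it illustrates the definition rather than solving the speed problem; the names m, core, sub_rows and right_counts are made up for the sketch.

```r
# The five example rows from the question, one genotype per column
m <- rbind(c(0, 0, 0, 0, 1, 0),
           c(1, 0, 1, 0, 0, 1),
           c(1, 1, 1, 0, 1, 1),
           c(1, 0, 0, 0, 1, 1),
           c(1, 0, 0, 0, 1, 0))
core <- 3

# Keep only the rows whose core column is 0
sub_rows <- m[m[, core] == 0, , drop = FALSE]

# Extend to the right one column at a time and count the distinct patterns
right_counts <- lapply((core + 1):ncol(m), function(j) {
  keys <- do.call(paste0, as.data.frame(sub_rows[, core:j, drop = FALSE]))
  table(keys)
})
print(right_counts)
```

Because the patterns are kept as strings, this does not hit the precision limit that the 2^(0:ncol-1) encoding runs into for long extensions.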
Re: [R] sampling question
Let's assume your zcta data looks like this

  > set.seed(12345)   ## temporary for reproducibility
  > zcta <- data.frame( zipcode=LETTERS[1:5], prop=runif(5) )
  > zcta
    zipcode      prop
  1       A 0.7209039
  2       B 0.8757732
  3       C 0.7609823
  4       D 0.8861246
  5       E 0.4564810

This says that 72.1% of the population in zipcode A is female, ..., and 45.6% in zipcode E is female.

Now suppose you sampled 20 people and you recorded the zipcode (and other variables) and stored them in 'samp'

  samp <- data.frame( id=1:20, zipcode=LETTERS[ sample(1:5, 20, replace=TRUE) ])

Now, I am not sure what you want to do. But I could see two possible meanings from your message.

1) If you want to sample 10 observations, with each observation weighted INDEPENDENTLY by the proportion of women in its zipcode, try something like the following. The problem with this option is that it depends on the prevalence of the zipcodes of the observations.

  comb <- merge( samp, zcta, all.x=T )
  comb <- comb[ order(comb$id), ]
  comb[ sample( comb$id, 10, prob=comb$prop ), ]

2) If you want to sample x% in each zipcode, where x is the proportion of women in that zipcode, then this is what I would call stratified sampling. Try this:

  tmp <- split( samp, samp$zipcode )
  out <- list()
  for( z in names(tmp) ){
    df <- tmp[[z]]
    p <- zcta[ zcta$zipcode == z, "prop" ]
    out[[z]] <- df[ sample( 1:nrow(df), p*nrow(df) ), ]
  }
  do.call(rbind, out)

You probably need a variant of these, but if you need further help you will need to provide more information and, better yet, examples.

Regards, Adai

Kirsten Beyer wrote:

I am interested in locating a script to implement a sampling scheme that would basically make it more likely that a particular observation is chosen based on a weight associated with the observation. I am trying to select a sample of ~30 census blocks from each ZIP code area based on the proportion of women in a ZCTA living in a particular block. I want to make it more likely that a block will be chosen if the proportion of women in a patient's age group in a particular block is high. Any ideas are appreciated!
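One further reading of the question is a fixed number of blocks per ZIP code, drawn with probability proportional to a per-block weight. The sketch below assumes a made-up data frame blocks with hypothetical columns zip, block and prop_women; it is one possible variant, not a reconstruction of the poster's data.

```r
set.seed(1)
blocks <- data.frame(zip        = rep(c("A", "B"), each = 50),
                     block      = 1:100,
                     prop_women = runif(100))

# Draw up to n blocks from one ZIP code, without replacement,
# favouring blocks with a larger weight
pick <- function(d, n = 30) {
  d[sample(nrow(d), size = min(n, nrow(d)), prob = d$prop_women), ]
}

chosen <- do.call(rbind, lapply(split(blocks, blocks$zip), pick))
table(chosen$zip)
```

Note that when sampling without replacement the weights act sequentially, so the selection probabilities are not exactly proportional to prop_women.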
Re: [R] data.frame
See help(dim) and please read the manuals before asking basic questions like this. Thank you.

elyakhlifi mustapha wrote:

hello, are there functions giving the number of columns and the number of rows of a matrix? thanks.
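For reference, a tiny example of dim() and its companions nrow() and ncol() on a made-up matrix:

```r
m <- matrix(1:6, nrow = 2)

d  <- dim(m)    # both dimensions at once: rows, then columns
nr <- nrow(m)   # number of rows
nc <- ncol(m)   # number of columns
print(d)
```

The same three functions work on data frames.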
Re: [R] How to make a table of a desired dimension
You basically need to use table on factors with fixed, pre-specified levels. For example:

  x <- c(runif(100,10,40), runif(100,43,55))
  y <- c(runif(100,7,35), runif(100,37,50))
  z <- c(runif(100,10,42), runif(100,45,52))

  xx <- ceiling(x); yy <- ceiling(y); zz <- ceiling(z)
  mylevels <- min( c(xx, yy, zz) ) : max( c(xx, yy, zz) )

  out <- cbind( table( factor(xx, levels=mylevels) ),
                table( factor(yy, levels=mylevels) ),
                table( factor(zz, levels=mylevels) ) )

You could replace the last command with simply

  sapply( list(xx, yy, zz), function(vec) table( factor(vec, levels=mylevels) ) )

Regards, Adai

Rubén Roa-Ureta wrote:

Hi ComRades,

I want to make a matrix of frequencies from vectors of a continuous variable spanning different values. For example this code

  x<-c(runif(100,10,40),runif(100,43,55))
  y<-c(runif(100,7,35),runif(100,37,50))
  z<-c(runif(100,10,42),runif(100,45,52))
  a<-table(ceiling(x))
  b<-table(ceiling(y))
  c<-table(ceiling(z))
  a
  b
  c

will give me three tables that start and end at different integer values, and besides, they have 'holes' in between different integer values. Is it possible to use 'table' to make these three tables have the same dimensions, filling in the absent labels with zeroes? In the example above, the desired tables should all start at 8, tables 'a' and 'c' should put a zero at labels '8' to '10', all should put zeros in the frequencies of the labels corresponding to the holes, and all should end at label '55'.

The final purpose is to make a matrix and use 'matplot' to plot all the frequencies in one plot, such as

  #code valid only when 'a', 'b', and 'c' have the proper dimension
  p<-mat.or.vec(48,4)
  p[,1]<-8:55
  p[,2]<-c(matrix(a)[1:48])
  p[,3]<-c(matrix(b)[1:48])
  p[,4]<-c(matrix(c)[1:48])
  matplot(p)

I read the help about 'table' but I couldn't figure out if dnn, deparse.level, or the other arguments could serve my purpose.

Thanks for your help
Rubén
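A compact illustration of the fixed-levels idea on two tiny hand-made vectors, followed by the matplot() call the question was aiming for. The vectors a and b, the level range 8:12 and the pdf(NULL) line (only there so the sketch runs without a display) are assumptions for the example.

```r
a <- c(8, 9, 9, 12)
b <- c(10, 10, 11)
mylevels <- 8:12

# Same levels for every vector, so absent values are counted as zero
out <- sapply(list(a = a, b = b),
              function(v) table(factor(v, levels = mylevels)))
print(out)

pdf(NULL)  # null graphics device; drop this line when working at the console
matplot(mylevels, out, type = "b", pch = 1:2,
        xlab = "value", ylab = "frequency")
dev.off()
```

Passing mylevels as the x argument puts the actual values on the axis, so there is no need for a separate first column as in the p matrix of the question.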
Re: [R] Choosing a column for analysis in a function
Perhaps the use of as.character() with square-bracket indexing, like the following, might help?

  data.whole$Analyte.Values <- data.whole[[ as.character(analyte) ]]

Junnila, Jouni wrote:

Hello all,

I'm having a problem concerning choosing columns from a dataset in a function. I'm writing a function for data input etc., which first reads the data, and then does several data manipulation tasks. The function can then be used by just giving the path of the .txt file where the data is held.

These datasets consist of over 20 different analytes. However, statistical analyses should be made separately, analyte by analyte. So the function needs to be able to choose a certain analyte based on what the user of the function gives as a parameter when calling the function. The name of the analyte the user gives is the same as the name of a column in the data set.

The question is: how can I refer to the parameter which the user gives, inside the function? I cannot give the name of the analyte directly inside the function, as the same function should work for all the 20 analytes. I'm giving some code for clarification:

  datainput <- function(data1,data2,data3,data4,data5,data6,analyte) {
  ...
  ## data1-data6 being the paths of the six datasets I want to combine and
  ## analyte being the special analyte I want to analyze, which can be
  ## found in each of the datasets as a column name.
  ## Then:
  ...
  data.whole <- subset(data.whole, select=c(Sample.Name,Analyte.Values,Day,Plate))
  ## Is for choosing the columns needed for analysis. The Analyte should now
  ## be the column of the analyte the user is referring to when calling the
  ## datainput function. How to do it?

I've tried something like

  data.whole$Analyte.Values <- data.whole$analyte   ## (or in quotes, "analyte")

But this does not work. I've tried several other tricks also, but cannot get it to work. Can someone help?

Thanks in advance, Jouni
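The $ operator takes the name after it literally, so data.whole$analyte looks for a column actually called "analyte". Double square brackets evaluate their argument, which is what a function parameter needs. A minimal sketch, with a made-up helper pick_analyte and made-up data; only the column names Sample.Name, Day, Plate and Analyte.Values come from the question.

```r
# Copy the column named by `analyte` into Analyte.Values and keep the
# columns needed for analysis
pick_analyte <- function(data, analyte) {
  stopifnot(analyte %in% names(data))
  data$Analyte.Values <- data[[analyte]]   # [[ ]] evaluates `analyte`
  data[, c("Sample.Name", "Analyte.Values", "Day", "Plate")]
}

d <- data.frame(Sample.Name = c("s1", "s2"),
                Day         = c(1, 1),
                Plate       = c(1, 2),
                glucose     = c(5.1, 6.3),   # hypothetical analytes
                lactate     = c(1.2, 0.9))

res <- pick_analyte(d, "lactate")
print(res)
```

Here the analyte is passed as a character string, e.g. pick_analyte(d, "glucose").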
Re: [R] Venn diagram
I cannot find the venn package (searched the author's page and googled) despite some posts referring to it, so I cannot help you. But I can suggest you check out varpart in the vegan package, vennDiagram in the limma package, or http://finzi.psych.upenn.edu/R/Rhelp02a/archive/14637.html

Regards, Adai

Nina Hubner wrote:

Hello,

I am a total beginner with "R" and found a package "venn" to create a venn diagram. The problem is, I cannot create the vectors required for the diagram. The manual says:

  R> venn(accession, libname, main = "All samples")

where accession was a vector containing the codes identifying the RNA sequences, and libname was a vector containing the codes identifying the tissue sample (library).

The structure of my data is as follows:

  R> structure(list(cyto = c("A", "B", "C", "D"), nuc = c("A", "B", "E", ""), chrom = c("B", "F", "", "")), .Names = c("cyto", "Nuc", "chrom"))

accession should be A, B, ... and libname should be cyto, nuc and chrom, as I understand it. Could you help me?

Sorry, that might be a very simple question, but I am a total beginner as said before! The question has already been asked, but unfortunately there was no answer.

Thank you a lot,
Nina Hubner
Re: [R] Chosing a subset of a non-sorted vector
You want to select two subplots for each DL value. Try:

  df <- data.frame( DL=gl(3,4), subplot=rep(1:4,3) )
  df$index <- 1:nrow(df)
  ind <- tapply( df$index, df$DL, function(x) sample(x,2) )
  df[ unlist(ind), ]

You could also have used rownames(df) instead of creating df$index.

OR

  tmp <- lapply( split(df, df$DL), function(m) m[sample(1:nrow(m),2),] )
  do.call(rbind, tmp)

Regards, Adai

Christoph Scherber wrote:

Dear all,

I have a tricky problem here: I have a dataframe with biodiversity data in which subplots are a repeated sequence from 1 to 4 (1234,1234,...). Now, I want to randomly pick two subplots from each diversity level (DL). The problem is that it works up to that point, but if I try to subset the whole dataframe, I get stuck:

  DL=gl(3,4)
  subplot=rep(1:4,3)
  diversity.data=data.frame(DL,subplot)

  subplot.sampled=NULL
  for(i in 1:3)
    subplot.sampled=c(subplot.sampled,sort(sample(4,2,replace=F)))

  > subplot.sampled
  [1] 3 4 1 3 1 3
  > subplot[subplot.sampled]
  [1] 3 4 1 3 1 3

  ## here comes the tricky bit:
  > diversity.data[subplot.sampled,]
      DL subplot
  3    1       3
  4    1       4
  1    1       1
  3.1  1       3
  1.1  1       1
  3.2  1       3

How can I select those rows of diversity.data that match the exact subplots in subplot.sampled?

Thank you very much for your help!

Best wishes, Christoph

(I am using R 2.4.1 on Windows XP)

Christoph Scherber
DNPW, Agroecology
University of Goettingen
Waldweg 26
D-37073 Goettingen
+49-(0)551-39-8807
Re: [R] Installing packages from command line on Linux RHEL4
Assuming the R packages have been downloaded locally and end with tar.gz, how about simply changing to the directory where the files are located and typing the following command?

    ls *.tar.gz | while read x; do echo R CMD INSTALL $x; done | bash

Alternatively, you can use the install.packages() function in R.

Regards, Adai

Kermit Short wrote:
> Dirk-
>
> Many thanks for your reply. As I mentioned, I know very little about programming in 'R' and what I've got is a BASH script. If need be, I'll look up how to read in a text file through R and add that into your script in lieu of the (argv) stuff, but you wouldn't happen to know how to accomplish the same thing using the R CMD INSTALL shell command?
>
> Thanks!
> -Kermit
>
> -Original Message-
> From: Dirk Eddelbuettel [mailto:[EMAIL PROTECTED]
> Sent: Monday, May 21, 2007 12:00 PM
> To: [EMAIL PROTECTED]
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] Installing packages from command line on Linux RHEL4
>
> Hi Kermit,
>
> On 21 May 2007 at 11:37, Kermit Short wrote:
> | Greetings.
> |
> | I am a System Administrator, and thus have very little knowledge of R
> | itself. I have been asked to install a list of some 200 packages (from
> | CRAN) to R. Rather than installing each package manually, I was hoping I
> | could script this. I've written a BASH script that hopefully will do this,
> | but I'm wondering about the Mirror Selection portion of the installation
> | process. I've looked and can't find anywhere a parameter to supply that
> | specifies a mirror to use so that I don't have to manually select it for
> | each package I want to install. In this case, with nearly 200 packages to
> | install, this could become quite tedious. Does anyone have any
> | suggestions?
>
> The narrow answer is: try adding
>
>     repos="http://cran.us.r-project.org"
>
> Also, and if I may, the littler front-end (essentially #! shebang support for R) helps there:
>
>     basebud:~> cat bin/installPackages.r
>     #!/usr/bin/env r
>     #
>     # a simple example to install all the listed arguments as packages
>
>     if (is.null(argv)) {
>       cat("Usage: installPackages.r pkg1 [pkg2 [pkg3 [...]]]\n")
>       q()
>     }
>
>     for (pkg in argv) {
>       install.packages(pkg, lib="/usr/local/lib/R/site-library", depend=TRUE)
>     }
>
> You would still need to add repos=... there. I tend to do that in my ~/.Rprofile.
>
> Hth, Dirk
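The install.packages() route mentioned in the replies above could be sketched roughly as follows. This is only a sketch under assumptions: the helper names missing_packages() and install_from_list(), the file name packages.txt (one package name per line) and the mirror URL are all hypothetical, not anything Kermit or Dirk posted.

```r
# Hypothetical helper: which of the wanted packages are not installed yet?
missing_packages <- function(wanted, installed = rownames(installed.packages())) {
  setdiff(unique(wanted), installed)
}

# Hypothetical batch installer. The file name and the mirror are assumptions;
# passing repos explicitly is what avoids the interactive mirror menu.
install_from_list <- function(file = "packages.txt",
                              repos = "https://cloud.r-project.org",
                              lib = .libPaths()[1]) {
  wanted <- trimws(readLines(file))
  wanted <- wanted[nzchar(wanted)]        # drop blank lines
  todo <- missing_packages(wanted)
  if (length(todo) > 0) {
    install.packages(todo, repos = repos, lib = lib, dependencies = TRUE)
  }
  invisible(todo)
}
```

If this were saved as, say, install_from_list.R, it could be driven from a shell script with something like Rscript -e 'source("install_from_list.R"); install_from_list("packages.txt")'.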
Re: [R] translate SAS code
I am not sure whether R can read Excel formulas; if it can, it probably reads them as character strings. I would suggest that you Copy and Paste Special (as values) onto a new sheet and save it as a tab-delimited file.

elyakhlifi mustapha wrote:
> good morning,
>
> I have some SAS code to translate into R code, and when I export data from Excel to R I have to read formulas written as follows:
>
>     C604=(C181/S181)*(100-C182)*(100/85)
>
> or
>
>     if C325=. then C740=(C346/C103)*100
>     else C740=(C346/C325)*100
>
> I am finding it difficult to write a good program to read and calculate these formulas. There are several kinds of formulas, some conditional and some not. Can you help me please?
>
> thanks.
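If the formulas end up being rewritten by hand rather than parsed, the two examples in the question map onto vectorised R fairly directly. A minimal sketch, assuming the columns have already been read into numeric vectors of the same names and that the SAS missing value "." has become NA in R (the numbers below are made up):

```r
# Made-up values; NA plays the role of the SAS missing value "."
C181 <- c(10, 20); S181 <- c(2, 4); C182 <- c(15, 15)
C346 <- c(50, 30); C103 <- c(200, 100); C325 <- c(NA, 60)

# Unconditional formula: same arithmetic as in SAS
C604 <- (C181 / S181) * (100 - C182) * (100 / 85)

# Conditional formula: "if C325=. then ... else ..." becomes ifelse(is.na(...), ..., ...)
C740 <- ifelse(is.na(C325), (C346 / C103) * 100, (C346 / C325) * 100)
```

ifelse() evaluates both branches for every element and then picks one, so the NA in C325 does not leak into the result for the first row.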
Re: [R] help with this indexing
merge()

javier garcia-pintado wrote:
> Hi all,
>
> Let's say I have a long data frame and a short one, both with three columns: $east, $north, $value.
>
> I need to fill in short$value by extracting the corresponding value from long$value where $east and $north coincide in both tables.
>
> I know this possibility:
>
>     for (i in 1:length(short$value)){
>       short$value[i] <- long$value[long$east==short$east[i] & long$north==short$north[i]]
>     }
>
> How could I avoid this loop?
>
> Thanks and regards,
> Javier
> --
> Javier García-Pintado
> Institute of Earth Sciences Jaume Almera (CSIC)
> Lluis Sole Sabaris s/n, 08028 Barcelona
> Phone: +34 934095410
> Fax: +34 934110012
> e-mail: [EMAIL PROTECTED]
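To spell out the one-word merge() answer above, here is a small sketch with made-up coordinates. The data frames below are invented for illustration; only the column names come from the question.

```r
long  <- data.frame(east  = c(1, 1, 2, 2),
                    north = c(1, 2, 1, 2),
                    value = c(10, 20, 30, 40))
short <- data.frame(east = c(2, 1), north = c(1, 2), value = NA)

# Drop the empty value column from short, then look the values up in long.
# all.x = TRUE keeps rows of short that have no match in long (value stays NA).
filled <- merge(short[c("east", "north")], long,
                by = c("east", "north"), all.x = TRUE)
filled
```

Note that merge() sorts the result by the key columns by default, so the rows of filled need not be in the original order of short.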
Re: [R] density
Try bkde2D {KernSmooth} or kde2d {MASS}.

Bruce Willy wrote:
> Hello,
>
> I have an n*2 matrix, called plan, which contains n observations of 2 variates. I want a kernel density estimate of the joint distribution of these 2 variates.
>
> I tried density(plan). Unfortunately, R thinks there are 2n observations (if n=10, 20 observations), when there are only n.
>
> How do I make a multivariate kernel density estimate?
>
> Thank you very much.
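As a rough illustration of the kde2d suggestion above, assuming the MASS package is installed (the matrix plan below is simulated, and the grid size n = 50 is an arbitrary choice):

```r
library(MASS)

set.seed(1)
plan <- cbind(rnorm(200), rnorm(200))   # simulated n x 2 matrix of observations

# kde2d takes the two variates as separate vectors
dens <- kde2d(plan[, 1], plan[, 2], n = 50)

# dens$x and dens$y hold the grid coordinates,
# dens$z is the 50 x 50 matrix of joint density estimates
contour(dens)
```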
Re: [R] optional fields in function declarations
Can you provide a simple example of what you want the function to do? Generally, I set a default value in the function definition:

    raise <- function(x, power=1){ return( x^power ) }

    raise(5)
    [1] 5
    raise(5,3)
    [1] 125

Or you can do the same thing in a slightly less clear manner:

    raise <- function(x, power){
      if(missing(power)) power <- 1
      return( x^power )
    }

I prefer the former.

Regards, Adai

[EMAIL PROTECTED] wrote:
> Dear R users,
>
> I need to create a set of functions to solve some tasks. I want to let the operator decide whether to use the default parameters or change them, so the functions may have some optional fields.
>
> I tried to use the function missing(), but it works properly only if the optional field is declared last in the function.
>
> Can you give me some suggestions and some references?
>
> thank you.
> Claudio
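On the point about optional fields only working when they come last: that is usually a consequence of positional matching, and calling with named arguments should sidestep it. A small sketch (the function scale_shift() and its arguments are made up for illustration):

```r
# Hypothetical function with three optional arguments, all with defaults
scale_shift <- function(x, scale = 1, shift = 0, digits = NULL) {
  out <- x * scale + shift
  if (!is.null(digits)) out <- round(out, digits)
  out
}

scale_shift(10)                            # all defaults
scale_shift(10, shift = 5)                 # skip 'scale', set only 'shift'
scale_shift(10, digits = 0, scale = 0.26)  # named arguments in any order
```

Using NULL as the default and testing with is.null() plays the same role as missing(), but the default is visible in the function signature.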
Re: [R] creating a multivariate set of variables with given intercorrelations
I presume you want to generate normally or t-distributed values? If so, have a look at either mvrnorm in the MASS package or the mvtnorm package.

Dimitri Liakhovitski wrote:
> Hi!
>
> I was wondering if there is a package in R that allows one to create a multivariate data set with pre-specified intercorrelations among variables, e.g., a set of 4 variables (with a length of N each), such that the correlations between variables are:
>
>        a    b    c    d
>     a  1    r1   r2   r3
>     b       1    r4   r5
>     c            1    r6
>     d                 1
>
> Thank you very much!
> Dimitri Liakhovitski
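A minimal sketch of the mvrnorm route suggested above, assuming multivariate normal data are acceptable and MASS is installed. The correlation values standing in for r1 to r6 are invented; any choice must give a positive-definite matrix.

```r
library(MASS)

# Made-up target correlations for the four variables a, b, c, d
R <- matrix(c(1.0, 0.5, 0.3, 0.2,
              0.5, 1.0, 0.4, 0.1,
              0.3, 0.4, 1.0, 0.6,
              0.2, 0.1, 0.6, 1.0),
            nrow = 4, dimnames = list(letters[1:4], letters[1:4]))

set.seed(42)
# empirical = TRUE rescales the draw so that the sample covariance equals R
# exactly; with the default empirical = FALSE it only holds in expectation
x <- mvrnorm(n = 100, mu = rep(0, 4), Sigma = R, empirical = TRUE)
round(cor(x), 2)
```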
Re: [R] Simple programming question
According to your post, you are assuming that there are only 3 unique values of var3 within each category. But categories C and D have 4 unique values of var3.

    split(dfr, dfr$categ)
    ...
    $C
       id categ var3 score
    3   3     C    6  high
    7   7     C    5   mid
    11 11     C    3   low
    15 15     C    1   low
    ...

If you meant something different, then just change myfun() below:

    gmax <- function(x, rnk=1){
      ## generalized maximum with rnk=1 being the biggest value (i.e. max)
      return( sort( unique(x), decreasing=T )[rnk] )
    }

    myfun <- function(x){
      ifelse( x==gmax(x,1), "high", ifelse( x==gmax(x,2), "med", "low" ) )
    }

    out <- lapply( split(dfr$var3, dfr$categ), myfun )
    data.frame( dfr, my.score = unsplit(out, dfr$categ) )

Regards, Adai

Lauri Nikkinen wrote:
> Hi R-users,
>
> I have a simple question for R heavy users. If I have a data frame like this
>
>     dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4), var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
>     dfr <- dfr[order(dfr$categ),]
>
> and I want to score the values in the variable named var3 following this kind of logic:
>
> 1. the highest value of var3 within a category (variable named categ) - high
> 2. the second highest value - mid
> 3. lowest value - low
>
> This would be the output of this reasoning:
>
>     dfr$score <- factor(c("high","mid","low","low","high","mid","mid","low","high","mid","low","low","high","mid","low","low"))
>     dfr
>
> The question is how I do this programmatically in R (i.e. if I have 2000 rows in my dfr)?
>
> I appreciate your help!
>
> Cheers,
> Lauri
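An alternative sketch of the same scoring, using ave() so that the result stays in the original row order without split()/unsplit(). This is not Adai's code; it assumes, as his reply does, that everything below the second-highest distinct value counts as "low", and it uses Lauri's label "mid".

```r
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))

# Dense rank of var3 within each category: 1 = largest distinct value,
# 2 = second largest distinct value, and so on
rnk <- ave(dfr$var3, dfr$categ,
           FUN = function(x) match(x, sort(unique(x), decreasing = TRUE)))

dfr$score <- factor(ifelse(rnk == 1, "high", ifelse(rnk == 2, "mid", "low")),
                    levels = c("low", "mid", "high"))
dfr[order(dfr$categ), ]
```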
Re: [R] MICE for Cox model
I encountered this problem about 18 months ago. I contacted Prof. Fox and Dr. Malewski (the R package maintainers for mice) but they referred me to Prof. van Buuren. I wrote to Prof. van Buuren but am unable to find his reply (if he did reply).

Here are the functions I used at that time, if you want to take them with lots of salt. Let me know if you find anything fishy with them.

    coxph.mids <- function (formula, data, ...) {
        call <- match.call()
        if (!is.mids(data)) stop("The data must have class mids")
        analyses <- as.list(1:data$m)
        for (i in 1:data$m) {
            data.i <- complete(data, i)
            analyses[[i]] <- coxph(formula, data = data.i, ...)
        }
        object <- list(call = call, call1 = data$call, nmis = data$nmis, analyses = analyses)
        oldClass(object) <- if (.SV4.) "mira" else c("mira", "coxph")
        return(object)
    }

In the function 'pool', the small-sample adjustment requires the residual degrees of freedom (i.e. dfc). For a Cox model, I believe that this is simply the number of events minus the number of regression coefficients. There is support for this in the middle of page 149 of the book by Parmar & Machin (ISBN 0471936405). Please correct me if I am wrong.

Here is the slightly modified version of pool:

    pool <- function (object, method = "smallsample") {
        call <- match.call()
        if (!is.mira(object)) stop("The object must have class 'mira'")
        if ((m <- length(object$analyses)) < 2)
            stop("At least two imputations are needed for pooling.\n")
        analyses <- object$analyses
        k <- length(coef(analyses[[1]]))
        names <- names(coef(analyses[[1]]))
        qhat <- matrix(NA, nrow = m, ncol = k, dimnames = list(1:m, names))
        u <- array(NA, dim = c(m, k, k), dimnames = list(1:m, names, names))
        for (i in 1:m) {
            fit <- analyses[[i]]
            qhat[i, ] <- coef(fit)
            u[i, , ] <- vcov(fit)
        }
        qbar <- apply(qhat, 2, mean)
        ubar <- apply(u, c(2, 3), mean)
        e <- qhat - matrix(qbar, nrow = m, ncol = k, byrow = TRUE)
        b <- (t(e) %*% e)/(m - 1)
        t <- ubar + (1 + 1/m) * b
        r <- (1 + 1/m) * diag(b/ubar)
        f <- (1 + 1/m) * diag(b/t)
        df <- (m - 1) * (1 + 1/r)2
        if (method == "smallsample") {
            if( any( class(fit) == "coxph" ) ){
                ### this loop is the hack for survival analysis ###
                status <- fit$y[ , 2]
                n.events <- sum(status == max(status))
                p <- length( coefficients( fit ) )
                dfc <- n.events - p
            } else {
                dfc <- fit$df.residual
            }
            df <- dfc/((1 - (f/(m + 1)))/(1 - f) + dfc/df)
        }
        names(r) <- names(df) <- names(f) <- names
        fit <- list(call = call, call1 = object$call, call2 = object$call1,
                    nmis = object$nmis, m = m, qhat = qhat, u = u, qbar = qbar,
                    ubar = ubar, b = b, t = t, r = r, df = df, f = f)
        oldClass(fit) <- if (.SV4.) "mipo" else c("mipo", oldClass(object))
        return(fit)
    }

print.mipo only gives the coefficients. Often I need the standard errors as well, since I want to test whether each regression coefficient from multiple imputation is zero or not. Since the function summary.mipo does not exist, can I suggest the following:

    summary.mipo <- function(object){
        if (!is.null(object$call1)){
            cat("Call: ")
            dput(object$call1)
        }
        est <- object$qbar
        se <- sqrt(diag(object$t))
        tval <- est/se
        df <- object$df
        pval <- 2 * pt(abs(tval), df, lower.tail = FALSE)
        coefmat <- cbind(est, se, tval, pval)
        colnames(coefmat) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
        cat("\nCoefficients:\n")
        printCoefmat( coefmat, P.values=T, has.Pvalue=T, signif.legend=T )
        cat("\nFraction of information about the coefficients missing due to nonresponse:", "\n")
        print(object$f)
        ans <- list( coefficients=coefmat, df=df, call=object$call1, fracinfo.miss=object$f )
        invisible( ans )
    }

Hope this helps.

Regards, Adai

Inman, Brant A. M.D. wrote:
> R-helpers:
>
> I have a dataset that has 168 subjects and 12 variables. Some of the variables have missing data and I want to use the multiple imputation capabilities of the mice package to address the missing data. Given that mice only supports linear models and generalized linear models (via the lm.mids and glm.mids functions) and that I need to fit Cox models, I followed the previous suggestion of John Fox and have created my own function cox.mids to use coxph to fit models to the imputed datasets. (http://tolstoy.newcastle.edu.au/R/help/06/03/22295.html)
>
> The function I created is:
>
>     cox.mids <- function (formula, data, ...) {
>         call <- match.call()
>         if (!is.mids(data)) stop("The data must have class mids")
>         analyses <- as.list(1:data$m)
>         for (i in 1:data$m) {
>             data.i <- complete(data, i)
>             analyses[[i]] <- coxph(formula, data = data.i, ...)
>         }
>         object <- list(call = call, call1 = data$call, nmis
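For readers following the pooling step in the message above, the quantities qbar, ubar, b, t, r and df correspond to Rubin's rules for combining estimates across imputations. A scalar sketch for a single coefficient, with made-up numbers and without the small-sample correction (the variable name tvar is used here instead of t only to avoid masking the transpose function):

```r
# Made-up results for one coefficient from m = 5 imputed datasets
qhat <- c(0.52, 0.48, 0.55, 0.50, 0.45)       # point estimates
u    <- c(0.010, 0.012, 0.011, 0.009, 0.010)  # squared standard errors
m    <- length(qhat)

qbar <- mean(qhat)               # pooled estimate
ubar <- mean(u)                  # within-imputation variance
b    <- var(qhat)                # between-imputation variance
tvar <- ubar + (1 + 1/m) * b     # total variance
r    <- (1 + 1/m) * b / ubar     # relative increase in variance due to nonresponse
df   <- (m - 1) * (1 + 1/r)^2    # large-sample degrees of freedom

c(estimate = qbar, se = sqrt(tvar), df = df)
```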
Re: [R] controling the size of vectors in a matrix
1) Your colnames need 4 elements, not 3.
2) Use the argument 'n' in your random number generators.

Your code could be simplified as:

    m <- cbind( treatmentgrp  = sample( 1:2, n, replace=T ),
                strata        = sample( 1:2, n, replace=T ),
                survivalTime  = rexp( n, rate=0.07 ),
                somethingElse = rexp( n, rate=0.02 ) )

Regards, Adai

raymond chiruka wrote:
> hie R users
>
> I have the following matrix:
>
>     n=20
>     m<-matrix(nrow=n,ncol=4)
>     colnames(m)=c("treatmentgrp","strata","survivalTime")
>     for(i in 1:n) m[i,]<-c(sample(c(1,2),1,replace=TRUE),sample(c(1:2),1,replace=TRUE),rexp(1,0.07),rexp(1,0.02))
>     print(m)
>
> 1. I would like to control the size of the treatment variable, e.g. treatment 1 = size 5, treatment 2 = size 15.
> 2. I would also like to control the size of the strata, i.e. in treatment 1 divide the strata into 2, etc.
> 3. For the survival time, I would like treatment 1 - strata 1 to use a different rate from treatment 2 - strata 2, etc., to generate the survival time.
>
> The program I used above doesn't do this, so if you can help, thanks.
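The three numbered requests in the question could be approached along these lines. This is only one possible reading: the group sizes are taken from the question, while the way strata are assigned and all four exponential rates are assumptions made for illustration.

```r
set.seed(1)
sizes <- c(5, 15)                        # treatment 1: 5 subjects, treatment 2: 15

# 1. fixed group sizes instead of random sampling
treatmentgrp <- rep(1:2, times = sizes)

# 2. within each treatment, alternate strata 1 and 2 (assumed allocation rule)
strata <- unlist(lapply(sizes, function(k) rep(1:2, length.out = k)))

# 3. hypothetical rates, one per treatment x strata cell
rates <- matrix(c(0.07, 0.05,
                  0.03, 0.02), nrow = 2, byrow = TRUE,
                dimnames = list(treatment = 1:2, strata = 1:2))

# index the rate matrix by (treatment, strata) pairs to get one rate per subject
survivalTime <- rexp(length(treatmentgrp), rate = rates[cbind(treatmentgrp, strata)])

m <- cbind(treatmentgrp, strata, survivalTime)
```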
Re: [R] creating columns
See my response to your thread "controling the size of vectors in a matrix".

Please do not create multiple threads on the same day asking basically the same question, especially if you cannot substantially improve the clarity and quality of the post. Multiple threads asking the same question badly within the span of a few hours lead to people missing other people's responses and thereby essentially wasting their time.

raymond chiruka wrote:
> I would like to create the following matrix:
>
>     treatmentgrp strata
>     1            1
>     1            1
>     1            1
>     1            2
>     1            2
>     1            2
>     2            1
>     2            1
>     2            1
>     2            2
>     2            2
>     2            2
>
> I should be able to choose the size of the treatment groups and strata. The method I used initially creates the matrix randomly:
>
>     n=20
>     m <- cbind( treatmentgrp  = sample( 1:2, n, replace=T ),
>                 strata        = sample( 1:2, n, replace=T ),
>                 survivalTime  = rexp( n, rate=0.07 ),
>                 somethingElse = rexp( n, rate=0.02 ) )
>
> thanks
Re: [R] MICE for Cox model
Are you sure you used my pool function? Because, as I have just discovered, it had a minor typo in the code. After replacing

    df <- (m - 1) * (1 + 1/r)2

with

    df <- (m - 1) * (1 + 1/r)^2

in my pool() function, I get:

    library(survival); data(pbc)
    d <- pbc[,c('time', 'status', 'age', 'sex', 'hepmeg', 'platelet', 'trt', 'trig')]
    d[d==-9] <- NA
    d[,c(4,5,7)] <- lapply(d[,c(4,5,7)], FUN=as.factor)

    library(mice)
    imp <- mice(d, m=10, maxit=10, diagnostics=T, seed=500,
                defaultImputationMethod=c('norm', 'logreg', 'polyreg'))
    fit <- coxph.mids( Surv(time,status) ~ age + sex + hepmeg + platelet + trt + trig, imp)
    pool(fit)

    Call: pool(object = fit)

    Pooled coefficients:
             age         sex1      hepmeg1     platelet         trt2         trig
     0.034924182 -0.208897827  0.987641362 -0.001559426  0.070124108  0.004122421

    Fraction of information about the coefficients missing due to nonresponse:
           age       sex1    hepmeg1   platelet       trt2       trig
    0.06624167 0.19490517 0.27300965 0.21950332 0.27768153 0.40658964

Regards, Adai

Inman, Brant A. M.D. wrote:
> Adai,
>
> Thanks for the functions. I tried using your functions and I get the same error message during the pooling part:
>
>     pool(micefit)
>     Error in names(df) <- names(f) <- names :
>       'names' attribute [5] must be the same length as the vector [0]
>
> Brant
>
> -Original Message-
> From: Adaikalavan Ramasamy [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 17, 2007 4:56 AM
> To: Inman, Brant A. M.D.
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] MICE for Cox model
>
> I encountered this problem about 18 months ago. I contacted Prof. Fox and Dr. Malewski (the R package maintainers for mice) but they referred me to Prof. van Buuren. I wrote to Prof. van Buuren but am unable to find his reply (if he did reply).
>
> [...]
Re: [R] Split a vector(list) into 3 list
You don't need to upgrade R just to get index() working. You can try the following modification:

    v <- sample(1:3, 30, replace = TRUE)
    split( 1:length(v), v )

This should do the trick. Also check out the reverse function, unsplit().

Regards, Adai

Leeds, Mark (IED) wrote:
> index is definitely defined in my version (2.4.0), because when I do ?index, I get info. Maybe you are using an older or younger version of R? I'm really not sure why you are experiencing that problem.
>
> -Original Message-
> From: Patrick Wang [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 17, 2007 8:44 PM
> To: Leeds, Mark (IED)
> Cc: Patrick Wang; r-help@stat.math.ethz.ch
> Subject: RE: [R] Split a vector(list) into 3 list
>
> Thanks, no index function was defined in R.
>
> I tried to use split(order(temp), temp). The number of groups is correct; however, the result does not seem to be correct when I try to match the ordered index against the original index.
>
> Pat
>
>> If temp is your vector then split(index(temp),temp) will give you what you want.
>>
>> -Original Message-
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Wang
>> Sent: Thursday, May 17, 2007 8:15 PM
>> To: r-help@stat.math.ethz.ch
>> Subject: [R] Split a vector(list) into 3 list
>>
>> Hi,
>>
>> I have a vector that contains the values 1, 2, 3.
>>
>> Can I call a function split to split it into 3 vectors, with 1 corresponding to value==1 and containing all the indexes for value==1, 2 corresponding to value==2 and containing all the indexes for value==2, and so on?
>>
>> Thanks
>> pat
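A small worked example of the split() suggestion at the top of this thread, on a short made-up vector so that the result can be checked by eye. seq_along(v) is used here as an equivalent of 1:length(v).

```r
v <- c(2, 1, 3, 1, 2, 3, 3)

# positions of each value: one list element per distinct value of v
idx <- split(seq_along(v), v)
idx

# unsplit() reverses a split: split the values themselves, then put them back
vals <- split(v, v)
back <- unsplit(vals, v)
```

Here idx[["1"]] holds positions 2 and 4, idx[["2"]] holds positions 1 and 5, and idx[["3"]] holds positions 3, 6 and 7, while back reproduces v.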
Re: [R] use loop or use apply?
Can you check if the following gives you what you want?

    tmp <- rbind( A, B )
    dis <- dist( tmp )
    nA <- nrow(A)
    nB <- nrow(B)
    as.matrix(dis)[ 1:nA, nA + 1:nB ]   ## output

If it works, this suggestion comes with the caveat that it might be computationally inefficient compared with using for() loops for very large values of (a,b) or highly discordant values of (a,b), because dist() also computes the distances within A and within B. However, I am hoping that the gain from dist() being coded in C will offset this. Try experimenting to find the optimal speed. Also have a look at the mapply() examples to see if they are useful.

Regards, Adai

Prasenjit Kapat wrote:
> Hi,
>
> I have two matrices, A (a x d) and B (b x d). I want to get another matrix C (a x b) such that C[i,j] is the Euclidean distance between the ith row of A and the jth row of B. In general, I can say that C[i,j] = some.function(A[i,], B[j,]). What is the best method for doing so? (assume a < b)
>
> I have been doing some exploration myself. Consider the following function, get.f, in which
> 'method=1' is the rudimentary double for loop;
> 'method=2' avoids one loop by constructing a bigger matrix, but doesn't use apply();
> 'method=3' avoids both loops by using apply() and constructing bigger matrices;
> 'method=4' avoids constructing bigger matrices by using apply() twice.
>
>     get.f <- function (A, B, method=2) {
>       if (method == 1){
>         a <- nrow(A); b <- nrow(B);
>         C <- matrix(NA, nrow=a, ncol=b);
>         for (i in 1:a)
>           for (j in 1:b)
>             C[i,j] <- sum((A[i,]-B[j,])^2)
>       } else if (method == 2 ) {
>         a <- nrow(A); b <- nrow(B); d <- ncol(A);
>         C <- matrix(NA, nrow=a, ncol=b);
>         for (i in 1:a)
>           C[i,] <- rowSums((matrix(A[i,], nrow=b, ncol=d, byrow=TRUE) - B) ^ 2)
>       } else if (method == 3) {
>         C <- t(apply(A, MARGIN=1, FUN=FUN1, BB=B));  # transpose is needed
>       } else if (method == 4) {
>         C <- t(apply(A, MARGIN=1, FUN=FUN2, BB=B))
>       }
>     }
>
>     FUN1 <- function(aa, BB)
>       return(rowSums( (matrix(aa, nrow=nrow(BB), ncol=ncol(BB), byrow=TRUE) - BB)^2) )
>     FUN2 <- function(aa, BB)
>       return(apply(BB, MARGIN=1, FUN=FUN3, aa=aa))
>     FUN3 <- function(bb,aa)
>       return(sum((aa-bb)^2))
>
> With these methods and the following initializations,
>
>     a <- 100; b <- 1000; d <- 100; n.loop <- 20;
>     A <- matrix(rnorm(a*d), ncol=d)
>     B <- matrix(rnorm(b*d), ncol=d)
>     all.times <- matrix(0,nrow=5,ncol=4)
>     rownames(all.times) <- rownames(as.matrix(system.time(NULL)))
>     for (i in 1:4)
>       for (j in 1:n.loop)
>         all.times[,i] <- all.times[,i] + as.matrix(system.time(C <- get.f(A=A, B=B, method=i)))
>     all.times <- all.times / n.loop
>     print(all.times)
>
>                  [,1]    [,2]    [,3]    [,4]
>     user.self  4.0554 1.50010 1.50130 4.51285
>     sys.self   0.0370 0.02420 0.01800 0.04260
>     elapsed    4.2705 1.58865 1.59475 6.07535
>     user.child 0.0000 0.00000 0.00000 0.00000
>     sys.child  0.0000 0.00000 0.00000 0.00000
>
> 'method=2' stands out as the best, and 'method=1' (for loops) beats 'method=4' (two apply()s)... Is that expected? Is it possible to improve on 'method=2'?
>
> Thanks
> PK
>
> PS: The mail text seems fine in my composer; I hope it looks decent in your reader.
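On the question of whether 'method=2' can be improved: one loop-free option, not proposed by anyone in the thread, is to expand the squared distance as |a|^2 + |b|^2 - 2 a.b and let matrix multiplication do the work. A sketch (the function name cross_dist2 is made up; like get.f it returns squared distances, so take sqrt() for Euclidean distances):

```r
# Squared Euclidean distances between the rows of A and the rows of B
cross_dist2 <- function(A, B) {
  # |a|^2 + |b|^2 - 2 * a.b, computed for all (i, j) pairs at once
  D <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  pmax(D, 0)   # clip tiny negative values caused by rounding
}

set.seed(1)
A <- matrix(rnorm(5 * 3), ncol = 3)
B <- matrix(rnorm(7 * 3), ncol = 3)
C <- cross_dist2(A, B)

# agrees with the dist() approach from the reply, squared
nA <- nrow(A); nB <- nrow(B)
all.equal(C, as.matrix(dist(rbind(A, B)))[1:nA, nA + 1:nB]^2,
          check.attributes = FALSE)
```

Whether this is actually faster than 'method=2' for a given a, b and d would need to be timed; the expansion can also lose some accuracy when the distances are very small relative to the row norms.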
Re: [R] Weighted least squares
See below.

hadley wickham wrote:
> Dear all,
>
> I'm struggling with weighted least squares, where something that I had assumed to be true appears not to be the case. Take the following data set as an example:
>
>     df <- data.frame(x = runif(100, 0, 100))
>     df$y <- df$x + 1 + rnorm(100, sd=15)
>
> I had expected that:
>
>     summary(lm(y ~ x, data=df, weights=rep(2, 100)))
>     summary(lm(y ~ x, data=rbind(df,df)))

You assign weights to different points according to some external quality or reliability measure, not the number of times the data point was measured.

Look at the estimates and standard errors of the two models below:

    coefficients( summary(f.w <- lm(y ~ x, data=df, weights=rep(2, 100))) )
                Estimate Std. Error   t value     Pr(>|t|)
    (Intercept) 1.940765 3.30348066  0.587491 5.582252e-01
    x           0.982610 0.05893262 16.673448 2.264258e-30

    coefficients( summary( f.u <- lm(y ~ x, data=rbind(df,df) ) ) )
                Estimate Std. Error    t value     Pr(>|t|)
    (Intercept) 1.940765 2.32408609  0.8350659 4.046871e-01
    x           0.982610 0.04146066 23.6998165 1.012067e-59

You can see that they have the same coefficient estimates, but the second one has smaller variances. The repeated values artificially deflate the variance and thus inflate the precision. This is why you cannot treat replicate data as independent observations.

> would be equivalent, but they are not. I suspect the difference is how the degrees of freedom are calculated - I had expected it to be sum(weights), but it seems to be sum(weights > 0). This seems unintuitive to me:
>
>     summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
>     summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
>
> What am I missing? And what is the usual way to do a linear regression when you have aggregated data?

It would be best to use the individual data points instead of aggregated data, as that allows you to estimate the within-group variation as well.

If you had individual data points, you could try something like the following. Please check the code, as I am no expert in the area of repeated measures.

    x <- runif(100, 0, 100)
    y1 <- x + rnorm(100, mean=1, sd=15)
    y2 <- y1 + rnorm(100, sd=5)

    df <- data.frame( y=c(y1, y2), x=c(x,x),
                      subject=factor(rep( paste("p", 1:100, sep=""), 2 ) ))

    library(nlme)
    summary( lme( y ~ x, random = ~ 1 | subject, data=df ) )

Try reading Pinheiro and Bates (http://tinyurl.com/yvvrr7) or related material for more information. Hope this helps.

> Thanks,
> Hadley

Regards, Adai
Re: [R] Weighted least squares
Sorry, you did not explain in the original post that your weights correspond to frequencies. I assumed they were repeated measurements with within-group variation.

I was merely responding to your query about why the following differed:

    summary(lm(y ~ x, data=df, weights=rep(2, 100)))
    summary(lm(y ~ x, data=rbind(df,df)))

Let me also clarify my statement about "artificial". If one treats repeated observations as independent, then one obtains estimates with inflated precision. I was not calling your data artificial in any way.

Using frequencies as weights may be valid. Your data points appear to arise from a discrete distribution, so I am not entirely sure whether you can use the linear model, which assumes that the errors are normally distributed.

Regards, Adai

hadley wickham wrote:
> On 5/8/07, Adaikalavan Ramasamy [EMAIL PROTECTED] wrote:
>> See below.
>>
>> hadley wickham wrote:
>>> Dear all,
>>>
>>> I'm struggling with weighted least squares, where something that I had assumed to be true appears not to be the case. Take the following data set as an example:
>>>
>>>     df <- data.frame(x = runif(100, 0, 100))
>>>     df$y <- df$x + 1 + rnorm(100, sd=15)
>>>
>>> I had expected that:
>>>
>>>     summary(lm(y ~ x, data=df, weights=rep(2, 100)))
>>>     summary(lm(y ~ x, data=rbind(df,df)))
>>
>> You assign weights to different points according to some external quality or reliability measure, not the number of times the data point was measured.
>
> That is one type of weighting - but what if I have already aggregated data? That is a perfectly valid type of weighting too.
>
>> Look at the estimates and standard errors of the two models below:
>>
>>     coefficients( summary(f.w <- lm(y ~ x, data=df, weights=rep(2, 100))) )
>>                 Estimate Std. Error   t value     Pr(>|t|)
>>     (Intercept) 1.940765 3.30348066  0.587491 5.582252e-01
>>     x           0.982610 0.05893262 16.673448 2.264258e-30
>>
>>     coefficients( summary( f.u <- lm(y ~ x, data=rbind(df,df) ) ) )
>>                 Estimate Std. Error    t value     Pr(>|t|)
>>     (Intercept) 1.940765 2.32408609  0.8350659 4.046871e-01
>>     x           0.982610 0.04146066 23.6998165 1.012067e-59
>>
>> You can see that they have the same coefficient estimates, but the second one has smaller variances. The repeated values artificially deflate the variance and thus inflate the precision. This is why you cannot treat replicate data as independent observations.
>
> Hardly artificially - I have repeated observations.
>
>>> would be equivalent, but they are not. I suspect the difference is how the degrees of freedom are calculated - I had expected it to be sum(weights), but it seems to be sum(weights > 0). This seems unintuitive to me:
>>>
>>>     summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
>>>     summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
>>>
>>> What am I missing? And what is the usual way to do a linear regression when you have aggregated data?
>>
>> It would be best to use the individual data points instead of aggregated data, as that allows you to estimate the within-group variation as well.
>
> There is no within-group variation - these are observations that occur with the same values many times in the dataset, so they have been aggregated into a contingency-table-like format.
>
>> If you had individual data points, you could try something like the following. Please check the code, as I am no expert in the area of repeated measures.
>>
>>     x <- runif(100, 0, 100)
>>     y1 <- x + rnorm(100, mean=1, sd=15)
>>     y2 <- y1 + rnorm(100, sd=5)
>>
>>     df <- data.frame( y=c(y1, y2), x=c(x,x),
>>                       subject=factor(rep( paste("p", 1:100, sep=""), 2 ) ))
>>
>>     library(nlme)
>>     summary( lme( y ~ x, random = ~ 1 | subject, data=df ) )
>>
>> Try reading Pinheiro and Bates (http://tinyurl.com/yvvrr7) or related material for more information. Hope this helps.
>
> I'm not interested in a mixed model, and I don't have individual data points.
>
> Hadley
Re: [R] Weighted least squares
http://en.wikipedia.org/wiki/Weighted_least_squares gives a formulaic description of what you have said. I believe the original poster has converted something like this

    y   x
    0 1.1
    0 2.2
    0 2.2
    0 2.2
    1 3.3
    1 3.3
    2 4.4
    ...

into something like the following

    y   x freq
    0 1.1    1
    0 2.2    3
    1 3.3    2
    2 4.4    1
    ...

Now, the variance of the mean of each row in the table above is ZERO because the individual elements that make up each row are identical. Therefore your method of using inverse-variance weights will not work here. Is it then valid to use lm( y ~ x, weights=freq ) ?

Regards, Adai

S Ellison wrote:

Hadley, You asked "... what is the usual way to do a linear regression when you have aggregated data?"

Least squares generally uses inverse variance weighting. For aggregated data fitted as mean values, you just need the variances for the _means_. So if you have individual means x_i and sd's s_i that arise from aggregated data with n_i observations in group i, the natural weighting is by the inverse squared standard error of the mean. The appropriate weight for x_i would then be n_i/(s_i^2). In R, that's n/(s^2), as n and s would be vectors with the same length as x. If all the groups had the same variance, or nearly so, s is a scalar; if they have the same number of observations, n is a scalar. Of course, if they have the same variance and the same number of observations, they all have the same weight and you needn't weight them at all: see previous posting!

Steve E
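To make the question above concrete, here is a small sketch comparing lm() with frequency weights against a fit on the expanded (one row per observation) data. The data values and the names agg, full, fit.w and fit.full are invented for illustration. It only shows the mechanics; whether frequency weights suit a given data set is a separate question that the thread leaves open.

```r
# Aggregated data: each (x, y) pair occurs 'freq' times (made-up values)
agg <- data.frame(x    = c(1.1, 2.2, 3.3, 4.4, 5.5),
                  y    = c(0, 0, 1, 2, 2),
                  freq = c(1, 3, 2, 1, 4))

# Expand back to one row per observation
full <- agg[rep(seq_len(nrow(agg)), agg$freq), c("x", "y")]

fit.w    <- lm(y ~ x, data = agg, weights = freq)  # frequency weights
fit.full <- lm(y ~ x, data = full)                 # individual rows

# Same coefficients ...
print(coef(fit.w))
print(coef(fit.full))

# ... but different residual degrees of freedom:
# number of rows minus 2 versus sum(freq) minus 2
print(c(weighted = df.residual(fit.w), expanded = df.residual(fit.full)))

# The standard errors differ only through that df ratio
se.w    <- coef(summary(fit.w))[, "Std. Error"]
se.full <- coef(summary(fit.full))[, "Std. Error"]
print(se.w * sqrt(df.residual(fit.w) / df.residual(fit.full)))
print(se.full)
```

In other words, lm() counts the rows it is given, not the sum of the weights, which is the discrepancy Hadley observed.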
Re: [R] plotting a point graph with data in X-axis
R understands only numerical and Date class values for an axis. So either

a) plot them using the sequence 1, ..., 32 and then explicitly label them. Here is an example:

    n <- length(year.month)
    plot( 1:n, freq, xaxt="n" )
    mtext( text=year.month, side=1, at=1:n, las=2 )

b) or create the dates in Date format. This option is preferable if the dates are unequally spaced.

    x <- seq( as.Date("2000-05-01"), as.Date("2002-12-01"), by="1 month" )
    plot(x, simulation$freq)

BTW, you could also have created year.month via

    paste( rep( 2000:2002, c(8,12,12) ),
           formatC( c(5:12,1:12,1:12), width=2, flag="0" ), sep="_" )

Regards, Adai

Milton Cezar Ribeiro wrote:

Dear all, I have two data frames, one with a complete list of my field surveys with frequency data of a sample species. This data frame looks like:

    simulation<-data.frame(cbind(my.year=c(rep(2000,8),rep(2001,12),rep(2002,12)),my.month=c(5:12,1:12,1:12)))
    simulation$year.month<-paste(simulation$my.year,"_",ifelse(simulation$my.month>=10,simulation$my.month,paste("0",simulation$my.month,sep="")),sep="")
    simulation$freq<-sample(1:40,32)
    attach(simulation)
    plot(year.month, freq)

As you can see, I have a column with the year and month of my samples, and a freq variable with simulated data. I would like to plot this data, but when I try to use the plot shown above, I get an error message.

After bypassing this problem, I would like to add points to my graph with simulated data for only a random number of survey months, but I need the full range of surveys to be kept on the X-axis. Just to simulate a sample I am using:

    simulation.sample<-simulation[sample(1:length(year.month),8, replace=F),]
    simulation.sample$freq<-sample(1:40,8)

Any ideas? Kind regards Miltinho
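The second part of the question (plotting only a random subset of months while keeping the full survey range on the X-axis) can be sketched as follows. This is one possible approach, not something proposed in the thread; the seed and the names freq and keep are made up for the example.

```r
set.seed(42)

# All 32 survey months as Date objects (first day of each month)
x    <- seq(as.Date("2000-05-01"), as.Date("2002-12-01"), by = "1 month")
freq <- sample(1:40, length(x))

# A random subset of 8 survey months
keep <- sort(sample(seq_along(x), 8))

# type = "n" sets up the axes from the full data without drawing anything,
# so the x-axis spans every survey month
plot(x, freq, type = "n", xlab = "Survey month", ylab = "Frequency")

# Now draw only the sampled months
points(x[keep], freq[keep], pch = 19)
```

Because the axes are set up from the complete vectors, the sampled points keep their true positions in time.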
Re: [R] data file import - numbers and letters in a matrix(!)
Here are the contents of my testdata.txt :

    START OF HEIGHT DATA
    S= 0 y=0.0 x=0.
    S= 0 y=0.1 x=0.00055643
    S= 9 y=4.9 x=1.67278117
    S= 9 y=5.0 x=1.74873257
    S=10 y=0.0 x=0.
    S=10 y=0.1 x=0.00075557
    S=99 y=5.3 x=1.94719490
    END OF HEIGHT DATA

If you have access to a shell, you can try changing the input file for read.delim using

    cat testdata.txt | grep -v ^START | grep -v ^END | sed 's/ //g' | sed 's/S=//' | sed 's/y=/\t/' | sed 's/x=/\t/'

or here is my ugly fix in R

    my.read.file <- function(file=file){
      v1 <- readLines( con=file, n=-1 )
      v2 <- v1[ - grep( "^START|^END", v1 ) ]   # drop the marker lines
      v3 <- gsub( "[[:space:]]+", " ", v2 )     # collapse runs of whitespace
      v4 <- gsub( "S=|y=|x=", " ", v3 )         # remove the labels
      v5 <- gsub( "^ +", "", v4 )               # strip leading blanks
      m  <- t( sapply( strsplit(v5, split=" +"), as.numeric ) )
      colnames(m) <- c("S", "y", "x")
      return(m)
    }
    my.read.file( "testdata.txt" )

Regards, Adai

Felix Wave wrote:

Hello, I have a problem with the import of a data file. It seems very tricky. I have a text file (at the end of this mail). Every file has a different number of measurements, which start with "START OF HEIGHT DATA" and end with "END OF HEIGHT DATA". I imported the file into a matrix, but the letters before the numbers are my problem (S= ,S=,x=,y=). Because of the letters and the space after S=, I get a different number of columns in my matrix, and with letters in my matrix I can't do calculations.

My question: is it possible to import the file so that I get 3 columns with only numbers and no letters like x=, y=?

Thanks a lot, Felix

My R Code:

    # na.strings = "S="
    Measure1 <- matrix(scan("data.dat", n= 5063 * 4, skip = 20, what = character() ), 5063, 3, byrow = TRUE)
    Measure2 <- matrix(scan("data.dat", n= 5063 * 4, skip = 5220, what = character() ), 5063, 3, byrow = TRUE)

My data file:

    FILEDATE:02.02.2007
    ...
    START OF HEIGHT DATA
    S= 0 y=0.0 x=0.
    S= 0 y=0.1 x=0.00055643
    ...
    S= 9 y=4.9 x=1.67278117
    S= 9 y=5.0 x=1.74873257
    S=10 y=0.0 x=0.
    S=10 y=0.1 x=0.00075557
    ...
    S=99 y=5.3 x=1.94719490
    END OF HEIGHT DATA
    ...
    START OF HEIGHT DATA
    S= 0 y=0.0 x=0.
    S= 0 y=0.1 x=0.00055643

The imported matrix:

          [,1]           [,2]           [,3]    [,4]
     [6,] S=             9              y=4.9   x=1.67278117
     [7,] S=             9              y=5.0   x=1.74873257
     [8,] S=10           y=0.0          x=0.    S=10
     [9,] y=0.1          x=0.00075557   S=10    y=0.2
    [10,] x=0.00277444   S=10           y=0.3   x=0.00605958
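As a variation on the R fix in this thread, the sketch below lets read.table() do the splitting once the labels are removed. It is only an illustration on a few of the sample lines; the object names raw, body, clean and m are made up, and a real file would be read with readLines() first.

```r
# A few lines in the format shown in the thread
raw <- c("START OF HEIGHT DATA",
         "S= 0 y=0.0 x=0.",
         "S= 0 y=0.1 x=0.00055643",
         "S=10 y=0.0 x=0.",
         "S=99 y=5.3 x=1.94719490",
         "END OF HEIGHT DATA")

# Drop the marker lines. Logical indexing with !grepl() is used because
# v[-grep(...)] returns an empty vector when nothing matches.
body <- raw[!grepl("^(START|END)", raw)]

# Replace each label (S=, y=, x=) by a blank, then split on whitespace
clean <- gsub("[Sxy]=", " ", body)
m <- as.matrix(read.table(text = clean, col.names = c("S", "y", "x")))

print(m)
```

For a file containing several START/END blocks, the same idea could be applied block by block after locating the marker lines with grep().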
Re: [R] Wikibooks
On a related note, one might be interested in checking out Citizendium, which is a spin-off of Wikipedia but 1) has more stringent identity verification and 2) uses a two-tier system of editors and authors. See http://www.citizendium.org/cfa.html.

Deepayan Sarkar wrote:

On 3/30/07, Sarah Goslee [EMAIL PROTECTED] wrote:

On 3/30/07, Alberto Monteiro [EMAIL PROTECTED] wrote:

Deepayan Sarkar wrote:

I was just looking at this page, and it makes me curious: what gives anyone the right to take someone else's mailing list post and include that in a Wiki?

Things that were posted to public mailing lists are freely copied and distributed. It's a scary thought; I may have posted things 10 or 12 years ago that might cause me problems today, but I was pretty aware that I was posting to the whole world.

There's a difference between public archiving and copying. It's not that simple. Dealing with international contributors it's even worse. Under US law (the only one I'm familiar with), the author of a mailing list post or any other written work _automatically holds copyright_ to that post (although not to the ideas contained therein, but to that particular description of the ideas). (Of course, if the ideas are original to the author, it's good form to acknowledge that regardless of whether the exact words are used.)

I believe this is true for all countries that are signatories to the Berne convention (which is pretty much all countries [1]). The US in fact was one of the later ones to join, before which you had to explicitly copyright things if you wanted copyright.

-Deepayan

[1] http://upload.wikimedia.org/wikipedia/commons/6/6c/Berne_Convention.png
Re: [R] Wikibooks
I think some time ago someone suggested that we append a comments/discussion/wiki section, editable by everyday users, to the end of every R function's help page. In other words, every R function help page would have a fixed component that has met R-core's approval and a clearly marked, more flexible component maintained by everyday users. The comments section for a function could contain suggestions, warnings (e.g. the c() versus as.vector() thread that was discussed today), examples, do's and don'ts, and suggestions for clarifying the documentation.

I think starting at the function level is an interesting idea to complement Paul Johnson's R tips. These comments could perhaps be cleaned up and integrated into future releases if R-core agrees on their usefulness. Think of it as a Bayesian approach to maintaining information.

Regards, Adai

Frank E Harrell Jr wrote:

Ben Bolker wrote:

Alberto Monteiro albmont at centroin.com.br writes:

As a big fan of Wikipedia, it's frustrating to see how little there is about R in the correlated project, the Wikibooks: http://en.wikibooks.org/wiki/R_Programming

Alberto Monteiro

Well, we do have an R wiki -- http://wiki.r-project.org/rwiki/doku.php -- although it is not as active as I'd like. (We got stuck halfway through porting Paul Johnson's R Tips to it ...) Please contribute! Most of the (considerable) effort people expend in answering questions about R goes to the mailing lists -- I personally would like it if some tiny fraction of that energy could be redirected toward the wiki, where information can be presented in a nicer format and (ideally) polished over time -- rather than having to dig back through multiple threads on the mailing lists to get answers. (After that we have to get people to look for the answers on the wiki.)

I would like to strongly second Ben. In some ways, R experts are too nice. Continuing to answer the same questions over and over does not lead to a better way of using the R wiki. I would rather see the work go into enhancing the wiki and refactoring information, and have the response to many r-help pleas for help be "see wiki topic x". While doing this, let's consider putting a little more burden on new users to look for good answers already provided.

Frank

Just my two cents -- and I've been delinquent in my wiki'ing recently too ...

Ben Bolker
Re: [R] Vector indexing question
Sounds like you have two different tables and are trying to mine one based on the other. Try

    ref <- data.frame( levels = 1:25, ratings = rep(letters[1:5], times=5) )
    db  <- data.frame( vals=101:175, levels=c(1:25, 1:25, 1:25) )

    levels.of.interest <- ref$levels[ ref$ratings=="a" ]
    db$vals[ which(db$levels %in% levels.of.interest) ]
     [1] 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171

OR a much more intuitive way is to merge both tables and proceed as follows:

    out <- merge( db, ref, by="levels", all.x=TRUE )
    out <- out[ order(out$vals), ]   # little cleanup
    subset( out, ratings=="a" )      # ignore the rownames
       levels vals ratings
    1       1  101       a
    16      6  106       a
    31     11  111       a
    46     16  116       a
    61     21  121       a
    3       1  126       a
    17      6  131       a
    32     11  136       a
    47     16  141       a
    62     21  146       a
    2       1  151       a
    18      6  156       a
    33     11  161       a
    48     16  166       a
    63     21  171       a

Then you can do cool things using the apply() family, like

    tapply( out$vals, out$ratings, mean )
      a   b   c   d   e
    136 137 138 139 140

Check out %in%, merge and apply.

Regards, Adai

Paul Lynch wrote:

Suppose you have 4 related vectors:

    a.id <- c(1:25, 1:25, 1:25)
    a.vals <- c(101:175)                        # same length as a.id (the values for those IDs)
    a.id.levels <- c(1:25)
    a.id.ratings <- rep(letters[1:5], times=5)  # same length as a.id.levels

What I would like to do is specify a rating from a.ratings (e.g. "e"), get the vector of corresponding IDs from a.id.levels (via a.id.levels[a.id.ratings=='e']) and then somehow use those IDs in a.id to get the corresponding values from a.vals. I think I can probably write a loop to construct a vector of ratings of the same length as a.id so that the ratings match the ID, and then go from there. Is there a better way? Perhaps using factors or levels or something?

Thanks, --Paul
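Paul's idea of building a ratings vector of the same length as a.id does not need a loop: match() does the lookup in one step. The sketch below reuses the ref and db tables from the reply above; the column db$ratings and the object e.vals are additions made for this example.

```r
ref <- data.frame(levels = 1:25, ratings = rep(letters[1:5], times = 5))
db  <- data.frame(vals = 101:175, levels = c(1:25, 1:25, 1:25))

# match() gives, for every db$levels value, its row position in ref$levels,
# so this copies the matching rating onto each row of db
db$ratings <- ref$ratings[match(db$levels, ref$levels)]

# Values for one rating of interest
e.vals <- db$vals[db$ratings == "e"]
print(e.vals)
```

This is essentially what merge() does internally for a single key, but it keeps db in its original row order.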
Re: [R] Help for looping
Please try to give a simple reproducible example and simplify your code a bit if you want to get useful responses. For example, you say your data is a matrix of 1000*30, where I presume the matrix has 1000 rows and 30 columns. If so, EMP <- data[,378:392] does not make sense. Perhaps you might be interested in knn() in the class package.

Regards, Adai

[EMAIL PROTECTED] wrote:

Rusers: I have tried to minimize computing time by taking advantage of lapply(). My data is a 1000*30 matrix and the distance matrix was created with dist(). What I am trying to do is to compute the standard distances using the frequencies attached to the nearest neighbors of n reference zones. So I will have 1000 standard distances, and would like to see the frequency distribution of the standard distances.

    # Convert decimal degrees into UTM miles
    x <- (data[,1]-58277.194363)*0.000621
    y <- (data[,2]-4414486.03135)*0.000621

    # Combine x y for computing distances
    coords <- cbind(x,y)
    pts <- length(data)

    # Subset housing data and employment data
    RES <- data[,3:17]
    EMP <- data[,378:392]

    # Combine all the subdata as D
    D <- cbind(coords,RES,EMP)
    cases <- ncol(D)-ncol(coords)

    # Create a threshold bandwidth for defining the nearest neighbors
    thrs <- seq(0,35,by=1)
    SDTAZ <- rep(list(matrix(,nrow(D),length(thrs))),cases)

    for (j in 1:nrow(D))
    for (k in 1:length(thrs))
    for (l in 1:cases) { { {
      SDTAZ[[l]][j,k] <- sqrt(
        sum( (D[as.vector(which(dis[j,]<=thrs[k])),l+2]-D[j,l+2]-
              min(D[as.vector(which(dis[j,]<=thrs[k])),l+2]-D[j,l+2])+1)*
             ( (dis[j,as.vector(which(dis[j,]<=thrs[k]))])^2 ) )
        /sum(D[as.vector(which(dis[j,]<=thrs[k])),l+2]-D[j,l+2]-
             min(D[as.vector(which(dis[j,]<=thrs[k])),l+2]-D[j,l+2])+1) )
    } } }

I think that within this nested loop I should use lapply(), but I ended up getting different values. I would appreciate it if someone could kindly help me. Thank you very much.

Takatsugu Kobayashi PhD Candidate Indiana University, Dept. Geography
Re: [R] Listing function displayed as a table
Something ugly like this?

    Lst <- list()
    Lst[[1]] <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9))
    Lst[[2]] <- list(name="Barney", wife="Liz", no.children=2, child.ages=c(3,5))

    cbind( do.call(rbind, as.list(Lst))[ ,-4],
           child.ages=sapply( Lst, function(myli) paste(myli$child.ages, collapse=",") ))

Why don't you just save the data in a data frame instead of a list to begin with? The only variable I can see that has multiple values is child.ages. Or create one row per record, as in most databases. The choice depends on your input.

    df <- rbind( c("Fred", "Mary", 4), c("Fred", "Mary", 7), c("Fred", "Mary", 9),
                 c("Barney", "Liz", 3), c("Barney", "Liz", 5) )
    df <- data.frame(df)
    colnames(df) <- c("Father", "Mother", "Child.Age")
    df$Child.Age <- as.numeric(as.character(df$Child.Age))

    parents  <- paste( df$Father, df$Mother, sep="+" )
    getstats <- function(x) c( values=paste(x, collapse=","), mean=round(mean(x),2),
                               youngest=min(x), oldest=max(x) )
    do.call( rbind, tapply( df$Child.Age, parents, getstats ) )
               values  mean   youngest oldest
    Barney+Liz "3,5"   "4"    "3"      "5"
    Fred+Mary  "4,7,9" "6.67" "4"      "9"

Regards, Adai

Schmitt, Corinna wrote:

Hello, good idea, it is working. A new question appears: how can I display the entries in a table like

    name    wife  no.children  child.ages
    Fred    Mary  3            4,7,9
    Barney  Liz   2            3,5

Thanks, Corinna

-----Original Message-----
From: Michael T. Mader [mailto:[EMAIL PROTECTED]
Sent: Monday, 26 March 2007 15:32
To: Schmitt, Corinna; r-help@stat.math.ethz.ch
Subject: Re: [R] Listing function

    Lst <- list()
    Lst[[1]] <- list(name="Fred", wife="Mary", no.children=3, cild.ages=c(4,7,9))
    Lst[[2]] <- list(name="Barney", wife="Liz", no.children=2, cild.ages=c(3,5))

I.e. a list of lists.

Regards Michael

Schmitt, Corinna wrote:

Hello, I build a list in the following way:

    Lst = list(name="Fred", wife="Mary", no.children=3, cild.ages=c(4,7,9))

I know how I can extract the information one by one. But now I want to add a new entry which looks like

    name="Barney", wife="Liz", no.children=2, cild.ages=c(3,5)

How can I add this information to Lst without overwriting the first entry? How can I then extract the corresponding information if I have both entries in Lst?

Thanks for helping, Corinna
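For the table Corinna asks for, a slightly less ugly route is to pull each field out of the list of lists with sapply() and assemble a data frame directly. This is a sketch rather than the thread's own solution; the object name tab is made up.

```r
Lst <- list(
  list(name = "Fred",   wife = "Mary", no.children = 3, child.ages = c(4, 7, 9)),
  list(name = "Barney", wife = "Liz",  no.children = 2, child.ages = c(3, 5))
)

# One column per field; the variable-length ages are collapsed to a string
tab <- data.frame(
  name        = sapply(Lst, function(p) p$name),
  wife        = sapply(Lst, function(p) p$wife),
  no.children = sapply(Lst, function(p) p$no.children),
  child.ages  = sapply(Lst, function(p) paste(p$child.ages, collapse = ",")),
  stringsAsFactors = FALSE
)

print(tab)
```

Unlike the cbind() version, the numeric column stays numeric, so tab$no.children can be used in calculations.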
Re: [R] Select the last two rows by id group
Here is yet another solution. This one uses by(), which generates nice visual output.

    score <- data.frame(
      id      = c('001','001','001','002','003','003'),
      math    = c(80,75,70,65,65,70),
      reading = c(65,70,88,NA,90,NA) )

    out <- by( score, score$id, tail, n=2 )
    # score$id: 001
    #    id math reading
    # 2 001   75      70
    # 3 001   70      88
    #
    # score$id: 002
    #    id math reading
    # 4 002   65      NA
    #
    # score$id: 003
    #    id math reading
    # 5 003   65      90
    # 6 003   70      NA

And if you want to put it back into a data frame, use

    do.call( rbind, as.list(out) )
    #        id math reading
    # 001.2 001   75      70
    # 001.3 001   70      88
    # 002   002   65      NA
    # 003.5 003   65      90
    # 003.6 003   70      NA

Ignore the rownames here.

HTH, Adai

Lauri Nikkinen wrote:

Hi R-users, Following this post http://tolstoy.newcastle.edu.au/R/help/06/06/28965.html , how do I get the last two rows (or six or ten) by id group out of the data frame? The example there gives just the last row.

Sincere thanks, Lauri
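Since the question asks for the last two, six or ten rows, the same idea can be wrapped in a small helper with the row count as a parameter. This is a sketch using split() and lapply() instead of by(); the function name last.n and its arguments are invented for the example.

```r
score <- data.frame(
  id      = c("001", "001", "001", "002", "003", "003"),
  math    = c(80, 75, 70, 65, 65, 70),
  reading = c(65, 70, 88, NA, 90, NA)
)

# Return the last n rows of d within each level of the column named 'by'
last.n <- function(d, by, n = 2) {
  pieces <- lapply(split(d, d[[by]]), tail, n = n)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL   # drop the "001.2"-style row names
  out
}

res <- last.n(score, "id", n = 2)
print(res)
```

Groups with fewer than n rows (id 002 here) simply return all the rows they have.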
Re: [R] run a script during R CMD build
Yes, one way is to use commandArgs in the R script. So say your R script is as follows

    n   <- as.numeric(commandArgs()[3])
    fn  <- as.character(commandArgs()[4])
    mat <- matrix( rnorm( n*n ), nc=n )
    write.table( mat, file=fn, sep="\t", quote=FALSE )

Then you execute the commands from the command line as

    R --no-save < script 100 out.txt

This will run the R commands and write the output to out.txt.

johan Faux wrote:

I would like R CMD build to run some R code which does some stuff and saves the result as a file in the /inst/docs folder. Is there any way of doing this? Thank you. Johan
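A variant of the script above that does not depend on the position of the arguments within the full command line uses commandArgs(trailingOnly = TRUE), which returns only the arguments supplied after the script name (or after --args). The file name make_matrix.R and the fallback defaults are assumptions made for this sketch.

```r
# Hypothetical usage:  Rscript make_matrix.R 100 out.txt
args <- commandArgs(trailingOnly = TRUE)

# Fall back to small defaults when no arguments are given (e.g. when sourced)
if (length(args) < 2) {
  args <- c("3", file.path(tempdir(), "out.txt"))
}

n  <- as.integer(args[1])   # arguments always arrive as character strings
fn <- args[2]

mat <- matrix(rnorm(n * n), ncol = n)
write.table(mat, file = fn, sep = "\t", quote = FALSE)
```

Converting args[1] explicitly matters: commandArgs() returns character strings, so n * n fails without the conversion.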
Re: [R] run a script during R CMD build
Sorry, I did not read the question properly. I believe all your functions go in the mypkg/R subdirectory and your data goes in the mypkg/data subdirectory, but I am no expert in this area. If you want to mirror your data from one folder to another, you can try using a symbolic (soft) link on *nix systems:

    ln -s /inst/mydata.Rdata /somewhere/mypkg/data

I am not sure whether the symbolic link approach will work when you run R CMD build mypkg. You might be interested in the examples in package.skeleton().

Regards, Adai

johan Faux wrote:

Thanks for your help. Maybe I was not clear in my question. Let's say I have an R script, myscript.R, which produces a file mydata.Rdata and saves it in the /inst folder. My question is: where do I put my script so that it will run when I build the package using R CMD build? I want to include mydata.RData in my package and I want it to be updated every time I build the package. I appreciate your help anyway. -Johan

- Original Message
From: Adaikalavan Ramasamy [EMAIL PROTECTED]
To: johan Faux [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Sent: Tuesday, March 20, 2007 12:10:21 PM
Subject: Re: [R] run a script during R CMD build

Yes, one way is to use commandArgs in the R script. So say your R script is as follows

    n   <- as.numeric(commandArgs()[3])
    fn  <- as.character(commandArgs()[4])
    mat <- matrix( rnorm( n*n ), nc=n )
    write.table( mat, file=fn, sep="\t", quote=FALSE )

Then you execute the commands from the command line as

    R --no-save < script 100 out.txt

This will run the R commands and write the output to out.txt.

johan Faux wrote:

I would like R CMD build to run some R code which does some stuff and saves the result as a file in the /inst/docs folder. Is there any way of doing this? Thank you. Johan
Re: [R] change the name of file
Do you mean write.table instead of Write() ? Try

    fn <- paste("Data_", i, ".txt", sep="")
    write.table( t(x), file=fn, sep="\t" )

Regards, Adai

On Mon, 2006-07-24 at 11:06 +0200, Robert Mcfadden wrote:

Dear R Users, Is it possible to make file names dependent on a changing variable? For instance, I generate random numbers in a loop and at each iteration I want to write the data to a file (I do not want to write everything to one file using 'append'):

    for (i in 1:50){
      x<-matrix(runif(100, min=0,max=1),nrow=5,ncol=20)
      Write(t(x),file="Data_i.txt",ncolumns=5,sep="\t")
    }

Of course the file name Data_i.txt will be the same for every i, unfortunately. Any suggestion would be appreciated.

Robert
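Put together, the loop with one file per iteration might look like the sketch below. It writes to a temporary directory and uses only three iterations to keep the example small; sprintf() with zero padding is a substitution for the paste() call suggested above, chosen so that the files sort in numeric order.

```r
out.dir <- tempdir()   # replace with the directory you actually want

for (i in 1:3) {
  x <- matrix(runif(100, min = 0, max = 1), nrow = 5, ncol = 20)

  # "Data_01.txt", "Data_02.txt", ...
  fn <- file.path(out.dir, sprintf("Data_%02d.txt", i))

  write.table(t(x), file = fn, sep = "\t",
              row.names = FALSE, col.names = FALSE)
}

files <- list.files(out.dir, pattern = "^Data_[0-9]+\\.txt$")
print(files)
```

Each file holds the transposed matrix, i.e. 20 rows of 5 tab-separated values, matching the ncolumns=5 layout in the original question.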
Re: [R] inplace assignment
I do not fully understand your question but how about :

    inplace <- function( df, cond1, cond2, cols, suffix ){
      w  <- which( cond1 & cond2 )
      df <- df[ w, cols ]
      paste(df, suffix)
      return(df)
    }

BTW, did you mean colnames(df) <- paste(colnames(df), suffix) instead of paste(df, suffix) ?

Regards, Adai

On Fri, 2006-06-16 at 10:23 +0100, David Hugh-Jones wrote:

I get tired of writing, e.g.

    data.frame[some.condition & another.condition, big.list.of.columns] <-
      paste(data.frame[some.condition & another.condition, big.list.of.columns], "foobar")

I would like a function like:

    inplace(paste(data.frame[some.condition & another.condition, big.list.of.columns], "foobar"))

which would take the first argument of the inner function and assign the function's result to it. Has anyone done something like this? Are there simple alternative solutions that I'm missing?

Cheers David
Re: [R] data managment
If your df contains your data, try

    tmp <- cbind( paste(df[ ,1], df[ ,2], sep=":"), paste(df[ ,3], df[ ,4], sep=":") )
    tmp <- t( apply(tmp, 1, sort) )
    out <- data.frame( do.call(rbind, strsplit( tmp[,1], split=":" )),
                       do.call(rbind, strsplit( tmp[,2], split=":" )) )
    colnames(out) <- colnames(df)
    out

Regards, Adai

On Wed, 2006-06-14 at 16:35 +0100, yohannes alazar wrote:

First I would really like to thank the mailing list for the help I got in the past. As someone new to R, I really need some support on how to code the following problem. I am trying to sort some data I have in a big file. The file has 4 columns and 19000 rows. An example of it looks like this:

    G 0.892 A 0.108
    G 0.883 T 0.117
    T 0.5   C 0.5
    A 0.617 G 0.383
    G 0.925 A 0.075
    A 0.967 G 0.033
    C 0.883 T 0.117
    C 0.633 T 0.367
    G 0.95  A 0.05
    C 0.742 G 0.258
    G 0.875 T 0.125
    T 0.167 C 0.833
    C 0.792 A 0.208

Columns one and three are letters, while columns two and four are their corresponding values. I want to sort this data so that my first and third columns are in alphabetic order. For example, in the first row the order is G then A. This is not in alphabetic order, therefore we swap them along with their values and it becomes:

    A 0.108 G 0.892

Row two looks fine but row three needs the same rearrangement as row one. And the final output looks like:

    A 0.108 G 0.892
    G 0.883 T 0.117
    C 0.5   T 0.5
    A 0.617 G 0.383
    A 0.075 G 0.925
    A 0.967 G 0.033
    C 0.883 T 0.117
    C 0.633 T 0.367
    A 0.05  G 0.95
    C 0.742 G 0.258
    G 0.875 T 0.125
    C 0.833 T 0.167
    A 0.208 C 0.792

Please give me some help with the relevant command names or a technique to code this task. Thank you in advance.

Regards Hannes
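An alternative to the paste/sort/strsplit approach is to find the rows that are out of order and swap their two halves in a single assignment. This keeps the values numeric instead of turning them into strings. The sketch uses the first four rows of the example; the column names a1, v1, a2 and v2 are invented.

```r
df <- data.frame(a1 = c("G", "G", "T", "A"), v1 = c(0.892, 0.883, 0.5, 0.617),
                 a2 = c("A", "T", "C", "G"), v2 = c(0.108, 0.117, 0.5, 0.383),
                 stringsAsFactors = FALSE)

# Rows where the first letter sorts after the second need swapping
swap <- df$a1 > df$a2

# Data frame assignment fills columns by position, so listing the source
# columns in swapped order exchanges the two (letter, value) pairs
out <- df
out[swap, c("a1", "v1", "a2", "v2")] <- df[swap, c("a2", "v2", "a1", "v1")]

print(out)
```

With 19000 rows this stays fully vectorised, whereas apply(tmp, 1, sort) calls sort() once per row.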
Re: [R] running R in batch with stdin input
?commandArgs

On Thu, 2006-06-15 at 16:05 -0700, Eric Hu wrote:

Hi, I have an R script that needs to run a few times for different systems. I use

    R --no-save < r.script

for one system. I am trying, with no luck, to use R CMD BATCH to pass a stdin input variable to the script. I wonder if anyone can provide the correct usage to put the variable in the command, like R CMD BATCH r.script name_variable. Thanks. -Eric

In the r.script I have

    name <- readline("/dev/stdin")
    r0 <- read.table("/usr/local/surface/$name/$name_c_r")
    ...

I want to get at the end:

    name <- "1BRS"
    r0 <- read.table("/usr/local/surface/1BRS/1BRS_c_r")
    ...
Re: [R] write data from function into external table
What is your desired output ? This will clarify the problem greatly. Perhaps this might be of some use :

    f <- function(v, pos, val=100){ v[pos] <- val; return(v) }
    test <- 1:3
    test <- f(test, 1)
    test
    [1] 100   2   3

Regards, Adai

On Wed, 2006-06-14 at 12:41 +0200, Sebastian Leuzinger wrote:

Dear list, My apologies if a solution / explanation to this already exists on the list, but it is difficult to assign it to a certain keyword.

    test<-c(1:3)
    testfct <- function(x) {test[1]<-100}
    test
    [1] 1 2 3
    testfct(1)
    [1] 1 2 3

Basically, I would like to write data into an external table that the function does not know. Why is this not working / what alternatives exist?

Thanks, Sebastian

Sebastian Leuzinger University of Basel, Department of Environmental Science Institute of Botany Schönbeinstr. 6 CH-4056 Basel ph 0041 (0) 61 2673511 fax 0041 (0) 61 2673504 email [EMAIL PROTECTED] web http://pages.unibas.ch/botschoen/leuzinger
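The reason Sebastian's function has no visible effect is that an assignment inside a function creates or modifies a local copy. The sketch below contrasts the return-and-reassign style from the reply above with the superassignment operator <<-, which is not mentioned in the thread and is shown here only for comparison. The names set.first, counter and bump are invented.

```r
## Style 1: return the modified object and reassign it (usually preferred)
test <- 1:3
set.first <- function(v, val = 100) {
  v[1] <- val   # changes the local copy only
  v             # so the copy has to be returned
}
test <- set.first(test)
print(test)

## Style 2: superassignment modifies the variable in the enclosing environment
counter <- c(1, 2, 3)
bump <- function() {
  counter[1] <<- 100   # looks up 'counter' outside the function
  invisible(NULL)
}
bump()
print(counter)
```

The first style keeps the function free of side effects, which tends to be easier to debug; <<- is best reserved for cases where shared state is really intended.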
Re: [R] merge dataframes with conditions formulated as logical expressions
You have a discontinuity between your MIN.VAL and MAX.VAL for a given group. If this is true in practice, then you may want to check and report when VAL falls in the discontinuous region. Here is my solution, which ignores that (it only uses MIN.VAL and completely disregards MAX.VAL). Not very elegant, but it should do the trick.

    df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) )
    dp <- data.frame( GRP=c( "A", "A", "B", "B" ),
                      MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ),
                      VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )

    x <- split(df, df$GRP)
    y <- split(dp, dp$GRP)

    out <- NULL
    for(g in names(x)){
      xx  <- x[[g]]
      yy  <- y[[g]]
      w   <- cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F)
      tmp <- cbind(xx, yy[w, "VAL2"])
      colnames(tmp) <- c("GRP", "VAL", "VAL2")
      out <- rbind(out, tmp)
    }
    out

Regards, Adai

On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote:

I have a data.frame df containing two variables:

    GRP: Factor
    VAL: num

I have a data.frame dp containing:

    GRP: Factor
    MIN.VAL: num
    MAX.VAL: num
    VAL2: num

with several rows per GRP where dp[i-1, MAX.VAL] < dp[i, MIN.VAL] within the same GRP. I want to create

    df[i, VAL2] <- dpp[z, VAL2]

with i along df and dpp <- subset( dp, GRP = df[i, GRP] ) so that it is true for each i:

    df[i, VAL] > dpp[z, MIN.VAL] and df[i, VAL] <= dpp[z, MAX.VAL]

Is there an easy/efficient way to do that?

Example:

    df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) )
    dp <- data.frame( GRP=c( "A", "A", "B", "B" ),
                      MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ),
                      VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )

The result should be:

    df$VAL2 <- c( 1.1, 2.2, 4.4 )

Thanks - Wolfram
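A version that respects both MIN.VAL and MAX.VAL can be sketched with merge() followed by a filter: join every df row to all intervals of its group, keep the pairs where VAL lies inside the interval, and map the result back. This is an alternative to the loop above, not the thread's own answer. Treating both bounds as inclusive is an assumption, and the helper column row is invented.

```r
df <- data.frame(GRP = c("A", "A", "B"), VAL = c(10, 100, 200))
dp <- data.frame(GRP = c("A", "A", "B", "B"),
                 MIN.VAL = c(1, 50, 1, 70), MAX.VAL = c(49, 999, 59, 999),
                 VAL2 = c(1.1, 2.2, 3.3, 4.4))

# Remember the original row so results can be mapped back afterwards
df$row <- seq_len(nrow(df))

# All (df row, interval) pairs within the same group
m <- merge(df, dp, by = "GRP")

# Keep only the pairs where VAL falls inside the interval
m <- m[m$VAL >= m$MIN.VAL & m$VAL <= m$MAX.VAL, ]

# Map back; a VAL that falls in a gap between intervals gets NA
df$VAL2 <- m$VAL2[match(df$row, m$row)]
df$row  <- NULL

print(df)
```

The merge creates one row per pair within each group, so for large tables with many intervals per group the intermediate object can get big; the cut()-based loop avoids that.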
Re: [R] www.r-project.org
I am coming late to the discussion, so apologies if the following points are redundant.

1) IMHO, the most important feature that would make life a lot easier for everyone is having a search box on the main webpage. I know you can click on Search in the left-hand pane, but putting it on the main page would be much more useful. We could also have a targets section for the search (c.f. http://finzi.psych.upenn.edu/nmz.html) where one can search the mailing list, HTML manuals, FAQ, user-supplied package names, etc.

2) About displaying an explicit URL, may I suggest the http://maps.google.com approach of a "Link to this page" link (top right-hand corner of the page)?

3) I understand that R is restricted in terms of priorities and human resources. But given that Asia (e.g. India, Singapore, China) has low labour costs and abundant computing personnel, would it not make sense for some Asian research group to offer to spearhead and maintain the website? From a marketing point of view, some nice graphics, search functions, navigation, etc. would be useful to attract newcomers. There could be a simple alternative version (as it is now) for those who prefer it or who have trouble accessing the site.

Just my £0.02.

Regards, Adai

On Tue, 2006-04-25 at 12:33 -0700, Spencer Graves wrote:

Hi, Gabor: inline

Gabor Grothendieck wrote:

On Windows, right click the web page, choose Properties and copy the url there.

That works, and I will use it in the future. Thanks. However, if the subject is not educating Spencer Graves but how to make www.r-project.org more user friendly, then it still might help to display as Address the actual web address of the archive page rather than www.r-project.org. It may not look as pretty, but I'm for function first and cosmetics only if they don't interfere with functionality.

Best Wishes, spencer graves

On 4/25/06, Spencer Graves [EMAIL PROTECTED] wrote:
Re: [R] choosing a particular object
Try

    test.fn <- function(obj.name, var.name = "q2"){
      stopifnot( is.character(obj.name) && is.character(var.name) )
      # get() looks up the object whose name is stored in obj.name
      x <- subset( get(obj.name), select = var.name )
      table(x)
    }

On Fri, 2006-03-31 at 12:44 +0300, Adrian DUSA wrote:

Hello all,

I'd like to create a function which would do some analysis on a particular object, which should be specified in advance. Something like:

    ls()
    [1] "aa" "bb" "cc"

    Object <- "bb"
    var.name <- "q2"

    testfunction <- function(obj.name, var.name) {
      temp <- give.me.the.object.called(Object)
      table(temp[, var.name])
    }

This should perform the same thing as:

    table(bb$q2)

Is this possible?

TIA, Adrian
Re: [R] ROC optimal threshold
If you define a cost function for a given threshold k as

    cost(k) = FP(k) + lambda * FN(k)

then choose the k that minimises the cost. FP(k) and FN(k) are the numbers of false positives and false negatives at threshold k. Set lambda to a value greater than 1 if you want to penalise FN more than FP. There are many situations where this is desirable, for example when you have highly unbalanced class sizes. Consider a problem where you want to predict rare events and you will be penalised much more heavily for missing an event than for falsely flagging a non-event.

I believe the ROC curve was designed to compare two methods over a range of thresholds and not for choosing the threshold itself.

Regards, Adai

On Fri, 2006-03-31 at 08:01 -0500, Tim Howard wrote:

Jose - I've struggled a bit with the same question, said another way: how do you find the value in a ROC curve that minimizes false positives while maximizing true positives? Here's something I've come up with. I'd be curious to hear from the list whether anyone thinks this code might get stuck in local minima, or if it does find the global minimum each time. (I think it's ok.)

From your ROC object you need to grab the sensitivity (= true positive rate), the specificity (= 1 - false positive rate) and the cutoff levels. Then find the value that minimizes abs(sensitivity - specificity), or sqrt((1-sens)^2 + (1-spec)^2), as follows:

    absMin <- extract[which.min(abs(extract$sens - extract$spec)), ]
    sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]

In this example, 'extract' is a dataframe containing three columns: extract$sens = sensitivity values, extract$spec = specificity values, extract$votes = cutoff values. The command subsets the dataframe to a single row containing the desired cutoff and the sens and spec values that are associated with it.

Most of the time these two answers (abs or sqrt) are the same; sometimes they differ quite a bit.

I do not see this application of ROC curves very often. A question for those much more knowledgeable than I: is there a problem with using ROC curves in this manner?

Tim Howard

Date: Fri, 31 Mar 2006 11:58:14 +0200
From: Anadon Herrera, Jose Daniel [EMAIL PROTECTED]
Subject: [R] ROC optimal threshold
To: 'r-help@stat.math.ethz.ch' r-help@stat.math.ethz.ch
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; charset=iso-8859-1

hello,

I am using the ROC package to evaluate predictive models. I have successfully plotted the ROC curve; however, is there any way to obtain the value of the operating point = optimal threshold value (i.e. the nearest point of the curve to the top-left corner of the axes)?

thank you very much,

jose daniel anadon
area de ecologia
universidad miguel hernandez
españa
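The cost-based rule cost(k) = FP(k) + lambda * FN(k) described in the reply can be sketched as follows. This is a rough illustration on simulated data, not output from the ROC package: truth, score, cost_at and pick_threshold are names invented here, and a case is called positive when score >= k.

```r
set.seed(42)
truth <- rbinom(200, size = 1, prob = 0.2)               # 1 = event (rare class)
score <- rnorm(200, mean = ifelse(truth == 1, 1.5, 0))   # simulated classifier score

# Cost at threshold k: false positives plus lambda times false negatives.
cost_at <- function(k, lambda) {
  pred <- as.integer(score >= k)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  fp + lambda * fn
}

# Candidate thresholds: every observed score, in increasing order.
thresholds <- sort(unique(score))

pick_threshold <- function(lambda) {
  costs <- sapply(thresholds, cost_at, lambda = lambda)
  thresholds[which.min(costs)]   # smallest threshold attaining the minimum cost
}

k_equal    <- pick_threshold(lambda = 1)   # FP and FN weighted equally
k_fn_heavy <- pick_threshold(lambda = 5)   # a missed event costs five times more
```

Because the search is an exhaustive scan over all observed scores, it cannot get stuck in a local minimum. Raising lambda can only keep or lower the chosen threshold, since a lower threshold trades false negatives for false positives.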
Re: [R] Plotting a segmented function
Try

    f <- function(x){
      if( x <= 0 ) return(0)
      if( 0 < x && x <= 1 ) return( 0.5*x^2 )
      if( 1 < x && x <= 2 ) return( -0.5*x^2 + 2*x - 1 )
      return(1)
    }

    xx <- seq(-1, 3, 0.1)
    yy <- sapply(xx, f)

Regards, Adai

On Thu, 2006-03-30 at 09:25 -0200, Ken Knoblauch wrote:

You could try nested ifelse statements, something like (untested)

    x <- seq(-1, 3, 0.1)
    y <- ifelse( x <= 3,
           ifelse( x <= 2,
             ifelse( x <= 1,
               ifelse( x <= 0, 0, x^2/2),
               2 * x - (x^2/2) - 1),
             1) )
    plot(x, y)

**

This might be a trivial question, but I would appreciate it if anybody could suggest an elegant way of plotting a function such as the following (a simple distribution function):

    F(x) = 0                   if x <= 0
         = (x^2)/2             if 0 < x <= 1
         = 2x - ((x^2)/2) - 1  if 1 < x <= 2
         = 1                   if x > 2

This is just an example. In this case it is a continuous function. But how to do it in general in an elegant way? I've done the following:

    x1 <- seq(-1, 0, .01)
    f1 <- rep(0, 101)
    x2 <- seq(0, 1, .01)
    f2 <- 0.5*(x2^2)
    x3 <- seq(1, 2, .01)
    f3 <- (2*x3) - (0.5*(x3^2)) - 1
    x4 <- seq(2, 3, .01)
    f4 <- rep(1, 101)
    x <- c(x1, x2, x3, x4)
    F <- c(f1, f2, f3, f4)
    plot(x, F, type='l')

But this seems very cumbersome. Any help is much appreciated.

Thanks
Jacob
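The two suggestions above can be combined into a vectorised function that accepts a whole vector of x values at once. This is a sketch of the same piecewise definition; the name cdf_piecewise is made up here.

```r
# Vectorised piecewise distribution function from the thread.
# Each ifelse() handles one breakpoint, checked from left to right.
cdf_piecewise <- function(x) {
  ifelse(x <= 0, 0,
  ifelse(x <= 1, 0.5 * x^2,
  ifelse(x <= 2, -0.5 * x^2 + 2 * x - 1,
         1)))
}

xx <- seq(-1, 3, by = 0.01)
yy <- cdf_piecewise(xx)
```

From here plot(xx, yy, type = "l") draws the curve, and curve(cdf_piecewise, from = -1, to = 3) should do the same without building xx by hand. Note that ifelse() evaluates every branch for every x, which is harmless here but matters when a branch is only defined on its own interval (e.g. log(x) for x > 0).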
Re: [R] which function to use to do classification
I find it helpful to explain to my colleagues from non-mathematical backgrounds that in classification the classes are predefined, whereas in clustering the classes (and sometimes the number of classes) are not. I prefer the term class discovery over clustering when people cluster samples in order to derive meaningful classes.

Regards, Adai

On Wed, 2006-03-29 at 18:52 -0500, Liaw, Andy wrote:

In addition to Brian's comment, Gordon's book, already in its 2nd edition, is all about clustering, but the title is simply `Classification'.

Andy

From: Sean Davis

We have to be careful here. Classification (which is the terminology that the original poster used) is NOT the same as clustering, although the two are often confused. If the original poster wants to do clustering and examine the results for the presence of three clusters, that is fine and there are many methods for clustering that could be used. However, classification will require a different set of tools. If the clustering tools already pointed out are not doing what is needed (that is, if Cao is actually interested in classification and not clustering), then perhaps a further explanation of what the problem is would help clarify.

Sean

On 3/29/06 1:46 AM, Jacques VESLOT [EMAIL PROTECTED] wrote:

try this (suppose mat is your matrix):

    hc <- hclust(dist(mat, "manhattan"), "ward")
    plot(hc, hang = -1)
    (x <- identify(hc))  # right-click to stop
    cutree(hc, 3)

    km <- kmeans(mat, 3)
    km$cluster
    km$centers

    pam(daisy(mat, metric = "manhattan"), k = 3, diss = TRUE)$clustering

Baoqiang Cao wrote:

Thanks! I tried kmeans, but the results are not very positive. Anyway, thanks Jacques! Please let me know if you have any other thoughts!

Best regards,
Baoqiang Cao

=== At 2006-03-29, 00:08:44 you wrote: ===

if you want to classify rows or columns, read:

    ?hclust
    ?kmeans
    library(cluster)
    ?pam

Baoqiang Cao wrote:

Dear All,

I have some data; suppose it is an N*M matrix. All I want is to classify it into, let's say, 3 classes. Which method(s) do you think is (are) appropriate for this purpose? Any reference will be welcome! Thanks!

Best,
Baoqiang Cao

Baoqiang Cao [EMAIL PROTECTED] 2006-03-29
Re: [R] Binning question (binning rows of a data.frame according to a variable)
Are you saying that your data might look like this?

    set.seed(1)  # For reproducibility only - remove this
    mydf <- data.frame( age    = round(runif(100, min=5, max=65), digits=1),
                        nred   = rpois(100, lambda=10),
                        nblue  = rpois(100, lambda=5),
                        ngreen = rpois(100, lambda=15) )
    mydf$total <- rowSums( mydf[ , c("nred", "nblue", "ngreen")] )
    head(mydf)

       age nred nblue ngreen total
    1 20.9   11     7     15    33
    2 27.3    8     2     18    28
    3 39.4   11     4      8    23
    4 59.5    6     5      8    19
    5 17.1   10     3     16    29
    6 58.9   11     5     14    30

If so, then try this:

    mydf <- mydf[order(mydf$age), ]      ## re-order by age
    mydf$cumtotal <- cumsum(mydf$total)  ## cumulative total
    brk.pts <- seq(from=0, to=sum(mydf$total), len=9)
    mydf$grp <- cut( mydf$cumtotal, brk.pts, labels=FALSE )

        age nred nblue ngreen total cumtotal grp
    27  5.8    9     5      8    22       22   1
    47  6.4    6     5     13    24       46   1
    92  8.5    8     4     18    30       76   1
    10  8.7   12     5      8    25      101   1
    55  9.2   10     7     13    30      131   1
    69 10.1    9     3     18    30      161   1

So here your 'grp' column is what you really want. Just to check:

    tapply( mydf$total, mydf$grp, sum )
      1   2   3   4   5   6   7   8
    352 363 372 387 358 377 377 370

    sapply( tapply( mydf$age, mydf$grp, range ), c )
            1    2    3    4    5    6    7    8
    [1,]  5.8 17.1 24.5 29.0 34.6 44.6 51.2 56.7
    [2,] 16.2 24.0 28.4 33.9 44.1 51.0 55.4 64.5

The last command says that the youngest student in group 1 is aged 5.8 and the oldest is aged 16.2.

Taking this one step further, you can calculate the proportions of red, blue and green for each of the 8 groups.

    props <- mydf[ , c("nred", "nblue", "ngreen")] / mydf$total  # proportions
    apply( props, 2, function(v) tapply( v, mydf$grp, mean ) )

           nred     nblue    ngreen
    1 0.3459898 0.1776441 0.4763661
    2 0.3280712 0.1730796 0.4988492
    3 0.3061429 0.1748149 0.5190422
    4 0.3759380 0.2084694 0.4155926
    5 0.3548805 0.1587353 0.4863842
    6 0.3106835 0.1829349 0.5063816
    7 0.3525933 0.1599737 0.4874330
    8 0.3133796 0.1795567 0.5070637

Hope this is of some use.

Regards, Adai

On Sun, 2006-03-19 at 18:58 +, Dan Bolser wrote:

Adaikalavan Ramasamy wrote:

Do you by any chance want to sample from each group equally to get an equal representation matrix?

No.
I want to make groups of equal sizes, where size isn't simply number of rows (allowing a simple 'gl'), but a sum of the variable. Thanks for the code though, it looks useful. Here is an analogy for what I want to do (in case it helps). A group of students have some bags of marbles - The marbles have different colours. Each student has one bag, but can have between 5 and 50 marbles per bag with any given strange distribution you like. I line the students up by age, and want to see if there is any systematic difference between the number of each color of marble by age (older students may find primary colours less 'cool'). Because the statistics of each individual student are bad (like the proportion of each color per student -- has a high variance) I first put all the students into 8 groups (for example). Thing is, for one reason or another, the number of marbles per bag may systematically vary with age too. However, I am not interested in the number of marbles per bag, so I would like to group the students into 8 groups such that each group has the same total number of marbles. (Each group having a different sized age range, none the less ordered by age). Then I can look at the proportion (or count) of colours in each group, and I can compare the groups or any trend accross the groups. Does that make sense? Cheers, Dan. Here is an example of the input : mydf - data.frame( value=1:100, value2=rnorm(100), grp=rep( LETTERS[1:4], c(35, 15, 30, 20) ) ) which has 35 observations from A, 15 from B, 30 from C and 20 from D. 
And here is a function that I wrote: sample.by.group - function(df, grp, k, replace=FALSE){ if(length(k)==1){ k - rep(k, length(unique(grp))) } if(!replace any(k table(grp))) stop( paste(Cannot take a sample larger than the population when 'replace = FALSE'.\n, Please specify a value greater than, min(table(grp)), or use 'replace = TRUE'.\n) ) ind - model.matrix( ~ -1 + grp ) w.mat - list(NULL) for(i in 1:ncol(ind)){ w.mat[[i]] - sample( which( ind[,i]==1 ), k[i], replace=replace ) } out - df[ unlist(w.mat), ] return(out) } And here are some examples of how to use it : mydf - mydf[ sample(1:nrow(mydf)), ] # scramble it for fun out1 - sample.by.group(mydf, mydf$grp, k=10 ) table( out1$grp ) out2 - sample.by.group(mydf, mydf$grp, k=50, replace
Re: [R] hist-data without plot
    hist(data, plot=FALSE)$counts

On Mon, 2006-03-20 at 14:23 +0100, Gottfried Gruber wrote:

hello,

i need the data from hist() but i do not want the plot. e.g.

    z = hist(data)$counts  # returns absolute frequency

but when i execute this command the plot also appears. is it possible to suppress the plot?

many thanks, best regards
gg
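As a small illustration of what hist(..., plot = FALSE) gives back: the result is a list of class "histogram" whose components can be used without any graphics device being opened. The data below are simulated purely for the example.

```r
set.seed(1)
x <- rnorm(100)

# Compute the histogram without drawing it.
h <- hist(x, plot = FALSE)

counts <- h$counts   # absolute frequency per bin
breaks <- h$breaks   # bin boundaries (one more than the number of bins)
mids   <- h$mids     # bin midpoints
```

Calling plot(h) later draws the stored histogram if it is wanted after all.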
Re: [R] Binning question (binning rows of a data.frame according to a variable)
Lets say there are 10 students in the first group and denote x1 as (say) the number of red balls for student 1 and s1 the total balls. Then I was calculating the average the proportion ( x1/s1 + x2/s2 + ... + x10/s10 ) and you were calculating the average number of events (x1+x2 +...+x10)/(s1+s2+...+s10). On second thoughts I think it is much better to calculate the a weighted average of the proportions. The weights should reflect the variance of the estimate of the proportions. ( w1*x1/s1 + w2*x2/s2 + ... + w10*x10/s10 ) On Mon, 2006-03-20 at 15:11 +, Dan Bolser wrote: Adaikalavan Ramasamy wrote: Are you saying that your data might look like this ? set.seed(1) # For reproducibility only - remove this mydf - data.frame( age=round(runif(100, min=5, max=65), digits=1), nred=rpois(100, lambda=10), nblue=rpois(100, lambda=5), ngreen=rpois(100, lambda=15) ) mydf$total - rowSums( mydf[ , c(nred, nblue, ngreen)] ) head(mydf) age nred nblue ngreen total 1 20.9 11 7 1533 2 27.38 2 1828 3 39.4 11 4 823 4 59.56 5 819 5 17.1 10 3 1629 6 58.9 11 5 1430 If so, then try this : mydf - mydf[order(mydf$age), ] ## re-order by age mydf$cumtotal - cumsum(mydf$total) ## cummulative total brk.pts - seq(from=0, to=sum(mydf$total), len=9) mydf$grp - cut( mydf$cumtotal , brk.pts, labels=F ) age nred nblue ngreen total cumtotal grp 27 5.89 5 822 22 1 47 6.46 5 1324 46 1 92 8.58 4 1830 76 1 10 8.7 12 5 825 101 1 55 9.2 10 7 1330 131 1 69 10.19 3 1830 161 1 So here your 'grp' column is what you really want. Just to check tapply( mydf$total, mydf$grp, sum ) 1 2 3 4 5 6 7 8 352 363 372 387 358 377 377 370 sapply( tapply( mydf$age, mydf$grp, range ), c ) 12345678 [1,] 5.8 17.1 24.5 29.0 34.6 44.6 51.2 56.7 [2,] 16.2 24.0 28.4 33.9 44.1 51.0 55.4 64.5 The last command says that your youngest student in group 1 is aged 5.8 and oldest is aged 16.2. Taking this one step further, you can calculate the proportion of the red, green and blue for each of the 8 groups. 
props - mydf[ , c(nred, nblue, ngreen)]/mydf$total # proportions apply( props, 2, function(v) tapply( v, mydf$grp, mean ) ) nred nbluengreen 1 0.3459898 0.1776441 0.4763661 2 0.3280712 0.1730796 0.4988492 3 0.3061429 0.1748149 0.5190422 4 0.3759380 0.2084694 0.4155926 5 0.3548805 0.1587353 0.4863842 6 0.3106835 0.1829349 0.5063816 7 0.3525933 0.1599737 0.4874330 8 0.3133796 0.1795567 0.5070637 Hope this of some use. Yes, this is very useful! I have just one remaining question, above you take the mean of the group proportion... apply( props, 2, function(v) tapply( v, mydf$grp, mean ) ) instead of explicitly recalculating the proportion for the group (what I couldn't script real good) ... rbind( colSums(mydf[ mydf$grp==1, c(nred, nblue, ngreen)])/ sum (mydf[ mydf$grp==1, c(nred, nblue, ngreen)]), ... colSums(mydf[ mydf$grp==8, c(nred, nblue, ngreen)])/ sum (mydf[ mydf$grp==8, c(nred, nblue, ngreen)]) ) Giving (from the same seed)... nred nbluengreen [1,] 0.3465909 0.1704545 0.4829545 [2,] 0.3250689 0.1735537 0.5013774 [3,] 0.3064516 0.1774194 0.5161290 [4,] 0.3746770 0.2067183 0.4186047 [5,] 0.3519553 0.1564246 0.4916201 [6,] 0.3103448 0.1830239 0.5066313 [7,] 0.3501326 0.1644562 0.4854111 [8,] 0.3081081 0.1837838 0.5081081 Which is *slightly* different from the 'mean' approach. round(former-latter,4) nred nblue ngreen 1 -0.0006 0.0072 -0.0066 2 0.0030 -0.0005 -0.0025 3 -0.0003 -0.0026 0.0029 4 0.0013 0.0018 -0.0030 5 0.0029 0.0023 -0.0052 6 0.0003 -0.0001 -0.0002 7 0.0025 -0.0045 0.0020 8 0.0053 -0.0042 -0.0010 I know this less a question about R, and more a question about general stats, but why did you choose the former and not the latter method? Is one wrong and one right? Or did the former better fit the situation as described? Thanks for any insight into your decision, as this is something that has always puzzled me. Thanks for the beautifully clear examples! Dan. 
Regards, Adai On Sun, 2006-03-19 at 18:58 +, Dan Bolser wrote: Adaikalavan Ramasamy wrote: Do you by any chance want to sample from each group equally to get an equal representation matrix ? No. I want to make groups of equal sizes, where size isn't simply number of rows (allowing a simple 'gl'), but a sum
Re: [R] Binning question (binning rows of a data.frame according to a variable)
[[ Please ignore the last email, which was sent incomplete ]]

Let's say there are 10 students in the first group, and denote by x1 (say) the number of red balls for student 1 and by s1 that student's total number of balls. Then I was calculating the average of the individual proportions,

    ( x1/s1 + x2/s2 + ... + x10/s10 ) / 10

and you were calculating the pooled proportion,

    ( x1 + x2 + ... + x10 ) / ( s1 + s2 + ... + s10 )

It is just by chance that your calculation and mine roughly agree. When the numbers are highly unbalanced, you may get very different results.

On second thoughts, I think it is much better to calculate a weighted average of the proportions. The weights should reflect the variance of the estimate of each proportion. Assuming that your outcome of interest is a proportion, the summary effect size might look something like

    p_hat = ( w1*p1 + w2*p2 + ... + w10*p10 ) / ( w1 + w2 + ... + w10 )

where p1 = x1/s1 and w1 = 1/var(p1). You should be able to obtain the standard error for this estimate. Using this you can build a confidence interval and see whether it overlaps with the proportion of reds in other groups.

There is a big field called meta-analysis that deals with this kind of issue. You might want to read up more about this area. However, I am not too familiar with the meta-analysis of proportions. Perhaps someone on the mailing list can advise you on whether this approach is appropriate for your situation, and perhaps even suggest some references.

Regards, Adai

SNIP
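The three summaries discussed above can be compared on a small made-up example. This is only a sketch: the counts are invented, and the variance of each proportion is assumed to be the binomial estimate p(1-p)/s, which is undefined when a proportion is exactly 0 or 1.

```r
x <- c(2, 30, 5, 12)    # red marbles per student (made-up numbers)
s <- c(5, 50, 20, 40)   # total marbles per student
p <- x / s              # individual proportions

# 1) Unweighted mean of the individual proportions.
mean_of_props <- mean(p)

# 2) Pooled proportion: total reds over total marbles.
#    This equals a weighted mean of p with weights s.
pooled <- sum(x) / sum(s)
pooled_as_weighted <- weighted.mean(p, w = s)

# 3) Inverse-variance weighted mean, assuming var(p_i) = p_i * (1 - p_i) / s_i.
w <- s / (p * (1 - p))
inv_var <- sum(w * p) / sum(w)
se_inv_var <- sqrt(1 / sum(w))   # standard error under the same assumption
```

With unbalanced bag sizes the unweighted mean and the pooled proportion can differ noticeably, which is the point made in the reply. A rough 95% interval would be inv_var plus or minus 1.96 * se_inv_var, again under the binomial assumption.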
Re: [R] removing NA from a data frame
You might find the 2nd part of the following response useful: https://stat.ethz.ch/pipermail/r-help/2006-March/090611.html

And if you want to RTFM, I guess sections 2.5, 2.7, 5.1 and 5.2 of http://cran.r-project.org/doc/manuals/R-intro.html might be useful.

PS:
1) R-help is run for and by unpaid volunteers. Therefore an RTFM without a page reference is sometimes quite acceptable.
2) Similar questions often get repeated over and over on the list. It might be useful to search http://finzi.psych.upenn.edu/nmz.html first.

On Fri, 2006-03-17 at 16:17 -0500, Sam Steingold wrote:

* Francisco J. Zagmutt [EMAIL PROTECTED] [2006-03-17 21:09:48 +]:

Go to the help menu -> manuals in pdf and select An Introduction to R. After you read that document you will be able to answer your questions :-)

I did. I still need help. The matter is not so much with getting things done (I can probably write the code - although I would rather not) as with not reinventing the wheel.

PS. next time you decide to answer my question with RTFM, please also include the number of the page that answers my specific question.
Re: [R] Binning question (binning rows of a data.frame according to a variable)
Do you by any chance want to sample from each group equally to get an equal representation matrix?

Here is an example of the input:

    mydf <- data.frame( value=1:100, value2=rnorm(100),
                        grp=rep( LETTERS[1:4], c(35, 15, 30, 20) ) )

which has 35 observations from A, 15 from B, 30 from C and 20 from D.

And here is a function that I wrote:

    sample.by.group <- function(df, grp, k, replace=FALSE){

      # recycle a single k across all groups
      if(length(k)==1){ k <- rep(k, length(unique(grp))) }

      if(!replace && any(k > table(grp)))
        stop( paste("Cannot take a sample larger than the population when 'replace = FALSE'.\n",
                    "Please specify a value no greater than", min(table(grp)),
                    "or use 'replace = TRUE'.\n") )

      # one indicator column per group
      ind <- model.matrix( ~ -1 + grp )

      w.mat <- list(NULL)
      for(i in 1:ncol(ind)){
        w.mat[[i]] <- sample( which( ind[,i]==1 ), k[i], replace=replace )
      }

      out <- df[ unlist(w.mat), ]
      return(out)
    }

And here are some examples of how to use it:

    mydf <- mydf[ sample(1:nrow(mydf)), ]  # scramble it for fun

    out1 <- sample.by.group(mydf, mydf$grp, k=10 )
    table( out1$grp )

    out2 <- sample.by.group(mydf, mydf$grp, k=50, replace=TRUE)  # ie bootstrap
    table( out2$grp )

and you can even do bootstrapping or sampling with unequal group sizes via:

    out3 <- sample.by.group(mydf, mydf$grp, k=c(20, 20, 30, 30), replace=TRUE)
    table( out3$grp )

Regards, Adai

On Fri, 2006-03-17 at 16:01 +, Dan Bolser wrote:

Hi,

I have tuples of data in rows of a data.frame; each column is a variable for the 'items' (one per row). One of the variables is the 'size' of the item (row).

I would like to cut my data.frame into groups such that each group has the same *total size*. So, assuming that we order by size, some groups should have several small items while other groups have a few large items. All the groups should have approximately the same total size.

I have tried various combinations of cut, quantile, and ecdf, and I just can't work out how to do this!

Any help is greatly appreciated!

All the best,
Dan.
Re: [R] removing ROWS with missing values
My answers are going to be very similar, but with minor cosmetic changes that will hopefully make things a bit clearer.

1) How do you read in the data? If you are using read.table (or read.csv, read.delim, etc.) you can set na.strings="-999" to take advantage of R's missing value features.

2) First count how many values are present in each row. Then subset to the rows with at least 6 numerical values:

    number.present <- rowSums( myMatrix != -999 )
    good.rows <- which( number.present >= 6 )
    myMatrix.sub <- myMatrix[ good.rows, ]

Note: change the first line to rowSums( !is.na( myMatrix ) ) if you have coded missing values properly as in comment 1).

Regards, Adai

On Thu, 2006-03-16 at 21:45 +0100, [EMAIL PROTECTED] wrote:

Quoting mark salsburg [EMAIL PROTECTED]:

I am trying to find out if R can recognize specific criteria for removing rows (i.e. a preexisting function).

I have a matrix myMatrix that is 12000 by 20.

I would like to remove rows from myMatrix that have:

    -999 across all columns
    -999 across all columns but one
    -999 across all columns but two
    -999 across all columns but three
    -999 across all columns but four
    -999 across all columns but five

(-999 here is my missing value)

Does R have a function for this? I've explored subset() so far.

You can create a vector that records the number of non-missing values in each row

    n.notmissing <- apply(myMatrix != -999, 1, sum)

then use row subsetting to remove the ones you don't want

    myMatrix[n.notmissing == n, ]

for n = 0, 1, ... 5, etc.

(As an aside, R functions will work better with your data if you use NA instead of a numeric code to represent missing data.)

Martyn

---
This message and its attachments are strictly confidential. ...{{dropped}}
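The NA-based variant mentioned in both replies can be sketched on a small made-up matrix. The 10 by 20 size and the planted -999 values are assumptions for the demonstration; the real matrix in the thread is 12000 by 20.

```r
set.seed(1)
myMatrix <- matrix(rnorm(200), nrow = 10, ncol = 20)
myMatrix[1, ]     <- -999   # row 1: everything missing
myMatrix[2, 1:15] <- -999   # row 2: only 5 values present
myMatrix[3, 1:14] <- -999   # row 3: exactly 6 values present

# Recode the numeric missing-value code as NA.
myMatrix[myMatrix == -999] <- NA

# Count the non-missing values per row and keep rows with at least 6.
number.present <- rowSums(!is.na(myMatrix))
myMatrix.sub <- myMatrix[number.present >= 6, , drop = FALSE]
```

Here rows 1 and 2 are dropped and row 3 is kept. The drop = FALSE argument keeps the result a matrix even when only one row survives.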
Re: [R] appending objects to a file created by save()
Another flexible approach is to zip/tar all the required individual .rda files together. There are two advantages that I see:

1) You can extract a single file from the collection if you want.
2) You can easily list what objects are in the zipped/tarred file.

In R you have to load all the objects from a single .rda file even if you only want to extract a single object, or just to list which objects are stored.

In terms of size, the zipped/tarred file is comparable to, if not smaller than, what R's save() produces with the compress=TRUE option. But I have not tested this feature in depth.

Regards, Adai

On Fri, 2006-03-10 at 09:49 +, David Whiting wrote:

On Fri, 2006-03-10 at 03:46 -0500, Rajarshi Guha wrote:

Hi,

I've been slowly transitioning to saving sets of objects for a project using save() rather than cluttering my workspace and then doing save.image().

However, sometimes after I have done, say:

    save(x, y, z, file='work.Rda')

I reload it a little later and see that I also want to save object p. Currently I need to do:

    save(x, y, z, p, file='work.Rda')

Is there any way to instruct save to append an object to a previously created binary data file?

I use this approach. One potential problem with it is that if you have large saved objects you could run into trouble, because you need to load them before saving them.

    ## Function to append an object to an R data file.
    append.Rda <- function(x, file) {
      old.objects <- load(file, new.env())
      save(list = c(old.objects, deparse(substitute(x))), file = file)
    }

    ## Example:
    x <- 1:10
    y <- letters[1:10]
    save(list = c("x", "y"), file = "temp.Rda")

    z <- "fred"
    append.Rda(z, "temp.Rda")

Dave

Thanks,

---
Rajarshi Guha [EMAIL PROTECTED] http://jijo.cjb.net
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
---
CChheecckk yyoouurr dduupplleexx sswwiittcchh..

http://www.R-project.org/posting-guide.html
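One caveat with the append idea quoted above: save() looks objects up in the calling environment by default, so the previously stored objects must still exist in the workspace when the file is rewritten. A variant that keeps everything in a scratch environment can be sketched as follows. The function name append_rda and its arguments are invented for this illustration and are not from the thread.

```r
# Append objects (given by name) to an existing .rda file.
# obj_names: character vector of object names to add
# envir:     where to find those objects (the caller's environment by default)
append_rda <- function(obj_names, file, envir = parent.frame()) {
  store <- new.env()
  if (file.exists(file)) load(file, envir = store)   # old objects go into 'store'
  for (nm in obj_names) {
    assign(nm, get(nm, envir = envir), envir = store)
  }
  # Save everything from the scratch environment, not from the workspace.
  save(list = ls(store, all.names = TRUE), file = file, envir = store)
  invisible(ls(store, all.names = TRUE))
}

# Example, written to a temporary file.
rda_file <- tempfile(fileext = ".rda")
x <- 1:10
y <- letters[1:10]
save(x, y, file = rda_file)

z <- "fred"
stored <- append_rda("z", rda_file)

# Listing the contents of an .rda without touching the workspace.
peek <- new.env()
contents <- load(rda_file, envir = peek)
```

As noted in the thread, the whole file still has to be read into memory before it is rewritten, so this does not help with very large objects.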
Re: [R] To improve my understanding of workspaces
I use emacs and ESS to develop the scripts. The newer releases of R have a script editor built in.

Typically I keep all the data and scripts related to a project in its own folder, so I have minimal worry about paths.

To save large and associated objects, I use

    save(x, y, z, file="lala.rda", compress=TRUE)

and then to load x, y, z in another session or workspace I use

    load("lala.rda")

To save small dataframes and matrices, I use

    write.table(mat, file="lala.txt", sep="\t")

and to read it back I use

    mat <- read.delim(file="lala.txt", row.names=1)

The problem with .RData (via quit or save.image) is that it keeps all intermediate objects, which can make it unnecessarily bloated and confusing. Further, you will have difficulty distinguishing one .RData from another by looking at the filename alone.

Regards, Adai

On Fri, 2006-03-10 at 06:58 -0500, Kevin E. Thorpe wrote:

Hello.

I have grown accustomed to the .Data directory in S-Plus, and so when I came to R I continued that behaviour by saving my workspaces at the end of each R session. So, I have saved workspaces in various directories where I have used R, just as I would have had various .Data directories where I had used S-Plus.

I have seen comments on the list, most recently from Prof. Ripley, that they don't routinely save their workspaces in this way. So my questions are:

1. What do people do instead to manage projects?
2. Is there an official recommendation?

From my reading I have learned that you can save data frames (and other objects?) to disk and then attach them. Does this save memory? If I have read correctly, I understand that everything in the workspace is in memory, but I haven't been able to determine whether objects in the search path are as well.

Kind Regards,
Kevin
Re: [R] To improve my understanding of workspaces
A lot of programming style is personal choice and as such varies from individual to individual. See my comments below.

On Fri, 2006-03-10 at 09:01 -0500, Kevin E. Thorpe wrote:
> Thanks Adai. A couple questions/comments about this.
>
> Adaikalavan Ramasamy wrote:
>> I use emacs and ESS to develop the scripts. The new releases of R has the script function already in built.
>
> I use emacs and ESS too (in Linux). I do not know about the script function you mention. It's not in my version (2.1.1) and I couldn't find it in an RSiteSearch either.

I meant to say that newer releases of R _for Windows only_ have a script editor. Look under File -> New script (untested). However, it does not appear to have the syntax highlighting or auto-indenting that emacs has.

>> Typically I keep all the data and scripts related to a project in its own folder, so I have minimal worry about paths.
>
> I do the same.
>
>> To save large and associated objects, I use
>>     save(x, y, z, file="lala.rda", compress=TRUE)
>> and then to load x, y, z in another session or workspace I use
>>     load("lala.rda")
>> To save small dataframes and matrices, I use
>>     write.table(mat, file="lala.txt", sep="\t")
>> and to read it back I use
>>     mat <- read.delim(file="lala.txt", row.names=1)
>
> Am I correct that load() or read.whatever() or even data() will bring the objects into the current workspace while attach() can attach a save()d data frame to the search path? Is one approach better than the other in general?

I think you are correct. The attach function appears to have two uses now:

a) attach("lala.rda") loads the objects from lala.rda into the search path.
b) attach(obj) makes the named columns of a dataframe or list available in the search path, so you only need to type aaa instead of obj$aaa or obj[ , "aaa"].

The second is the more popular form of usage. Personally I would rather not use attach() and prefer to type obj$aaa or to work in the context of a data argument, as in lm( aaa ~ ., data=obj ).

>> The problem with .RData (via quit or save.image), is that it keeps all intermediate objects which can be unnecessarily bloated and confusing. Further you will have difficulty distinguishing one .RData from the other by looking at the filename alone.
>
> If you don't save the workspace on q(), do you also lose the history for that session (although when working in emacs, this is rarely a problem)?

I would argue that a script file is a better record than a history file because I can clean up any test or wrong code I might have in the script file. However, if you prefer to save the history, you can use

    savehistory(file="history.txt")

at any point.

Regards, Adai

SNIP
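A minimal sketch of the two save/restore routes discussed in this thread. It assumes a throwaway folder under tempdir() standing in for a real project folder; the object names and the folder name "myproject" are only illustrative.

```r
# Hypothetical project folder; in practice this would be your own project directory.
proj <- file.path(tempdir(), "myproject")
dir.create(proj, showWarnings = FALSE)

x <- 1:10
y <- letters[1:3]
mat <- matrix(1:6, nrow = 2,
              dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# Route 1: binary save()/load() for larger or related objects.
rda <- file.path(proj, "lala.rda")
save(x, y, file = rda, compress = TRUE)
rm(x, y)
restored <- load(rda)   # load() returns the names of the objects it restored

# Route 2: a tab-delimited text file for a small matrix or data frame.
txt <- file.path(proj, "lala.txt")
write.table(mat, file = txt, sep = "\t")
# read.delim() returns a data frame, so convert back if a matrix is needed.
mat2 <- as.matrix(read.delim(file = txt, row.names = 1))
```

The text route keeps the data readable outside R, while the .rda route preserves R objects exactly, which is probably why the two are used for different sizes of object here.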
Re: [R] One way ANOVA with NO model
Suppose you have 6 groups (A, B, C, D, E, F) and you measured the weight of 5 individuals from each group. Therefore you have 30 weight observations in total. You wish to test if the mean of the response variable is different for each of the groups [ i.e. the null hypothesis is that all 6 group means are the same ].

Let's simulate some data first:

    grp <- gl(6, k=5, labels=LETTERS[1:6])
    grp
     [1] A A A A A B B B B B C C C C C D D D D D E E E E E F F F F F
    Levels: A B C D E F

    set.seed(1)                      # for reproducibility only
    w <- runif(30, min=40, max=75)   # weights
    w <- round(w, digits=1)

Let us first calculate the group means:

    tapply(w, grp, mean)
        A     B     C     D     E     F
    56.24 62.36 55.54 63.54 55.34 53.94

The group means are close, except possibly for groups B and D. You can do a formal test by regressing the response (weight) on its predictor (group). You will need to use the lm() function in R.

    fit <- lm( w ~ grp )

You can get a summary of the fit by

    summary(fit)
    ...
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   56.240      4.725  11.903 1.48e-11 ***
    grpB           6.120      6.682   0.916    0.369
    grpC          -0.700      6.682  -0.105    0.917
    grpD           7.300      6.682   1.093    0.285
    grpE          -0.900      6.682  -0.135    0.894
    grpF          -2.300      6.682  -0.344    0.734
    ...

This simply says that the intercept (the mean of the baseline group A) is strongly NOT zero; each grpX coefficient is the difference between that group's mean and group A's. Based on the p-values, one can roughly summarise that none of the groups appears to differ from group A.

Another useful tool is the ANOVA test, which tests if the between-group variation is larger than the average within-group variation.

    anova(fit)
    Analysis of Variance Table

    Response: w
              Df  Sum Sq Mean Sq F value Pr(>F)
    grp        5  411.15   82.23  0.7367 0.6033
    Residuals 24 2678.79  111.62

This says that there is no significant variation between the groups. Hope this helps.

Regards, Adai

On Fri, 2006-03-10 at 11:24 -0500, Jason Horn wrote:
> I'd like to do a simple one-way ANOVA comparing the means of 6 groups. But it seems like the only way to do an ANOVA in R is to specify some sort of model, where there is an outcome or dependent variable that is a function of independent variables (linear model). But I don't have a linear model, I just want to do a simple ANOVA (and F-test) to compare the means. How do I do this? My stats skills are basic, so please bear with me. Thanks for any ideas...
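As a hedged complement to the lm() route above: base R also offers oneway.test() and aov(), which express the same comparison of group means without an explicit call to lm(). The sketch below reuses the simulated data from the reply; the variable names f_lm, ow and av are only illustrative.

```r
# Same simulated data as in the reply above.
grp <- gl(6, k = 5, labels = LETTERS[1:6])
set.seed(1)
w <- round(runif(30, min = 40, max = 75), digits = 1)

# F statistic from the linear-model route.
fit  <- lm(w ~ grp)
f_lm <- anova(fit)[["F value"]][1]

# Classical one-way ANOVA F test (equal variances assumed).
ow <- oneway.test(w ~ grp, var.equal = TRUE)

# aov() fits the same model and prints the traditional ANOVA table.
av <- summary(aov(w ~ grp))

print(ow)
print(av)
```

All three calls describe the same underlying model, so with var.equal = TRUE the F statistic from oneway.test() should agree with the one from anova(fit). Leaving var.equal at its default gives Welch's version, which does not assume equal variances.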
Re: [R] Ranking within factor subgroups
Thank you! I did not know about the split and unsplit functions. It looks like a very powerful and useful combination to master.

Regards, Adai

On Thu, 2006-02-23 at 07:28 +0100, Peter Dalgaard wrote:
> maneesh deshpande [EMAIL PROTECTED] writes:
>> Hi Adai, I think your solution only works if the rows of the data frame are ordered by date and the ordering function is the same one used to order the levels of factor(df$date)? It turns out (as I implied in my question) that my data is indeed organized in this manner, so my current problem is solved. In the general case, I suppose, one could always order the data frame by date before proceeding?
>> Thanks, Maneesh
>
> You might prefer to look at split/unsplit/split<-, i.e. the z-scores by group line:
>
>     z <- unsplit(lapply(split(x, g), scale), g)
>
> with scale suitably replaced. Presumably (meaning: I didn't quite read your code closely enough)
>
>     z <- unsplit(lapply(split(x, g), bucket, 10), g)
>
> could do it.
>
>> From: Adaikalavan Ramasamy [EMAIL PROTECTED] Reply-To: [EMAIL PROTECTED] To: maneesh deshpande [EMAIL PROTECTED] CC: r-help@stat.math.ethz.ch Subject: Re: [R] Ranking within factor subgroups Date: Wed, 22 Feb 2006 03:44:45 +
>>
>> SNIP
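A small sketch of the split/unsplit idea from this thread, applied to decile ranks within each date. It assumes continuous data (rnorm) so that the quantile breaks are distinct; with heavily tied data cut() may complain that the breaks are not unique. The helper name decile_rank and the column names are hypothetical. The rows are deliberately not sorted by date, to show that the result lands in the right slots regardless of row order.

```r
# Decile rank of each value within its own vector (1 = lowest tenth).
decile_rank <- function(v, nbreaks = 10) {
  br <- quantile(v, seq(0, 1, length.out = nbreaks + 1), na.rm = TRUE)
  br[1] <- -Inf                      # make sure the minimum falls in bin 1
  cut(v, br, labels = FALSE)
}

set.seed(42)
df <- data.frame(date = rep(1:5, times = 100),   # dates interleaved, not sorted
                 A = rnorm(500),
                 B = rnorm(500))

# ave() applies the function within each date and keeps the original row order.
df$bucketA <- ave(df$A, df$date, FUN = decile_rank)
df$bucketB <- ave(df$B, df$date, FUN = decile_rank)

# The explicit split/unsplit route suggested above gives the same answer.
z <- unsplit(lapply(split(df$A, df$date), decile_rank), df$date)
```

Both routes avoid the explicit loop over dates; ave() is arguably the shorter spelling when the result is one value per row.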
Re: [R] (Newbie) Aggregate for NA values
I think it makes perfect sense for R to drop it, since NA carries no information. I do not know if there is an elegant solution, but I would suggest that you turn these NAs into an informative value. Here is one possibility:

    df <- data.frame( AA=1:10, BB=rep(1:5,2), CC=rep(1:2,5), DD=rnorm(10) )
    df[ 9:10, "CC" ] <- NA
    df[is.na(df)] <- "lala"   ## change NA's into informative category ##

    aggregate( df$DD, by=list( df$CC ), mean )
      Group.1          x
    1       1  1.1533763
    2       2  0.6427338
    3    lala -0.2745249

    aggregate( df$DD, by=list( df$BB, df$CC ), mean )
       Group.1 Group.2           x
    1        1       1  0.47264081
    2        2       1  0.63795211
    3        3       1  1.66756015
    4        5       1  1.83535232
    5        1       2  0.89914287
    6        2       2  1.11102134
    7        3       2  0.22268699
    8        4       2  0.33808394
    9        4    lala -0.60154608
    10       5    lala  0.05249622

Regards, Adai

On Fri, 2006-02-24 at 10:16 -0500, Vivek Satsangi wrote:
> Folks, Sorry if this question has been answered before or is obvious (or worse, statistically bad). I don't understand what was said in one of the search results that seems somewhat related.
>
> I use aggregate to get a quick summary of the data. Part of what I am looking for in the summary is how much influence the NAs might have had, if they were included, and whether excluding them from the means is causing some sort of bias. So I want the summary stat for the NAs also.
>
> Here is a simple example session (edited to remove the typos I made, comments added later):
>
>     tmp_a <- 1:10
>     tmp_b <- rep(1:5,2)
>     tmp_c <- rep(1:2,5)
>     tmp_d <- c(1,1,1,2,2,2,3,3,3,4)
>     tmp_df <- data.frame(tmp_a,tmp_b,tmp_c,tmp_d);
>     tmp_df$tmp_c[9:10] <- NA ;
>     tmp_df
>        tmp_a tmp_b tmp_c tmp_d
>     1      1     1     1     1
>     2      2     2     2     1
>     3      3     3     1     1
>     4      4     4     2     2
>     5      5     5     1     2
>     6      6     1     2     2
>     7      7     2     1     3
>     8      8     3     2     3
>     9      9     4    NA     3
>     10    10     5    NA     4
>
>     aggregate(tmp_df$tmp_d,by=list(tmp_df$tmp_b,tmp_df$tmp_c),mean);
>       Group.1 Group.2 x
>     1       1       1 1
>     2       2       1 3
>     3       3       1 1
>     4       5       1 2
>     5       1       2 2
>     6       2       2 1
>     7       3       2 3
>     8       4       2 2
>     # Only one row for each (tmp_b, tmp_c) combination, NA's getting dropped.
>
>     aggregate(tmp_df$tmp_d,by=list(tmp_df$tmp_c),mean);
>       Group.1    x
>     1       1 1.75
>     2       2 2.00
>
> What I want in this last aggregate is a mean for the values in tmp_d that correspond to the tmp_c values of NA. Similarly, perhaps there is a way to make the second-last call to aggregate return the values of tmp_d for the NA values of tmp_c also. How can I achieve this?
>
> --
> Vivek Satsangi
> Student, Rochester, NY USA
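A hedged variant of the recoding idea above, applied to the questioner's own data. Instead of overwriting the NAs inside the data frame, it builds a separate grouping vector, so the original column stays numeric. The label "missing" and the names grp_c and res are only illustrative.

```r
tmp_df <- data.frame(tmp_a = 1:10,
                     tmp_b = rep(1:5, 2),
                     tmp_c = rep(1:2, 5),
                     tmp_d = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4))
tmp_df$tmp_c[9:10] <- NA

# Grouping key with NA turned into an explicit category; tmp_df itself is untouched.
grp_c <- ifelse(is.na(tmp_df$tmp_c), "missing", as.character(tmp_df$tmp_c))

res <- aggregate(tmp_df$tmp_d, by = list(tmp_c = grp_c), FUN = mean)
print(res)
```

Rows 9 and 10 carry tmp_d values 3 and 4, so the "missing" group should report a mean of 3.5 alongside the 1.75 and 2.00 already shown in the question.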
Re: [R] How do I tell it which directory to use?
I think the idea of defining dir1 and dir2 is a good one. If you want to simplify life even further, you can put these definitions into a file that gets run when R starts. See help(Startup) for details.

Regards, Adai

On Wed, 2006-02-22 at 16:54 +1100, [EMAIL PROTECTED] wrote:
> Tom,
>
> You can define your working directory by using (note the forward slashes; backslashes would have to be doubled inside an R string):
>
>     setwd("C:/Documents and Settings/Tom/My Documents/qpaper7/R Project Started 19 Dec 05")
>
> check that your file is there:
>
>     list.files()
>
> and then use:
>
>     source("myFile.txt")
>
> and the machine should load myFile. You can go to another directory:
>
>     setwd("anotherdir")
>
> and repeat the procedure. Or, even better, you can define a number of directories in an external file:
>
>     dir1 <- c("C:/Documents and Settings/Tom/My Documents/qpaper7")
>     dir2 <- c("C:/Documents and Settings/Tom/My Documents")
>
> and after loading the file at the beginning of the session you can use:
>
>     setwd(dir1)
>
> etc. Is it of any help to you?
>
> Cheers, Augusto
>
> Augusto Sanabria. MSc, PhD. Mathematical Modeller, Risk Research Group, Geospatial Earth Monitoring Division, Geoscience Australia (www.ga.gov.au), Cnr. Jerrabomberra Av. Hindmarsh Dr., Symonston ACT 2609. Ph. (02) 6249-9155
>
> -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Thomas L Jones Sent: Wednesday, 22 February 2006 4:31 PM To: R-project help Subject: [R] How do I tell it which directory to use?
>
>> From Tom: In R 2.2.0 under Windows, I want to be able to give it a filename such as "myFile.txt" without the quotes. But actually I mean:
>>
>>     C:\Documents and Settings\Tom\My Documents\qpaper7\R Project Started 19 Dec 05\myFile.txt
>>
>> If I were to repeat this each time, my computer would get all bored and cranky and start to drop bits (only a joke, of course). I think I want to set the Home directory or the working directory or some directory or other to the above directory. I may or may not want to set some environmental variables.
>>
>> R 2.2.0; working directly from the console and copying and pasting code which I want to test into the console. Windows XP Home Edition. Administrator privileges are enabled.
>>
>> A curve ball: There are two accounts, Tom and Jones; the data are stored under Tom, whereas the computation is being done under the Jones account. I won't bore you with the details of why I am doing this. I was able to call Sys.getenv("R_USER") and get the home directory. I am a newbie to R and not familiar with the terminology.
>>
>> Tom
>> Thomas L. Jones, Ph.D., Computer Science
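The steps in this thread can be sketched as follows. The folder is a stand-in created under tempdir() (an assumption, so the example runs anywhere); on the poster's machine it would be the long "Documents and Settings" path. file.path() builds paths with forward slashes, which sidesteps the backslash escaping issue.

```r
# Hypothetical project folder standing in for the poster's real directory.
proj_dir <- file.path(tempdir(), "qpaper7")
dir.create(proj_dir, showWarnings = FALSE)

# A tiny script file to play the role of myFile.txt.
writeLines("answer <- 6 * 7", file.path(proj_dir, "myFile.txt"))

old_dir <- setwd(proj_dir)   # setwd() returns the previous working directory
print(list.files())          # check that the file is there
source("myFile.txt")         # a bare file name now resolves inside proj_dir
setwd(old_dir)               # go back to where we started
```

Keeping the value returned by setwd() makes it easy to switch back afterwards, which may matter when several project directories are in play.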
Re: [R] elements that appear only once
A slight variation on your solution, but hopefully more readable:

    names( which( table(a) == 1 ) )

Regards, Adai

On Wed, 2006-02-22 at 09:11 +, Robin Hankin wrote:
> Hi. I have a factor and I want to extract just those elements that appear exactly once. How to do this? Toy example follows.
>
>     a <- as.factor(c(rep("oak",5), rep("ash",1), rep("elm",1), rep("beech",4)))
>     a
>      [1] oak   oak   oak   oak   oak   ash   elm   beech beech beech beech
>     Levels: ash beech elm oak
>
>     table(a)
>     a
>       ash beech   elm   oak
>         1     4     1     5
>
> So I would want ash and elm, because there is only one ash and only one elm in my wood. My Best Effort:
>
>     names(table(a)[table(a)==1])
>     [1] "ash" "elm"
>
> This doesn't seem particularly elegant to me; there must be a better way! Anyone?
>
> --
> Robin Hankin, Uncertainty Analyst, National Oceanography Centre, Southampton, European Way, Southampton SO14 3ZH, UK. tel 023-8059-7743
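For completeness, a short runnable version of the suggestion above on the toy data from the question. The variable names counts and singles are illustrative; the last line shows one way to pull the matching elements out of the factor itself rather than just their names.

```r
a <- factor(c(rep("oak", 5), "ash", "elm", rep("beech", 4)))

counts  <- table(a)                   # frequency of each level
singles <- names(which(counts == 1))  # levels that occur exactly once

print(singles)
print(a[a %in% singles])              # the corresponding elements of the factor
```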
Re: [R] call row names
1) It is not good practice to name your objects after existing R functions (e.g. table).

2) I think you are getting rows and columns confused. If you want to extract a row or a column of a matrix or dataframe by name, then try subsetting it with mat["v4", ] or mat[ , "A1"]. See help(subset) for more information.

3) It looks to me as if your object is a list. Try doing class(table).

Regards, Adai

On Tue, 2006-02-21 at 11:56 +, Ana Quitério wrote:
> Hi R users. I have a table like this:
>
>     table
>        var    A1   A2   A3
>         v1 41203 3.69 2.31
>         v2 20577 4.51 8.60
>         v3 20625 2.87 3.50
>         v4  6115 8.92 2.97
>         v5  3160 1.49 2.21
>         v6  2954 2.62 5.98
>         v7  4731 1.83 7.53
>         v8  2435 7.68 3.50
>         v9  2296 3.03 4.84
>        v10  6153 1.06 4.28
>        v11  3157 1.07 1.15
>        v12  2996 1.06 1.01
>        v13  6084 2.65 2.63
>        v14  3115 2.42 5.70
>        v15  2969 2.92 7.53
>
> * If I want column A1 I do this: table$A1
> * And if I want row v4, what can I do? (Probably the problem happens because the column var is not treated as row names, although that is the purpose I created it for.)
>
> Thanks in advance,
> Ana Quiterio
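A minimal sketch of two ways to get "row v4", assuming the object is a data frame whose labels sit in an ordinary column called var, as the question suggests. Only the first four rows of the posted table are reproduced, and the object is called tab here to avoid masking the table() function.

```r
tab <- data.frame(var = c("v1", "v2", "v3", "v4"),
                  A1  = c(41203, 20577, 20625, 6115),
                  A2  = c(3.69, 4.51, 2.87, 8.92),
                  A3  = c(2.31, 8.60, 3.50, 2.97),
                  stringsAsFactors = FALSE)

# Option 1: logical subsetting on the label column.
row_v4 <- tab[tab$var == "v4", ]

# Option 2: promote the labels to row names, then index by name.
rownames(tab) <- tab$var
row_v4_by_name <- tab["v4", ]

print(row_v4)
print(row_v4_by_name)
```

Option 2 requires the labels to be unique, since a data frame cannot have duplicate row names.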
[R] visualise classification by factors (was Re: R-help Digest, Vol 36, Issue 21)
1) Please use a meaningful subject line. Start a new thread instead of replying to another thread.

2) Please give a simple example (if possible a reproducible one) to help explain the problem.

3) Please read the posting guide.

On Tue, 2006-02-21 at 15:12 +0300, Evgeniy Kachalin wrote:
> Hello, dear R users. I've already sent a question here, but I'm not sure that it has been read.
>
> I need to visualize the classification of my numerical data based on 2-3 factors. I suppose the best way is a tree, with an arbitrary function at the ends (leaves), or at least with the means of my data at the ends. What is the way to do it? As I found, ctree offers binary classification, but is that the only way? Of course, a tree is not the only way; maybe you could suggest other ways.
>
> Thank you.
Re: [R] How to Import Data
1) You need to use sep=",", which is appropriate for a CSV file.

2) R looks for the file in its current working directory (see getwd()); if the file is somewhere else, you need to specify the FULL path to it. See http://cran.r-project.org/bin/windows/base/rw-FAQ.html#R-can_0027t-find-my-file

3) You can use read.csv, which is the read.table variant for CSV files. For example

    a <- read.csv( file="c:/Progra~1/Docume~1/ramasamy/x111.csv" )

might work if you replace the path with your own full path. If you have _unique_ rownames in the first column, you can add the argument row.names=1 to the call.

Regards, Adai

On Tue, 2006-02-21 at 08:52 -0500, Carl Klarner wrote:
> Hello, I am a very new user of R. I've spent several hours trying to import data, so I feel okay asking the list for help. I had an Excel file, then I turned it into a csv file, as instructed by the directions. My filename is x111.csv. I then used the following command to read this (fairly small) dataset in.
>
>     x111 <- read.table(file='x111.csv', sep=",", header=T, quote="", comment.char="", as.is=T)
>
> I then get the following error message.
>
>     Error in file(file, "r") : unable to open connection
>     In addition: Warning message:
>     cannot open file 'x111.csv', reason 'No such file or directory'
>
> I would imagine I'm not putting my csv file in the right location for R to be able to read it. If that's the case, where should I put it? Or is there something else I need to do to it first?
>
> Thanks for your help, Carl
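A small sketch of how one might diagnose the "No such file or directory" message before reading the file. The CSV is written to tempdir() purely so the example is self-contained; the contents and the name csv_path are made up for illustration.

```r
# Create a small CSV at a known location (stand-in for the exported Excel file).
csv_path <- file.path(tempdir(), "x111.csv")
write.csv(data.frame(id = c("a", "b"), value = c(1.5, 2.5)),
          csv_path, row.names = FALSE)

print(getwd())               # where R looks when given a bare file name
print(file.exists(csv_path)) # TRUE means the full path is usable

# Read it using the full path; as.is = TRUE keeps text columns as character.
x111 <- read.csv(csv_path, as.is = TRUE)
print(x111)
```

If file.exists() returns FALSE for the name being used, either the path needs correcting or the working directory needs changing with setwd().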
Re: [R] Ranking within factor subgroups
It might help to give a simple reproducible example in the future. For example

    df <- cbind.data.frame( date=rep( 1:5, each=100 ), A=rpois(500, 100), B=rpois(500, 50), C=rpois(500, 30) )

might generate something like

        date   A  B  C
    1      1  93 51 32
    2      1  95 51 30
    3      1 102 59 28
    4      1 105 52 32
    5      1 105 53 26
    6      1  99 59 37
    ...
    495    5 100 57 19
    496    5  96 47 44
    497    5 111 56 35
    498    5 105 49 23
    499    5 105 61 30
    500    5  92 53 32

Here is my proposed solution. Can you double check against your existing functions to see if the results are correct?

    decile.fn <- function(x, nbreaks=10){
      br <- quantile( x, seq(0, 1, len=nbreaks+1), na.rm=TRUE )
      br[1] <- -Inf
      return( cut(x, br, labels=FALSE) )
    }

    out <- apply( df[ , c("A", "B", "C")], 2, function(v) unlist( tapply( v, df$date, decile.fn ) ) )
    rownames(out) <- rownames(df)
    out <- cbind(df$date, out)

Regards, Adai

On Tue, 2006-02-21 at 21:44 -0500, maneesh deshpande wrote:
> Hi, I have a dataframe, x, of the following form:
>
>     Date     Symbol  A  B  C
>     20041201 ABC    10 12 15
>     20041201 DEF     9  5  4
>     ...
>     20050101 ABC     5  3  1
>     20050101 GHM    12  4  2
>
> Here A, B, C are properties of a set of symbols recorded for a given date. I want to decile the symbols for each date and property and create another set of columns bucketA, bucketB, bucketC containing the decile rank for each symbol. The following non-vectorized code does what I want:
>
>     bucket <- function(data, nBuckets) {
>       q <- quantile(data, seq(0, 1, len=nBuckets+1), na.rm=TRUE)
>       q[1] <- q[1] - 0.1   # need to do this to ensure there are no extra NAs
>       cut(data, q, include.lowest=TRUE, labels=FALSE)
>     }
>
>     calcDeciles <- function(x, colNames) {
>       nBuckets <- 10
>       dates <- unique(x$Date)
>       for (date in dates) {
>         iVec <- x$Date == date
>         xx <- x[iVec, ]
>         for (colName in colNames) {
>           data <- xx[, colName]
>           bColName <- paste("bucket", colName, sep="")
>           x[iVec, bColName] <- bucket(data, nBuckets)
>         }
>       }
>       x
>     }
>
>     x <- calcDeciles(x, c("A", "B", "C"))
>
> I was wondering if it is possible to vectorize the above function to make it more efficient. I tried
>
>     rlist <- tapply(x$A, x$Date, bucket)
>
> but I am not sure how to assign the contents of rlist to their appropriate slots in the original dataframe.
>
> Thanks, Maneesh
Re: [R] writing a file using both cat() and paste()
With regard to the saving bit, you might want to try dput() or save() as well.

On Thu, 2006-02-09 at 19:29 -0500, Jim Lemon wrote:
> Taka Matzmoto wrote:
>> Hi R users. I would like to create an ASCII file using cat() and paste():
>>
>>     x <- round(runif(30),3)
>>     cat("vector =( ", paste(x,sep=""), " )\n", file = "vector.dat", sep=",")
>>
>> When I open vector.dat it is one long, ugly line:
>>
>>     vector =( ,0.463,0.515,0.202,0.232,0.852,0.367,0.432,0.74,0.413,0.022,0.302,0.114,0.583,0.002,0.919,0.066,0.829,0.405,0.363,0.665,0.109,0.38,0.187,0.322,0.582,0.011,0.586,0.112,0.873,0.671, )
>>
>> There are also problems right after the opening parenthesis and before the closing parenthesis: two extra commas appear there. I would like a nicely formatted file like the one below, that is, 5 random values per line:
>>
>>     vector =( 0.463,0.515,0.202,0.232,0.852,
>>     0.367,0.432,0.74,0.413,0.022,
>>     0.302,0.114,0.583,0.002,0.919,
>>     0.066,0.829,0.405,0.363,0.665,
>>     0.109,0.38,0.187,0.322,0.582,
>>     0.011,0.586,0.112,0.873,0.671)
>
> First, you might want to avoid using "vector", as that is the name of an R function. Say you have a 30 element data vector as above. If you wanted to write a fairly general function to do this, here is a start:
>
>     vector2file <- function(x, file="", values.per.line=5) {
>       if(nchar(file)) sink(file)
>       cat(deparse(substitute(x)), "<-c(\n")
>       xlen <- length(x)
>       for(i in 1:xlen) {
>         cat(x[i])
>         if(i < xlen) cat(",")
>         if(i %% values.per.line == 0) cat("\n")
>       }
>       cat(")")
>       if(i %% values.per.line) cat("\n")
>       if(nchar(file)) sink()
>     }
>
> Jim
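As an alternative sketch to the loop above, the same layout can be built as a single string with split() and paste() and then written with writeLines(). The function name format_vector and its arguments are hypothetical, and the output format simply follows the layout requested in the question.

```r
# Build "name =( v1,v2,...,\n..." with a fixed number of values per line.
format_vector <- function(x, name = "vector", per_line = 5) {
  groups <- split(x, ceiling(seq_along(x) / per_line))   # chunks of per_line values
  rows   <- vapply(groups, paste, character(1), collapse = ",")
  paste0(name, " =( ", paste(rows, collapse = ",\n"), ")")
}

set.seed(1)
x   <- round(runif(30), 3)
out <- format_vector(x)

out_file <- file.path(tempdir(), "vector.dat")
writeLines(out, out_file)
cat(out, "\n")
```

Because the commas are inserted by paste() with collapse, there is no stray comma after the opening parenthesis or before the closing one, which was the problem with passing sep="," to cat().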
Re: [R] Tranferring R results to word prosessors
I agree that this is the best way. I often use the Courier font at size 10, which gives very good results.

On Thu, 2006-02-09 at 09:47 -0500, Gabor Grothendieck wrote:
> In Word use a fixed font such as Courier rather than a proportional font and it will look ok.
>
> On 2/9/06, Tom Backer Johnsen [EMAIL PROTECTED] wrote:
>> I have just started looking at R, and am getting more and more irritated at myself for not having done that before. However, one of the things I have not found in the documentation is some way of preparing output from R for convenient formatting in something like MS Word.
>>
>> An example: If you use summary(lm()) you get nice output. However, if you try to paste that output into the word processor, all the text elements are separated by blanks, and that is not optimal for the creation of a table (in the word processing sense). Is there an option to generate tab-separated output in R? That would solve the problem.
>>
>> Tom
>>
>> Tom Backer Johnsen, Psychometrics Unit, Faculty of Psychology
>> University of Bergen, Christies gt. 12, N-5015 Bergen, NORWAY
>> Tel : +47-5558-9185  Fax : +47-5558-9879
>> Email : [EMAIL PROTECTED]  URL : http://www.galton.uib.no/
Re: [R] Tranferring R results to word prosessors
As much as I love LaTeX, I would be cautious about recommending it to someone who has a short-term objective or does not really need to write equations etc. Part of the reason is that the initial step of getting all the software required to make LaTeX work properly can be difficult. However, I think this webpage does a good job of explaining it:

http://www.math.aau.dk/~dethlef/Tips/introduction.html

WinEdt (http://www.winedt.com/) might also be worth checking out.

Regards, Adai

On Thu, 2006-02-09 at 14:11 -0500, Peter Flom wrote:
> roger bos [EMAIL PROTECTED] 2/9/2006 12:33 pm wrote
>> Yeah, but I don't understand LaTeX at all. Can you point me to a good beginners guide?
>
> I like Math into LaTeX, by Gratzer. For a real beginners guide, there's one called First Steps in LaTeX. You might also want to look at issues of the PracTeX Journal, many of which are for beginners (it's an online journal).
>
> Peter
>
> Peter L. Flom, PhD, Assistant Director, Statistics and Data Analysis Core, Center for Drug Use and HIV Research, National Development and Research Institutes, 71 W. 23rd St, New York, NY 10010. http://cduhr.ndri.org www.peterflom.com (212) 845-4485 (voice) (917) 438-0894 (fax)
Re: [R] Plotting 27 line plots in one page
Try

    par( mfrow=c(9,3) )
    for(i in 1:27) plot( lls[[i]] )

but I think it might be a little crowded to put 9 rows on a page. Also check out the lattice package, which is a bit more complicated to learn but gives prettier output.

Regards, Adai

On Thu, 2006-02-09 at 11:52 -0800, Srinivas Iyyer wrote:
> Dear group, I am a novice programmer in R. I have a list that has a length of 27 elements. Each element is derived from the table function.
>
>     lls <- table(drres)
>     length(lls)
>     27
>
> I want to plot all these elements in a 9x3 layout (9 rows and 3 columns):
>
>     par(9,3)
>     mypltfunc <- function(mydata){
>       for (i in 1:27){
>         plot(unlist(mydata[i]))
>       }
>     }
>     mypltfunc(lls)
>
> In the graphics window, all 27 figures are drawn in a fraction of a second, one by one, and I only get to see the last graph. It is not drawing into this 9x3 grid. Could anyone help me please?
>
> Thanks, sri
Re: [R] Plotting 27 line plots in one page
This works:

    # simulate some data
    mylist <- list(NULL)
    for(i in 1:27) mylist[[i]] <- rnorm( rpois( 1, lambda=20 ) )

    # execute
    par( mfrow=c(9,3) )
    par( mar = c(1,1,1,1), oma = c(1,1,1,1) )
    for(i in 1:27) plot( mylist[[i]] )

Also, if you just want to plot the distribution of the values etc., then you can try different possibilities such as

    boxplot( mylist )

Regards, Adai

On Thu, 2006-02-09 at 14:05 -0800, Srinivas Iyyer wrote:
> hi sarah, thanks for your mail.
>
>     par(mfrow=c(9,3))
>     mypltfunc(lls)
>     Error in plot.new() : figure margins too large
>     par(mfcol=c(9, 3))
>     mypltfunc(lls)
>     Error in plot.new() : figure margins too large
>
> Unfortunately I had this problem before. That's the reason I went on using, more simply, par(9,3). I tried the following too, although, truly, I did not understand much after doing ?par:
>
>     mar = c(1,1,1,1)
>     oma = c(1,1,1,1)
>     par(mar,oma)
>     [[1]]
>     NULL
>     [[2]]
>     NULL
>     mypltfunc(lls)
>
> By doing this the problem turned out to be that it printed all 27 figures, one after the other in a fraction of a second, and I see the last figure. Given my background (molecular biology), sometimes it is very, very difficult to understand the documentation due to the terminology.
>
> thanks, sri
>
> --- Sarah Goslee [EMAIL PROTECTED] wrote:
>>> I want to plot all these elements in 9x3 plot (9 rows and 3 columns) par(9,3)
>>
>> You need to specify what par you want - see ?par for details. In this case, either par(mfrow=c(9,3)) or par(mfcol=c(9, 3)) will do what you want.
>>
>> Sarah
>> --
>> Sarah Goslee http://www.stringpage.com
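One hedged way around the "figure margins too large" error is to draw the 9x3 grid on a tall PDF page instead of the screen window, with small margins set by name inside par(). The sketch below assumes simulated data like that in the reply above; the function name plot_grid, the page size and the margin values are illustrative choices, not a recommendation from the thread.

```r
# Draw every element of a list as its own small panel on one tall PDF page.
plot_grid <- function(data_list, file, nrow = 9, ncol = 3) {
  pdf(file, width = 8, height = 16)   # a tall page leaves room for 9 rows
  on.exit(dev.off())                  # always close the device, even on error
  par(mfrow = c(nrow, ncol),          # fill the grid row by row
      mar = c(2, 2, 2, 1),            # small per-panel margins (in lines)
      oma = c(1, 1, 1, 1))            # small outer margin
  for (i in seq_along(data_list)) {
    plot(data_list[[i]], type = "l", main = paste("Series", i))
  }
  invisible(file)
}

set.seed(1)
mylist <- lapply(1:27, function(i) rnorm(20))

out_pdf <- file.path(tempdir(), "grid.pdf")
plot_grid(mylist, out_pdf)
```

Note that par(mar, oma) with unnamed arguments, as tried in the question, only queries parameters; the settings have to be passed by name, as in par(mar = ..., oma = ...).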
Re: [R] (second round) creating a certain type of matrix
I cleaned up your function a bit, but please double check:

    generate.matrix <- function(nr, runs=5){
      h  <- nr/2                              ## half of nr
      nc <- nr/10 + 1
      mat <- matrix(0, nr, nc)                ## initialize
      mat[ , 1] <- c( rep(1, h), rnorm(h) )   ## 1st column
      mat[ (h+1):(h+5), 2 ] <- rnorm(5)       ## 2nd column
      if( nc >= 3 ){
        for (i in 3:nc){                      ## column 3 - end
          start <- h + 5*(i-2) + 1
          end   <- start + runs - 1
          mat[ start:end, i ] <- rnorm( runs )
        }
      }
      return(mat)
    }

However, you can simplify this greatly. If you ignore the first column (which looks like some initialisation column in a simulation process), then the lower half is a matrix with nr/2 rows and nr/10 columns whose diagonal blocks, each "runs" (here 5) rows long, are filled with rnorm values. Here is what I propose:

    gen.mat <- function(x, runs=5){
      if( (x %% (2*runs)) != 0 ) stop(x, " is not a multiple of ", 2*runs)
      nr <- x/2
      nc <- x/(2*runs)
      mat <- matrix(0, nr, nc)
      for (i in 1:nc) mat[ ((i-1)*runs + 1) : (i*runs), i ] <- rnorm(runs)
      down <- cbind( rnorm(nr), mat )
      top  <- cbind( 1, matrix( 0, nrow=nr, ncol=nc ) )
      out  <- rbind( top, down )
      return(out)
    }

    # Examples
    gen.mat(50)
    gen.mat(55)            ## should generate an error
    gen.mat(24, runs=6)

Does this function do what you want it to?

Regards, Adai

On Tue, 2006-02-07 at 11:03 -0600, Taka Matzmoto wrote:
> Hi R users. Here is what I got with help from Petr Pikal (thanks, Petr Pikal). I modified Petr Pikal's code a little to meet my purpose. I created a function to generate a matrix:
>
>     generate.matrix <- function(n.variable) {
>       mat <- matrix(0, n.variable, (n.variable/2)/5+1)        # matrix of zeroes
>       dd <- dim(mat)                                          # actual dimensions
>       mat[1:(dd[1]/2), 1] <- 1                                # put 1 in first half of first column
>       mat[((dd[1]/2)+1):dd[1], 1] <- rnorm(dd[1]/2, 0, 1)     # put random numbers in following part of the matrix column 1
>       mat[((dd[1]/2)+1):((dd[1]/2)+5), 2] <- rnorm(5, 0, 1)   # put random numbers in column 2
>       for (i in 3:(dd[2])) {
>         length.of.rand.numbers <- 5
>         my.rand.num <- rnorm(length.of.rand.numbers, 0, 1)
>         start <- dd[1]/2 + 5*(i-2) + 1
>         end <- start + length.of.rand.numbers - 1
>         mat[(start:end), i] <- my.rand.num
>       }
>       mat
>     }
>
> Do you (any R users) have any suggestions to make this function work better or more efficiently?
>
> Taka
>
> It works but I
>
>> From: Petr Pikal [EMAIL PROTECTED] To: Taka Matzmoto [EMAIL PROTECTED], r-help@stat.math.ethz.ch Subject: Re: [R] creating a certain type of matrix Date: Tue, 07 Feb 2006 08:58:59 +0100
>>
>> Hi, as only you know perfectly which halves and other portions of your matrices contain zeroes and which contain random numbers, you have to finalize the function yourself. Here are a few ideas.
>>
>>     n <- 20
>>     mat <- matrix(0, n, (n/2)/5+1)                        # matrix of zeroes
>>     dd <- dim(mat)                                        # actual dimensions
>>     mat[1:(dd[1]/2), 1] <- 1                              # put 1 in first half of first column
>>     mat[((dd[1]/2)+1):dd[1], 1] <- rnorm(dd[1]/2, 0, 1)   # put random numbers in following part of the matrix column 1
>>     mat[((dd[1]/2)+1):((dd[1]/2)+dd[1]/4), 2] <- rnorm(dd[1]/4, 0, 1)   # put random numbers in column 2
>>
>> Then, according to the n and dd values, you can put any numbers anywhere in your matrix, e.g. in a for loop (not tested :-)
>>
>>     for (i in 3:dd[2]) {
>>       # arrange everything into following desired columns, e.g.
>>       length.of.rand.numbers <- (i-2)*5
>>       my.rand.num <-
Re: [R] large lines of data
How does the data look, and how are you storing it in R (e.g. matrix, list)?

I think this is an issue related to Word, where it is using either unequal spaces or different carriage returns. I would not recommend storing data in Word files, especially numerical data in the form of a matrix. I would recommend that you copy-and-paste into Excel first and clean it up there. Next, save the file as tab delimited and use read.delim() in R. My experience is that Excel seems to understand the oddities of Word better than R does.

Regards, Adai

On Wed, 2006-02-08 at 11:55 +, Sara Mouro wrote:
> Dear All,
>
> I have to enter many lines of data in the same object. I usually use copy-paste to transfer data from a Word file to R. But, for long lines of data, R gets confused and gives an error message, i.e. it breaks one line somewhere, and the lines lose their meaning.
>
> Sometimes I solve that problem by adding line breaks and making each line shorter before I copy-paste. Sometimes I add spaces in the Word document until R breaks each line (automatically adds a +) at a point where it is still correct. But this is still too subjective for me! :o\
>
> What is the best way to do that?
>
> Regards, Sara Mouro

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
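A minimal sketch of the last step, assuming the cleaned-up data has already been saved from Excel as tab-delimited text. The file contents and column names here are made up for illustration; a temporary file stands in for the exported one.

```r
# Hypothetical tab-delimited data, standing in for a file exported from Excel
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\theight\tweight",
             "a\t1.5\t10",
             "b\t2.5\t20"), tmp)

# read.delim() assumes a header row and tab separators by default
dat <- read.delim(tmp)
str(dat)
```

Reading from a file this way avoids pasting long lines into the console altogether.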
Re: [R] lme help
Please read the posting guide.

1) I think the BioConductor mailing list might be better, as some of this could be implemented via LIMMA (I believe).

2) Provide sufficient information and perhaps a simple example.

Regards, Adai

On Wed, 2006-02-08 at 10:42 +0100, Mahdi Osman wrote:
> Hi list,
>
> I am fitting a model to microarray data (intensity) using the lme package in the R environment.
>
> I have 5 fixed variables in the model. One of the fixed variables is genes. I am trying to get p-values for different genes, but I am getting only one p-value for all genes together.
>
> I can get a list of p-values when I run lm. Why can't this work in lme?
>
> My aim is to do multiple comparisons of all the genes that I have, and I can only do this if I have a list of their p-values.
>
> I was wondering if you can help me solve this problem, that is, getting a list of p-values for each gene in the model using lme.
>
> Thanks in advance for your help
>
> Regards, Mahdi
Re: [R] Application of R
No Excel attachment came through. Just taking a guess here, but there seems to be very little variation in columns V10 to V23.

BTW, can you not issue the following call:

    mydata[ , 1:7] ~ mydata[ , 8] + mydata[ , 9]

instead of creating y1, y2, ... separately and then cbind-ing them?

Regards, Adai

On Tue, 2006-02-07 at 21:52 +0800, Andy Wong wrote:
> I have applied R and MNP to carry out the data analysis. However, there is an error called "SWP : singular matrix". Can someone tell me what the problem is with my formula or the file mydata? I have attached the data file mydata in Excel format and the result printed in pdf format for your information.
>
> Thanks for your advice.
Re: [R] dataframe subset
Sounds like you may need to use match().

On Wed, 2006-02-08 at 15:21 +0100, Bernhard Baumgartner wrote:
> I have a dataframe with a column, say x, consisting of values, each value appearing a different number of times, e.g.
>     x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
> and a vector, including e.g.:
>     y: 2,9,10,...
> I need a subset of the dataframe: all rows where x is equal to one of the values in y. Currently I use a loop for this, but because x and y are large this is very slow. Is there any idea how to solve this problem faster?
>
> Thank you, Bernhard
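A short sketch of the match() idea, using the example values from the question. The `id` column is made up so that the data frame has something besides x. `%in%` is a thin wrapper around match() and is usually the more readable form.

```r
df <- data.frame(x  = c(1,1,1,1,2,2,4,4,4,9,10,10,10,10,10),
                 id = 1:15)            # 'id' is a hypothetical extra column
y  <- c(2, 9, 10)

# rows of df whose x appears anywhere in y, without an explicit loop
sub1 <- df[ df$x %in% y, ]

# the same thing spelled out with match(): NA means "not found in y"
sub2 <- df[ !is.na( match(df$x, y) ), ]
```

Both forms are vectorised, so they should scale far better than looping over the rows.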
Re: [R] for-loop with multiple variables changing
If you want a one-to-one action between corresponding pairs of a and b, then how about simply:

    for( i in 1:length(a) ){
      print( a[i] )
      print( b[i] )
    }

If you want the first element of a to work with all elements of b, the second element of a to work with all elements of b, and so on, then you may find functions such as outer, sapply and mapply helpful.

Regards, Adai

On Mon, 2006-02-06 at 11:53 +0100, Piet van Remortel wrote:
> Hi all,
>
> Never really managed to build a for-loop with multiple running variables in an elegant way. Can anybody hint? See below for an example of what I would like.
>
> EXAMPLE
>
>     a <- c(1,2,3)
>     b <- c("name1","name2","name3")
>     for( number in a, name in b ) {
>       print( number )   ## take a value
>       print( name )     ## and have its name available from a second list
>     }
>
> Does R support this natively?
>
> thanks !
>
> Piet (Univ. of Antwerp - Belgium)
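For the one-to-one case, a small sketch of two common idioms, using the vectors from the question. The pasted "name=number" labels are only an illustration of doing something with each pair.

```r
a <- c(1, 2, 3)
b <- c("name1", "name2", "name3")

# index-based loop; seq_along() is safer than 1:length(a) when a is empty
for (i in seq_along(a)) {
  cat(a[i], b[i], "\n")
}

# mapply() walks both vectors in parallel and collects the results
labelled <- mapply(function(number, name) paste(name, number, sep = "="), a, b)
```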
Re: [R] R is GNU S, not C.... [was how to get or store .....]
Yes, it drives me mad too when people use = instead of <- for assignment and suppress spaces in a naive attempt to save space. As an example, compare

    o=fn(x=1,y=10,z=1)

with

    o <- fn( x=1, y=10, z=1 )

Regards, Adai

On Tue, 2005-12-06 at 13:43 +0100, Martin Maechler wrote:
> vincent == vincent [EMAIL PROTECTED] on Tue, 06 Dec 2005 11:09:36 +0100 writes:
>
> vincent> shanmuha boopathy wrote:
>>>     a <- function(a,b,c,d)
>>>     {
>>>       k=a+b
>>>       l=c+d
>>>       m=k+l
>>>     }
>>> In this example the function will return only the value of m ... But I would like to extract the values of l & k also. Which command should I use for storing or extracting those intermediate values?
>
> vincent> may I suggest, inside your function
> vincent>     res = c(k, l, m);
> vincent>     return(res);
>
> please, please, these trailing ";" are *so* ugly. This is GNU S, not C (or matlab)!
>
> {and I have another chain of arguments why "<-" is so much more expressive than "=", but I'll be happy already if you could drop these ugly "empty statements" at the end of your lines...}
>
> vincent> # also ... read some intro docs !
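For the original question in the quoted thread (getting k and l out as well as m), a minimal sketch written in the style argued for above. Returning a named list is one common option; the function name `sums` and the input values are made up.

```r
sums <- function(a, b, c, d) {
  k <- a + b
  l <- c + d
  m <- k + l
  list(k = k, l = l, m = m)   # a named list keeps every intermediate value
}

res <- sums(1, 2, 3, 4)
res$k
res$l
res$m
```

A list is preferable to c(k, l, m) once the pieces stop being single numbers of the same type.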
Re: [R] merging with aggregating
    m1 <- cbind( n=c(1,2,3,4,6,7,8,9,10,13),
                 v1=c(12,10,3,8,7,12,1,18,1,2),
                 v2=c(0,8,8,4,3,0,0,0,0,0) )
    m2 <- cbind( n=c(1,2,3,4,5,6,8,10,11,12),
                 v1=c(0,0,1,12,2,2,2,4,7,0),
                 v2=c(2,3,9,8,9,9,0,1,1,1) )
    m.all <- merge(m1, m2, by="n", all=TRUE)

        n v1.x v2.x v1.y v2.y
    1   1   12    0    0    2
    2   2   10    8    0    3
    3   3    3    8    1    9
    4   4    8    4   12    8
    5   5   NA   NA    2    9
    6   6    7    3    2    9
    7   7   12    0   NA   NA
    8   8    1    0    2    0
    9   9   18    0   NA   NA
    10 10    1    0    4    1
    11 11   NA   NA    7    1
    12 12   NA   NA    0    1
    13 13    2    0   NA   NA

Then, depending on how many such columns there are, you have a number of ways of aggregating this dataset. One such way is

    cbind( n=m.all[ , "n"],
           v1=rowSums( m.all[ , grep( "^v1", colnames(m.all) ) ], na.rm=TRUE ),
           v2=rowSums( m.all[ , grep( "^v2", colnames(m.all) ) ], na.rm=TRUE ) )

        n v1 v2
    1   1 12  2
    2   2 10 11
    3   3  4 17
    4   4 20 12
    5   5  2  9
    6   6  9 12
    7   7 12  0
    8   8  3  0
    9   9 18  0
    10 10  5  1
    11 11  7  1
    12 12  0  1
    13 13  2  0

Regards, Adai

On Tue, 2005-12-06 at 14:22 +0100, Dubravko Dolic wrote:
> Dear List,
>
> I have two data.frames of the following form:
>
>     A:              B:
>      n V1 V2         n V1 V2
>      1 12  0         1  0  2
>      2 10  8         2  0  3
>      3  3  8         3  1  9
>      4  8  4         4 12  8
>      6  7  3         5  2  9
>      7 12  0         6  2  9
>      8  1  0         8  2  0
>      9 18  0        10  4  1
>     10  1  0        11  7  1
>     13  2  0        12  0  1
>
> Now I want to merge those frames into one data.frame, summing up the columns V1 and V2 but not the column n. So the result in this example would be:
>
>     AB:
>      n V1 V2
>      1 12  2
>      2 10 11
>      3  4 17
>      4 20 12
>      5  2  9
>      6  9 12
>      7 12  0
>      8  3  0
>      9 18  0
>     10  5  1
>     11  7  1
>     12  0  1
>     13  2  0
>
> So columns V1 and V2 are the sums of A and B while n keeps its old value. Notice that A and B contain different values of n. I don't have a clue how to start here. Any hint is welcome.
>
> Thanks
>
> Dubravko Dolic, Munich, Germany
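An alternative sketch, not from the thread: because both inputs share the same column names, they can be stacked with rbind() and summed per value of n with aggregate(). The data are the A and B tables from the question.

```r
A <- data.frame(n  = c(1,2,3,4,6,7,8,9,10,13),
                V1 = c(12,10,3,8,7,12,1,18,1,2),
                V2 = c(0,8,8,4,3,0,0,0,0,0))
B <- data.frame(n  = c(1,2,3,4,5,6,8,10,11,12),
                V1 = c(0,0,1,12,2,2,2,4,7,0),
                V2 = c(2,3,9,8,9,9,0,1,1,1))

# stack the rows, then sum V1 and V2 within each value of n
AB <- aggregate(cbind(V1, V2) ~ n, data = rbind(A, B), FUN = sum)
AB
```

This avoids the NA handling altogether, since a value of n present in only one table simply contributes one row to the sum.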
Re: [R] urgent
1) The R-help mailing list is run entirely by volunteers, so requests marked "urgent" may sound rude.

2) Use an informative subject line please!

3) Please state which package multhist comes from.

4) Please show your call to multhist.

5) multhist does _histograms_ by aggregating points within certain intervals. In your case, you simply want a plot of your raw data. You can use barplot directly via

    multi.barplot <- function( mylist, ... ){
      u  <- sort( unique( unlist( mylist ) ) )   # sorted so the bars appear in increasing order
      tb <- t( sapply( mylist, function(v) table( factor(v, levels=u) ) ) )
      barplot( tb, beside=TRUE, ... )
      return(tb)
    }

    x <- c(7, 7, 8, 9, 15, 17, 18)
    y <- c(7, 8, 9, 15, 17, 19, 20, 20, 25, 23, 22)
    z <- c(8, 9, 9, 9, 31)

    multi.barplot( list(x, y, z), col=1:3 )
    legend( "topright", legend=c("one", "two", "three"), fill=1:3 )

Regards, Adai

On Tue, 2005-12-06 at 15:32 +0530, Subhabrata wrote:
> Hello R Users,
>
> I have two sets of values
>
>     x <- c(7, 7, 8, 9, 15, 17, 18)
>     y <- c(7, 8, 9, 15, 17, 19, 20, 20, 25, 23, 22)
>
> I am able to create a multi histogram using multhist(), but I am not able to control the 'xlim', i.e. the x-axis is showing 7.5, 13, 18, 23.
>
> 1st, on what basis is it calculated?
> 2nd, I want it to be like 7 8 9 15 17 and so on.
>
> Can anyone help me?
>
> With Regards, Subhabrata Pal
Re: [R] saving AIC of intermediate models in step
    library(MASS)   # for stepAIC
    df <- data.frame( matrix( rnorm(1000), nc=10 ) )
    colnames(df) <- c("y", paste("x", 1:9, sep=""))
    ifit <- glm( y ~ ., data=df )   # initial fit
    a <- stepAIC( ifit, keep=extractAIC )
    a$keep

            [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]
    [1,]  10.000   9.0000   8.0000   7.0000   6.0000   5.0000   4.0000
    [2,] 319.356 317.3819 315.4327 314.3526 313.2192 312.3311 311.1450
             [,8]     [,9]    [,10]
    [1,]   3.0000   2.0000   1.0000
    [2,] 310.2517 309.1266 308.1171

Each column of a$keep is one intermediate model: the first row is the equivalent degrees of freedom and the second row is the AIC, as returned by extractAIC.

On Tue, 2005-11-29 at 19:01 +0100, [EMAIL PROTECTED] wrote:
> Hi all,
>
> I'm fitting GLMs using the step or stepAIC procedures and I would like to save the AIC of the intermediate models. I would appreciate very much information about how to do this.
>
> Best wishes
>
> Germán López
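A related sketch, offered as an alternative rather than as the method used above: the object returned by stepAIC() also carries an `anova` component recording each step taken, including an AIC column. The seed and the simulated data are arbitrary.

```r
library(MASS)                       # provides stepAIC()

set.seed(1)
df <- data.frame(matrix(rnorm(1000), ncol = 10))
colnames(df) <- c("y", paste("x", 1:9, sep = ""))

ifit <- glm(y ~ ., data = df)       # initial fit with all nine predictors
a <- stepAIC(ifit, trace = FALSE)   # trace = FALSE suppresses the step-by-step printout

a$anova        # one row per step: term dropped, deviance, AIC
a$anova$AIC    # just the AIC path of the intermediate models
```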
Re: [R] symmetric matrix
Use as.matrix():

    m <- round( as.dist( cor( matrix( rnorm(600), nc=6 ) ) ), 2 )
    m

          1     2     3     4     5
    2 -0.05
    3  0.01  0.03
    4  0.00  0.05  0.00
    5  0.20  0.07  0.09 -0.07
    6  0.03  0.02  0.11 -0.15 -0.11

    as.matrix( m )

          1     2    3     4     5     6
    1  0.00 -0.05 0.01  0.00  0.20  0.03
    2 -0.05  0.00 0.03  0.05  0.07  0.02
    3  0.01  0.03 0.00  0.00  0.09  0.11
    4  0.00  0.05 0.00  0.00 -0.07 -0.15
    5  0.20  0.07 0.09 -0.07  0.00 -0.11
    6  0.03  0.02 0.11 -0.15 -0.11  0.00

On Tue, 2005-11-29 at 03:04 -0800, Robert wrote:
> I have the following matrix:
>
>               1         2        3        4         5
>     2 0.7760856
>     3 2.016     1.6907899
>     4 0.6148687 0.2424415 1.593916
>     5 3.0227028 2.3636083 1.512634 2.426591
>     6 3.2104434 2.5334957 1.730422 2.608584 0.2184739
>
> The diagonal is 0 and it is a symmetric matrix. Is there any function to return to the normal one? That is, the 6 by 6 one?
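as.matrix() applies when the object is of class "dist". If instead only the lower-triangle values are available as plain numbers, a sketch like the following rebuilds the full symmetric matrix. The values and the size n are made up for illustration.

```r
n   <- 4
low <- c(0.78, 2.02, 0.61,    # column 1, rows 2 to 4
         1.69, 0.24,          # column 2, rows 3 to 4
         1.59)                # column 3, row 4

full <- matrix(0, n, n)
full[lower.tri(full)] <- low  # lower.tri() fills column by column
full <- full + t(full)        # mirror into the upper triangle; diagonal stays 0
full
```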
Re: [R] Legend
And if you want differently colored lines but black text, try

    legend(x = 5, y = 0.2, legend = c("Data Set", "Fitted PDF"), col = c("black", "red"), lty = 1)

The advantage of this is that you can use dotted lines (lty option) or lines with different weights (lwd option).

Regards, Adai

On Sun, 2005-11-13 at 06:46 -0600, Sundar Dorai-Raj wrote:
> Mark Miller wrote:
>> I use the following to plot two graphs over each other and then insert a legend, but the two items in the legend both come up the same colour
>>
>>     x = seq(0,30,0.01)
>>     plot(ecdf(complete), do.point=FALSE, main = 'Cumulative Plot of Monday IATs for Data and\n Fitted PDF over Entire 15 Weeks')
>>     lines(x, pexp(x,0.415694806), col="red")
>>     legend(x=5, y=0.2, legend=c("Data Set","Fitted PDF"), col=c("black","red"))
>>
>> Many thanks
>> Mark Miller
>
> Hi, Mark,
>
> You want to use text.col in legend instead of col:
>
>     set.seed(1)
>     z <- rexp(30, 0.415694806)
>     x <- seq(0, 30, 0.1)
>     plot(ecdf(z), do.point = FALSE)
>     lines(x, pexp(x, 0.415694806), col="red")
>     legend(x = 5, y = 0.2, legend = c("Data Set", "Fitted PDF"), text.col = c("black", "red"))
>
> --sundar
Re: [R] selection of missing data
I do not quite follow your post, but here are some suggestions.

1) You can use the na.strings argument to simplify things:

    df <- read.delim(file="lala.txt", na.strings="-" )

2) You can count the number of metastases per row first, then find the rows with zero sum.

    met.cols      <- c(11:12, 14:21, 23, 24)   # metastasis columns, as listed in your post
    number.of.met <- rowSums( mela[ , met.cols ] == "-" )
    have.no.met   <- which( number.of.met == 0 )
    mela.no.met   <- mela[ have.no.met , ]

If you had coded your "-" as NA during read-in, then the second line needs to be changed to

    number.of.met <- rowSums( is.na( mela[ , met.cols ] ) )

or simply use complete.cases:

    met.cols    <- c(11:12, 14:21, 23, 24)   # metastasis columns
    mela.no.met <- mela[ which( complete.cases(mela[ , met.cols]) ) , ]

3) If you name your columns in a systematic fashion, then you can easily extract and specify those columns. For example, if your columns were named

    cn <- c( "age", "colon.met", "PSA.level", "prostate.met", "gender",
             "hospitalisation.days", "status", "liver.met", "ethnicity" )

then you can extract those names ending with ".met" as

    met.cols <- grep( "\\.met$", cn )
    met.cols
    [1] 2 4 8

Regards, Adai

On Sun, 2005-11-13 at 18:40 +0100, [EMAIL PROTECTED] wrote:
> Hi
>
> I'm a French medical student. I have some data that I import from Excel. The columns of my data frame are the localisations of metastases. If there is a metastasis there is the symbol _. I want to exclude the rows without metastasis, which represent the NA data.
>
> So I wrote this (mela is the data frame):
>
>     mela1=ifelse(mela[,c(11:12,14:21,23,24)]=="_",1,0)   # selection of the columns of metastasis localisation
>     mela4=subset(mela3, Skin==0 & s.c==0 & Mucosa==0 & Soft.ti==0 & Ln.peri==0 & Ln.med==0 & Ln.abdo==0 & Lung==0 & Liver==0 & Other.Visc==0 & Bone==0 & Marrow==0 & Brain==0 & Other==0)   ## selection of the rows with no metastasis localisation
>     nrow(mela4)
>
> but I don't know if it is possible to do the same thing as ifelse(mela3,Skin s.c== 0, 0,NA) with more than one column and afterwards exclude the NA from my data with na.omit.
>
> The last question is: how can I omit only the rows which have NA values in the metastasis columns c(11:12,14:21,23,24)?
>
> Thank you for your help
>
> Bertrand Billemont
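Suggestions 1) and 2) above can be sketched end to end on a toy table. Everything in it (the four rows, the three site columns, and "-" as the marker) is made up for illustration, and the columns are selected by name rather than by position.

```r
# Hypothetical tab-delimited data; "-" is the marker that gets read in as NA
txt <- "id\tSkin\tLiver\tBone\n1\t-\t0\t0\n2\t0\t0\t0\n3\t0\t-\t-\n4\t0\t0\t0"
mela <- read.delim(text = txt, na.strings = "-")

met.cols <- c("Skin", "Liver", "Bone")

# number of marked sites per row
number.of.met <- rowSums(is.na(mela[ , met.cols]))

# rows with no NA in the metastasis columns only; other columns are ignored
mela.no.met <- mela[complete.cases(mela[ , met.cols]), ]
```

Restricting complete.cases() to the chosen columns is what answers the last question: NAs elsewhere in the data frame do not cause a row to be dropped.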
Re: [R] sibling list element reference during list definition
It would be more interesting to ask why this does not work:

    mylist <- list( value=5, plusplus = mylist$value + 1 )

I think this is because plusplus cannot be evaluated while mylist does not exist, and mylist cannot be created until plusplus is evaluated. There are people on this list who can explain in more technical terms, but I think reading this page might help: http://cran.r-project.org/doc/manuals/R-lang.html#index-evaluation_002c-symbol-166

Here is one option:

    mylist <- eval( expression( list( value=x, plusplus=x+1 ) ), list(x=5) )
    mylist
    $value
    [1] 5

    $plusplus
    [1] 6

Or, a bit easier to read:

    myfun  <- function(x) list( value=x, plusplus=x+1 )
    mylist <- myfun(5)

Regards, Adai

On Sat, 2005-11-12 at 01:03 -0600, Paul Roebuck wrote:
> Can the value of a list element be referenced from a sibling list element during list creation without the use of a temporary variable?
>
> The following doesn't work but it's the general idea.
>
>     list(value = 2, plusplus = $value+1)
>
> such that the following would be the output from str()
>
>     List of 2
>      $ value   : num 2
>      $ plusplus: num 3
>
> --
> SIGSIG -- signature too long (core dumped)
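One more option, offered as a sketch rather than as something suggested in the thread: within() evaluates an expression with the existing list elements visible as variables and adds whatever the expression assigns.

```r
# 'value' is visible inside the expression; 'plusplus' is added to the list
mylist <- within(list(value = 5), plusplus <- value + 1)
str(mylist)
```

This needs a version of R in which within() has a method for lists.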
Re: [R] How to find statistics like that.
If my usage is wrong please correct me. Thank you. Here are my reasons:

1. A p-value is a (cumulative) probability and always ranges from 0 to 1. A test statistic, depending on its definition, can have a wider range of possible values.

2. A test statistic is calculated from the data without the need to assume a null distribution, whereas to calculate p-values you need to assume a null distribution or estimate it empirically using permutation techniques.

3. The directionality of a test statistic may be ignored. For example, t-statistics of -5 and 5 are equally interesting in two-sided testing. But the smaller the p-value, the more evidence against the null hypothesis.

Regards, Adai

On Thu, 2005-11-10 at 06:05 -0500, Duncan Murdoch wrote:
> On 11/9/2005 10:01 PM, Adaikalavan Ramasamy wrote:
>> I think an alternative is to use a p-value from the F distribution. Even though it is not a statistic, it is much easier to explain and more popular than 1/F. Better yet to report the confidence intervals.
>
> Just curious about your usage: why do you say a p-value is not a statistic?
>
> Duncan Murdoch
>
>> Regards, Adai
>>
>> On Wed, 2005-11-09 at 17:09 -0600, Mike Miller wrote:
>>> On Wed, 9 Nov 2005, Gao Fay wrote:
>>>> Hi there,
>>>>
>>>> Suppose mu is constant, and error is normally distributed with mean 0 and fixed variance s. For the model
>>>>
>>>>     Y_i = mu + beta1*I1_i + beta2*I2_i + beta3*I1_i*I2_i + error,
>>>>
>>>> where I_i is 1 if Y_i is from group A, and 0 if Y_i is from group B, I need to find a statistic such that:
>>>>
>>>> It is large when beta1=beta2=0
>>>> It is small when beta1 and/or beta2 is not equal to 0
>>>>
>>>> How can I find it by R? Thank you very much for your time.
>>>
>>> That's a funny question. Usually we want a statistic that is small when beta1=beta2=0 and large otherwise. Why not compute the usual F statistic for the null beta1=beta2=0 and then use 1/F as your statistic?
>>>
>>> Mike
Re: [R] paste argument of a function as a file name
    my.write <- function( obj, name ){
      filename <- paste( name, ".txt", sep="" )
      write.table( obj, file=filename, sep="\t", quote=FALSE )
    }
    my.write( df, "output" )

Regards, Adai

On Thu, 2005-11-10 at 13:28 +, Luis Ridao Cruz wrote:
> R-help,
>
> I have a function which is exporting the output to a file via write.table(df, file = "file name.xls" )
>
> What I want is to build the file name (above) by taking the argument to the function as a file name, something like this:
>
>     MY.function <- function(df)
>     {
>       ...
>       ...
>       write.table(df, "argument.xls")
>     }
>     MY.function(argument)
>
> Thank you
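If the file name should come from the name of the object passed in, as the question seems to ask, a sketch using deparse(substitute()) is one possibility. The function name `my.write2`, the `dir` argument and the example data frame are made up; tempdir() is used only so the example does not write into the working directory.

```r
my.write2 <- function(obj, dir = tempdir()) {
  name     <- deparse(substitute(obj))              # the argument as typed by the caller
  filename <- file.path(dir, paste(name, ".txt", sep = ""))
  write.table(obj, file = filename, sep = "\t", quote = FALSE)
  invisible(filename)                               # return the path without printing it
}

results <- data.frame(a = 1:3, b = c("x", "y", "z"))
path <- my.write2(results)                          # writes <tempdir>/results.txt
```

This only gives a sensible name when the argument is a plain variable; passing an expression such as head(results) would produce an awkward file name.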
Re: [R] Help regarding mas5 normalization
Please do not post to both BioConductor and R.

On Thu, 2005-11-10 at 09:51 -0700, Nayeem Quayum wrote:
> Hello everybody,
>
> I am trying to normalize some array data using mas5 and mas5calls, but I received these warning messages. If anybody can explain the problem I would really appreciate it. Thanks in advance.
>
>     background correction: mas
>     PM/MM correction : mas
>     expression values: mas
>     background correcting...Warning message:
>     'loadURL' is deprecated. Use 'load(url())' instead. See help("Deprecated")
>     Warning message:
>     'loadURL' is deprecated. Use 'load(url())' instead. See help("Deprecated")
>     Warning message:
>     'loadURL' is deprecated. Use 'load(url())' instead. See help("Deprecated")
>     There were 14 warnings (use warnings() to see them)
>     Note: http://www.bioconductor.org/repository/devel/package/Win32 does not seem to have a valid repository, skipping
>     Note: You did not specify a download type. Using a default value of: Source
>     This will be fine for almost all users
>     Error in FUN(X[[1]], ...) : no slot of name "Uses" for this object of class "localPkg"
Re: [R] how to convert strings back to values?
Problems like these could be caused by improperly spaced columns. Try table(tdf1). If you see only 0 and 1, then you should be fine. However, I suspect that you might see things like "0", " 0", "1", " 1", which means that there is an extra space between the delimiters. Report back what you get and we can work out a solution if need be.

Regards, Adai

On Wed, 2005-11-09 at 21:55 +0100, Illyes Eszter wrote:
> Dear All,
>
> It's Eszter from Hungary, a total beginner with R. My problem is the following:
>
> I have a dataset with binary values as a comma-separated text file. The samples are in the columns and the species are in the rows. I have to transpose it for the further PCoA analysis.
>
> There is no problem with reading the dataset. When I transpose the dataset, the original values become strings (instead of 0,1,0,0,1 I have "0","1","0","0","1").
>
> The distance matrix cannot be computed from the transposed dataset. I get 2 error messages:
>
>     Warning in vegdist(tdf1, method = "jaccard", binary = FALSE, diag = FALSE, :
>       results may be meaningless because input data have negative entries
>     Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric
>
> I do not understand the first, since I have only 1 and 0 in the dataset. I guess I get the second because of the strings instead of values in the dataset.
>
> Could you please help me solve these problems? I could not find anything about these in the manuals.
>
> Thank you, cheers:
>
> Eszter
>
> p.s. This is a new problem; last week I worked with a similar dataset and I did not get any error messages like these.
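A sketch of one possible cause and two ways around it. This is an assumption, not something confirmed in the thread: if the species names were read in as an ordinary column, t() has to coerce everything to character. The toy data below are made up.

```r
# Hypothetical comma-separated data: species in rows, samples in columns
txt <- "species,s1,s2,s3\nsp.a,0,1,0\nsp.b,1,0,1"

raw <- read.csv(text = txt)
bad <- t(raw)                       # character matrix: the species column forces coercion

# Option 1: read the names as row names, so only numeric columns are transposed
good <- t(read.csv(text = txt, row.names = 1))

# Option 2: repair an existing character matrix
fixed <- bad[-1, , drop = FALSE]    # drop the row that holds the species names
mode(fixed) <- "numeric"            # converts the strings, tolerating stray spaces
```

Checking is.numeric() on the transposed object before calling vegdist() is a quick way to see which situation applies.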
Re: [R] How to find statistics like that.
I think an alternative is to use a p-value from the F distribution. Even though it is not a statistic, it is much easier to explain and more popular than 1/F. Better yet to report the confidence intervals.

Regards, Adai

On Wed, 2005-11-09 at 17:09 -0600, Mike Miller wrote:
> On Wed, 9 Nov 2005, Gao Fay wrote:
>> Hi there,
>>
>> Suppose mu is constant, and error is normally distributed with mean 0 and fixed variance s. For the model
>>
>>     Y_i = mu + beta1*I1_i + beta2*I2_i + beta3*I1_i*I2_i + error,
>>
>> where I_i is 1 if Y_i is from group A, and 0 if Y_i is from group B, I need to find a statistic such that:
>>
>> It is large when beta1=beta2=0
>> It is small when beta1 and/or beta2 is not equal to 0
>>
>> How can I find it by R? Thank you very much for your time.
>
> That's a funny question. Usually we want a statistic that is small when beta1=beta2=0 and large otherwise. Why not compute the usual F statistic for the null beta1=beta2=0 and then use 1/F as your statistic?
>
> Mike
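A rough sketch of how the F statistic, its p-value and the confidence intervals could be obtained in R for the null beta1 = beta2 = 0. The simulated data, the seed, the effect sizes and the reading of I1 and I2 as two separate 0/1 indicators are all assumptions made for illustration.

```r
set.seed(42)
n  <- 100
I1 <- rbinom(n, 1, 0.5)                     # hypothetical 0/1 group indicators
I2 <- rbinom(n, 1, 0.5)
y  <- 1 + 0.8*I1 + 0.5*I2 + rnorm(n)        # simulated response with beta3 = 0

full    <- lm(y ~ I1 + I2 + I1:I2)          # mu, beta1, beta2, beta3
reduced <- lm(y ~ I1:I2)                    # the model under beta1 = beta2 = 0

tab    <- anova(reduced, full)              # F test comparing the nested models
F.stat <- tab$F[2]
p.val  <- tab$"Pr(>F)"[2]

confint(full)                               # confidence intervals for the coefficients
```

Here 1/F.stat would be the statistic proposed in the quoted reply, and p.val is the upper tail probability of F.stat under the F distribution with 2 numerator degrees of freedom.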
Re: [R] accident modified dataset. How can I recovery it?!
Please do not post thrice, especially within 23 min of the first post.

Your problem is that cuckoos is located in the DAAG package, not the lattice package. I am guessing that at some point you loaded DAAG in the initial session but did not realise this in subsequent sessions.

Next time, search http://finzi.psych.upenn.edu/nmz.html first.

Regards, Adai

On Wed, 2005-11-09 at 22:46 +0100, jia ding wrote:
> I tried to reinstall the package, but my R version is too old.
>
>     [EMAIL PROTECTED]:~$ sudo R CMD INSTALL -l /usr/lib/R/library /home/dj/Desktop/lattice_0.12-11.tar.gz
>     Password:
>     ERROR: This R is version 2.1.1
>            package 'lattice' needs R >= 2.2.0
>
> So my question is: how do I upgrade from R version 2.1.1 to R >= 2.2.0 and keep all of my libraries intact?
>
> On 11/9/05, jia ding [EMAIL PROTECTED] wrote:
>> I first tried these commands, and they work quite well.
>>
>>     library(lattice)
>>     data(cuckoos)
>>     levnam <- strsplit(levels(cuckoos$species), "\\.")
>>
>> BUT I wanted to try
>>
>>     levnam <- strsplit(levels(cuckoos$species), ".")
>>
>> to see the difference.
>>
>> Then maybe I modified the data file, because when I try again, it says:
>>
>>     data(cuckoos)
>>     Warning message: data set 'cuckoos' not found in: data(cuckoos)
>>
>> Would you please tell me how to deal with this problem? I have already tried update.packages(); it doesn't help.
>>
>> Thanks. DJ
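On the strsplit() experiment in the quoted message: calling strsplit() never alters the data set, so nothing needed recovering. The difference between the two calls can be seen without DAAG installed. The species strings below are made-up stand-ins for levels(cuckoos$species).

```r
species <- c("hedge.sparrow", "meadow.pipit")       # hypothetical level names

by.regex <- strsplit(species, "\\.")                # escaped dot: split on a literal "."
by.fixed <- strsplit(species, ".", fixed = TRUE)    # same result, no regular expression
by.any   <- strsplit(species, ".")                  # unescaped dot matches every character

by.regex
by.any                                              # only empty strings are left
```

With the real data, library(DAAG) followed by data(cuckoos) is what makes the data set available, assuming DAAG is installed.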