Re: [R] data manipulation help
On Tue, 28 Aug 2007, Zheng Lu wrote:

> Dear All:
>
> I have a dataset like
> A=c(0,12,34,5,6,0,4,5,6,0,12,3,4,8,7,0,4,3,5,0,...),I want to add a
> column to this dataset, it must be in
> B=c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,..), How can I create B
> based on the sequence of A. Appreciate.

Do you want

    B <- cumsum( A == 0 )

??

Please use spaces and newlines to make your code more readable!

> Zheng

Charles C. Berry                        (858) 534-2098
                                        Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]              UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
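A quick check of the cumsum() suggestion, as a minimal sketch: it uses only the 20 values of A that were actually posted (the elided tail is left out) and assumes, as in the question, that every group starts with a 0.

```r
# First 20 values of A from the question; the elided "..." tail is omitted.
A <- c(0, 12, 34, 5, 6, 0, 4, 5, 6, 0, 12, 3, 4, 8, 7, 0, 4, 3, 5, 0)

# A == 0 is TRUE at the start of each group; cumsum() counts how many
# group starts have been seen so far, which is the group number.
B <- cumsum(A == 0)

print(data.frame(A, B))
```

If A did not begin with a 0, the leading values would be labelled 0 rather than 1, so that case would need separate handling.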
Re: [R] data manipulation help
The code below works on your example, but someone will have something more elegant.

    zeroindices <- which(A == 0)
    rep(1:length(zeroindices),
        c(diff(zeroindices),
          length(A) - zeroindices[length(zeroindices)] + 1))
[R] data manipulation help
Dear All:

I have a dataset like

    A=c(0,12,34,5,6,0,4,5,6,0,12,3,4,8,7,0,4,3,5,0,...)

I want to add a column to this dataset, and it must be

    B=c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,..)

How can I create B based on the sequence of A? Much appreciated.

Zheng
Re: [R] Data Manipulation using R
...is this what you're looking for?

    donedat <- subset(data, ID < 6000 | ID >= 7000)
    neg <- unique(rapply(donedat[, c(3, 4, 6, 8)], function(x) which(x < 0)))
    findat <- if (length(neg)) donedat[-neg, , drop = FALSE] else donedat

The second line looks through columns 3, 4, 6 and 8 and finds the row indices of negative values. rapply() returns all of them as one vector and unique() removes duplicated elements. The third line removes those rows from donedat with negative indexing; the length() check is there because indexing with an empty negative vector would return no rows at all instead of all of them.
Re: [R] Data Manipulation using R
On Apr 17, 2007, at 8:03 PM, Anup Nandialath wrote:

> Dear Friends,
>
> I have data set with around 220,000 rows and 17 columns. One of the
> columns is an id variable which is grouped from 1000 through 9000.
> I need to perform the following operations.
>
> 1) Remove all the observations with id's between 6000 and 6999
>
> I tried using this method.
>
> remdat1 <- subset(data, ID<6000)
> remdat2 <- subset(data, ID>=7000)
> donedat <- rbind(remdat1, remdat2)
>
> I check the last and first entry and found that it did not have ID
> values 6000. Therefore I think that this might be correct, but is
> this the most efficient way of doing this?

The rbind is probably unnecessary. I think all you are missing for both questions is the "or" operator, "|" (see ?"|"). Simply:

    donedat <- subset(data, ID < 6000 | ID >= 7000)

would do for this. I am not sure about efficiency, but if the code is fast as it stands I wouldn't worry too much about it.

> 2) I need to remove observations within columns 3, 4, 6 and 8 when
> they are negative. For instance if the number in column 3 is -4,
> then I need to delete the entire observation. Can somebody help me
> with this too.

The following should do it (untested, not sure if it would handle NA's):

    toremove <- data[,3] < 0 | data[,4] < 0 | data[,6] < 0 | data[,8] < 0
    data[!toremove,]

If you want more columns than those 4, then we could perhaps look for a better line than the first line above.

> Thank and Regards
>
> Anup

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
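One possible way to generalise the reply above to an arbitrary set of columns, and to decide explicitly what happens with NA's, is sketched below. The toy data frame `dat` and its column names are made up for illustration; only the ID cut-offs and the "drop the row if any selected column is negative" rule come from the thread. Treating NA as "not negative" is an assumption.

```r
# Hypothetical toy data standing in for the 220,000 x 17 data set.
dat <- data.frame(ID = c(1000, 6500, 7000, 8000, 9000),
                  c2 = 1:5,
                  c3 = c(1, 2, -4, 3, NA),
                  c4 = c(5, 5, 5, -1, 5))
cols <- c(3, 4)   # would be c(3, 4, 6, 8) for the data in the question

# Keep IDs outside 6000-6999.
keep_id <- dat$ID < 6000 | dat$ID >= 7000

# Count negatives per row over the chosen columns; na.rm = TRUE means
# an NA is treated as "not negative" (an assumption, adjust as needed).
no_neg <- rowSums(dat[, cols, drop = FALSE] < 0, na.rm = TRUE) == 0

result <- dat[keep_id & no_neg, ]
print(result)
```

Because both conditions are logical vectors, the filter also behaves sensibly when no row has a negative value.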
[R] Data Manipulation using R
Dear Friends,

I have a data set with around 220,000 rows and 17 columns. One of the columns is an id variable which is grouped from 1000 through 9000. I need to perform the following operations.

1) Remove all the observations with id's between 6000 and 6999

I tried using this method.

    remdat1 <- subset(data, ID<6000)
    remdat2 <- subset(data, ID>=7000)
    donedat <- rbind(remdat1, remdat2)

I checked the first and last entries and found that the result did not have ID values in the 6000s. Therefore I think that this might be correct, but is this the most efficient way of doing this?

2) I need to remove observations within columns 3, 4, 6 and 8 when they are negative. For instance, if the number in column 3 is -4, then I need to delete the entire observation. Can somebody help me with this too?

Thanks and Regards

Anup
Re: [R] Data manipulation in columns (with apply?)
Does this start to do what you want?

> x <- "NUM sim N
+ 1 1 466
+ 1 2 450
+ 1 3 473
+ 1 4 531
+ 1 5 515
+ 1 6 502
+ 1 7 471
+ 1 8 460
+ 1 9 458
+ 1 10 434
+ 2 1 289
+ 2 2 356
+ 2 3 387
+ 2 4 440
+ 2 5 457
+ 2 6 466
+ 2 7 467
+ 2 8 449
+ 2 9 387
+ 2 10 394
+ 3 1 367
+ 3 2 400
+ 3 3 476
+ 3 4 508
+ 3 5 478
+ 3 6 501
+ 3 7 513
+ 3 8 505
+ 3 9 492
+ 3 10 465"
> a <- read.table(textConnection(x), header=T)
> lambda <- by(a, a$NUM, function(x) x$N[-1] / x$N[-length(x$N)])
> lambda
a$NUM: 1
[1] 0.9656652 1.0511111 1.1226216 0.9698682 0.9747573 0.9382470 0.9766454 0.9956522 0.9475983
------------------------------------------------------------
a$NUM: 2
[1] 1.2318339 1.0870787 1.1369509 1.0386364 1.0196937 1.0021459 0.9614561 0.8619154 1.0180879
------------------------------------------------------------
a$NUM: 3
[1] 1.0899183 1.1900000 1.0672269 0.9409449 1.0481172 1.0239521 0.9844055 0.9742574 0.9451220
> # sum of lambdas
> sapply(lambda, sum)
       1        2        3
8.942166 9.357799 9.263944
> # mean
> sapply(lambda, mean)
       1        2        3
0.993574 1.039755 1.029327
> # sd
> sapply(lambda, sd)
         1          2          3
0.05822850 0.10525335 0.08004527

On 10/10/06, Bret Collier <[EMAIL PROTECTED]> wrote:
> R Users,
> I have written a small simulation model in R which outputs a datafile
> consisting of ending population sizes for each simulation run (year).
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
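The reply above covers steps 1 and 2 of the question. Step 3 (summarising the per-run means across all NUM) could be sketched as follows. This rebuilds the posted data inline and uses split() and lapply() instead of by(), purely so the block is self-contained; reading "se" as sd divided by the square root of the number of lambdas is an assumption about what the poster meant.

```r
# The 30 posted counts, 10 years for each of 3 simulation runs.
a <- data.frame(
  NUM = rep(1:3, each = 10),
  sim = rep(1:10, times = 3),
  N   = c(466, 450, 473, 531, 515, 502, 471, 460, 458, 434,
          289, 356, 387, 440, 457, 466, 467, 449, 387, 394,
          367, 400, 476, 508, 478, 501, 513, 505, 492, 465)
)

# lambda(t) = N(t+1) / N(t) within each run.
lambda <- lapply(split(a$N, a$NUM), function(n) n[-1] / n[-length(n)])

# Per-run mean and standard error (se taken as sd / sqrt(n), an assumption).
run_mean <- sapply(lambda, mean)
run_se   <- sapply(lambda, function(l) sd(l) / sqrt(length(l)))

# Step 3: summary of the per-run means over all runs.
overall <- c(mean = mean(run_mean), sd = sd(run_mean),
             min = min(run_mean), max = max(run_mean))

print(cbind(run_mean, run_se))
print(overall)
```

With around 1000 runs the same code should apply unchanged, since nothing in it depends on the number of levels of NUM.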
[R] Data manipulation in columns (with apply?)
R Users,

I have written a small simulation model in R which outputs a datafile consisting of ending population sizes for each simulation run (year). The data (see short data example below) is labeled by NUM (simulation run), sim (year) and N (yearly count). After searching the help files and coming up empty (probably because I used the wrong terms) I am appealing for some help for working with the output dataset.

What I want to do is for each of the i simulation runs (NUM) I want to

1) take N(t+1)/N(t)=lambda(t) for each year (where in the below example t=1,...,10--total years of the simulation)
2) Sum lambda(t) and divide by t (e.g., output both the mean/se of lambda for each simulation run)
3) Take the mean of the mean(lambda's) (and associated stddev, min, max) over all NUM

I think I have to write a function for use within an apply statement, but I am not quite there yet on the learning curve so most of my recent attempts in R have been useful learning experiences of what not to do...

Any suggestions/direction is greatly appreciated.

Bret Collier
TX A&M

NUM sim N
1   1   466
1   2   450
1   3   473
1   4   531
1   5   515
1   6   502
1   7   471
1   8   460
1   9   458
1   10  434
2   1   289
2   2   356
2   3   387
2   4   440
2   5   457
2   6   466
2   7   467
2   8   449
2   9   387
2   10  394
3   1   367
3   2   400
3   3   476
3   4   508
3   5   478
3   6   501
3   7   513
3   8   505
3   9   492
3   10  465

platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          3.0
year           2006
month          04
day            24
svn rev        37909
language       R
version.string Version 2.3.0 (2006-04-24)

(yeah, I need to update)
Re: [R] data manipulation docs
Federico Calboli wrote:
> Hi All,
>
> Is there some document/manual about data manipulation within R that I
> could use as a reference (obviously, aside the R manuals)?
> [...]

There is a large data manipulation section in the Alzola & Harrell document available on CRAN under contributed docs, or a slightly more up-to-date version at biostat.mc.vanderbilt.edu

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] data manipulation docs
On Thursday May 4 2006 10:20, Federico Calboli wrote:

> The reason I am asking is that I have a number of data frames/matrices
> containing genetic data. The data is in a character form, as in:

Take a look at the Bioconductor project: "Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data."

http://www.bioconductor.org/

> This whole game is getting exceedingly time consuming and frustrating,
> because I end up with random pieces of code that I save, patching a
> particular problem, but difficult to be 'abstracted' for a new task, so
> I get back close to square one annoyingly often.

This sounds like a software engineering problem, not an R problem. Does Imperial have a computer science dept.? Maybe they could advise on software engineering techniques.

Larry Howe
[R] data manipulation docs
Hi All,

Is there some document/manual about data manipulation within R that I could use as a reference (obviously, aside from the R manuals)?

The reason I am asking is that I have a number of data frames/matrices containing genetic data. The data is in a character form, as in:

  V1 V2 V3 V4 V5
1 AA AG AA GG AG
2 AC AA AA GG AG
3 AA AG AA GG AG
4 AA AA AA GG AG
5 AA AA AA GG AA

I need to chop, subset, and variously manipulate this kind of data, sometimes keeping the data in its character format, sometimes converting it to numeric form (i.e. substituting each data point with the equivalent factor value). Since the data is often quite big, I have to keep things memory efficient.

This whole game is getting exceedingly time consuming and frustrating, because I end up with random pieces of code that I save, each patching a particular problem but difficult to 'abstract' for a new task, so I get back close to square one annoyingly often.

Cheers,

Federico Calboli

--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel +44 (0)20 7594 1602   Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com
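For the character-to-numeric conversion mentioned in the question, one possible sketch is shown below. It rebuilds the five posted rows, fixes a single shared set of levels so that a code means the same genotype in every column, and stores the result as an integer matrix. The names `geno`, `lev` and `codes` are made up for illustration, and whether a shared level set is what the poster wanted is an assumption.

```r
# The five rows posted in the question, kept as character columns.
geno <- data.frame(V1 = c("AA", "AC", "AA", "AA", "AA"),
                   V2 = c("AG", "AA", "AG", "AA", "AA"),
                   V3 = rep("AA", 5),
                   V4 = rep("GG", 5),
                   V5 = c("AG", "AG", "AG", "AG", "AA"),
                   stringsAsFactors = FALSE)

# One common set of levels for all columns.
lev <- sort(unique(unlist(geno)))

# Integer codes, one column at a time; sapply() returns an integer matrix,
# which is more compact than a data frame of character vectors.
codes <- sapply(geno, function(col) as.integer(factor(col, levels = lev)))

print(lev)
print(codes)

# Going back to character form is an index into lev.
back <- lev[codes[, "V5"]]
```

Keeping `lev` alongside `codes` is what makes the round trip possible, so the two would normally be stored together.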
Re: [R] data manipulation
Marc Bernard writes:

> I would be grateful if you can help me. My problem is the following:
> I have a data set like:
>
> ID time X1   X2
> 1  1    x111 x211
> 1  2    x112 x212
>
> where X1 and X2 are 2 covariates and "time" is the time of observation
> and ID indicates the cluster.
>
> I want to merge the above data by creating a new variable "X" and
> "type" as follows:
>
> ID time X    type
> 1  1    x111 X1

Try reshape. And have courage: this is one of the more complex interfaces in R, very powerful, but intimidating.

Dieter
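As a rough illustration of the reshape() suggestion, here is a minimal sketch on the five rows from the question. The cell values are kept as the literal strings "x111" etc., since the post only gives them symbolically. The timevar is renamed to "type" because the data already has a column called "time"; the helper column "id" that reshape() adds by default is dropped afterwards.

```r
# The example data from the question, with symbolic cell values.
df <- data.frame(ID   = c(1, 1, 2, 2, 2),
                 time = c(1, 2, 1, 2, 3),
                 X1   = c("x111", "x112", "x121", "x122", "x123"),
                 X2   = c("x211", "x212", "x221", "x222", "x223"),
                 stringsAsFactors = FALSE)

# Stack X1 and X2 into one column "X", recording the source in "type".
long <- reshape(df, direction = "long",
                varying = c("X1", "X2"), v.names = "X",
                timevar = "type", times = c("X1", "X2"))

# Drop the default "id" helper column, order as in the question,
# and make "type" a factor as requested.
long <- long[order(long$ID, long$type, long$time),
             c("ID", "time", "X", "type")]
rownames(long) <- NULL
long$type <- factor(long$type)

print(long)
```

With more than two covariates, only `varying` and `times` would need to be extended.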
Re: [R] data manipulation
I am sure all of these work, but if you want the output to be exactly the way you mentioned, do this:

    temp <- read.table("yourfile", as.is=T, header=T)
    temp1 <- temp[, 1:3]
    temp2 <- temp[, c(1,2,4)]
    colnames(temp1)[3] <- "X"
    colnames(temp2)[3] <- "X"
    temp3 <- merge(temp1, temp2, all=T)
    temp3$type <- toupper(substr(temp3$X, 1, 2))

after which you can generate factors and such.

Note that as.is=T in read.table keeps the variables X1 and X2 as characters. This is done for substr...

P.S. I am sure you can use reshape instead of the second to the fifth commands above; see ?reshape

Jean
Re: [R] data manipulation
Also see Hadley Wickham's reshape package for more bells & whistles.

--
HTH!
Jim Porzak
Loyalty Matrix Inc.

On 9/8/05, Thomas Lumley <[EMAIL PROTECTED]> wrote:
>
> This is what reshape() does.
>
> -thomas
Re: [R] data manipulation
This is what reshape() does.

-thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]               University of Washington, Seattle
Re: [R] data manipulation
Marc Bernard <[EMAIL PROTECTED]> wrote:

> I want to merge the above data by creating a new variable "X" and
> "type" as follows:
> [...]
> Where "type" is a factor variable indicating if the observation is
> related to X1 or X2...

Say your original data is in dataframe df, then this might do what you want (the two pieces are given the same column names first, because rbind() refuses data frames whose names differ):

R> newdf <- rbind(setNames(df[, 1:3], c("ID", "time", "X")),
R+                setNames(df[, c(1, 2, 4)], c("ID", "time", "X")))
R> newdf$type <- factor(rep(c("X1", "X2"), each = nrow(df)))

Cheers,

--
Sebastian P. Luque
Re: [R] data manipulation
Hi,

This may not be the best solution, but at least it's easy to see what I'm doing. Assume that your data set is called "data":

    # remove the 4th column
    data1 = data[,-4]
    # remove the 3rd column
    data2 = data[,-3]
    # use cbind to add an extra column with only X1 elements
    data1 = cbind(data1, "X1")
    # use cbind to add an extra column with only X2 elements
    data2 = cbind(data2, "X2")
    # give both pieces the same column names so that rbind accepts them
    colnames(data1) <- colnames(data2) <- c("ID", "time", "X", "type")
    # use rbind to add them together as rows
    data3 = rbind(data1, data2)
    # show output
    data3

The only thing I couldn't figure out is how to sort the rows of the data set; perhaps someone else could help us out on this?

Martin
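On the open question of sorting the rows: order() accepts several keys, so one possible sketch is the following. The data frame below is a hand-built stand-in for `data3` as produced by the steps in the message above, using the symbolic values from the original question.

```r
# Stand-in for data3: the X1 block followed by the X2 block.
data3 <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 1, 1, 2, 2, 2),
  time = c(1, 2, 1, 2, 3, 1, 2, 1, 2, 3),
  X    = c("x111", "x112", "x121", "x122", "x123",
           "x211", "x212", "x221", "x222", "x223"),
  type = rep(c("X1", "X2"), each = 5),
  stringsAsFactors = FALSE
)

# Sort by ID first, then type, then time, as in the requested layout.
sorted <- data3[order(data3$ID, data3$type, data3$time), ]
rownames(sorted) <- NULL   # renumber the rows after sorting

print(sorted)
```

The order of the arguments to order() sets the priority of the keys, so swapping `type` and `time` would group by time within each ID instead.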
[R] data manipulation
Dear All,

I would be grateful if you can help me. My problem is the following:
I have a data set like:

ID time X1   X2
1  1    x111 x211
1  2    x112 x212
2  1    x121 x221
2  2    x122 x222
2  3    x123 x223

where X1 and X2 are 2 covariates and "time" is the time of observation and ID indicates the cluster.

I want to merge the above data by creating a new variable "X" and "type" as follows:

ID time X    type
1  1    x111 X1
1  2    x112 X1
1  1    x211 X2
1  2    x212 X2
2  1    x121 X1
2  2    x122 X1
2  3    x123 X1
2  1    x221 X2
2  2    x222 X2
2  3    x223 X2

Where "type" is a factor variable indicating if the observation is related to X1 or X2...

Many thanks in advance,

Bernard
[R] data manipulation help
Thanks to Patrick Burns, Dieter Menne and Peter Alspach for your help.

Peter Alspach showed me how to get the first and the last capture of every individual with the following code:

    capture <- matrix(rbinom(40, 1, 0.3), 4, 10)
    capture
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]    0    0    0    0    0    1    1    0    1     1
    [2,]    1    0    1    0    0    0    1    1    1     0
    [3,]    0    0    0    0    0    0    1    0    1     0
    [4,]    0    1    0    1    1    0    0    0    0     0

    firstcap <- apply(capture, 1, function(x) min((1:length(x))[x==1]))
    [1] 6 1 7 2

    lastcap <- apply(capture, 1, function(x) max((1:length(x))[x==1]))
    [1] 10  9  9  5

Roberto

Roberto Munguia Steyer
Departamento Biologia Evolutiva
Instituto de Ecologia, A.C.
Xalapa, Veracruz. MEXICO

Windows XP
R 2.10
Re: [R] data manipulation help
roberto munguia writes:

> I have a dataframe with 468 individuals (rows) that I captured at least
> once during 28 visits (columns), it looks like:
>
> mortality[1:10,]
>
> 1 1 0 0 0 1 1 1 0 0 0
> ..
>
> so I can know how many times every individual was captured, 0= not
> capture, 1=capture.
> I also want to know when was the first and the last capture for every
> individual,

This should give you a starter:

    # create play data
    cap = data.frame(matrix(rbinom(120,1,0.3),nrow=10))

    firstthat <- function(x) which(x)[1]  # stolen from Thomas Lumley

    # Make your data logical; not really needed, but easier to understand
    cap.log = cap==1
    apply(cap.log,1,firstthat)  # gives first captures

Dieter
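Building on the starter above, the sketch below returns both the first and the last capture and gives NA for an individual with no captures at all. The first four rows are the small example matrix posted in the follow-up summary of this thread; the fifth, all-zero row is a hypothetical addition to show the empty case. The helper names `first_one` and `last_one` are made up for illustration.

```r
# Rows 1-4: example capture histories; row 5: hypothetical, never captured.
capture <- rbind(c(0, 0, 0, 0, 0, 1, 1, 0, 1, 1),
                 c(1, 0, 1, 0, 0, 0, 1, 1, 1, 0),
                 c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0),
                 c(0, 1, 0, 1, 1, 0, 0, 0, 0, 0),
                 rep(0, 10))

# Position of the first / last 1 in a row, or NA if there is none.
first_one <- function(x) {
  i <- which(x == 1)
  if (length(i)) i[1] else NA_integer_
}
last_one <- function(x) {
  i <- which(x == 1)
  if (length(i)) i[length(i)] else NA_integer_
}

firstcap <- apply(capture, 1, first_one)
lastcap  <- apply(capture, 1, last_one)

print(cbind(firstcap, lastcap))
```

On a data frame with date-like column names, `names(df)[firstcap]` would turn the positions back into visit labels.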
[R] data manipulation help
Hello everybody,

I have a dataframe with 468 individuals (rows) that I captured at least once during 28 visits (columns). It looks like:

mortality[1:10,]
   X18.10.2004 X20.10.2004 X22.10.2004 X24.10.2004 X26.10.2004 X28.10.2004 X30.10.2004 X01.11.2004 X03.11.2004 X07.11.2004
1            1           0           0           0           1           1           1           0           0           0
2            1           0           0           0           0           0           0           0           0           0
3            1           1           1           0           0           0           1           0           0           1
4            1           0           0           0           0           0           0           0           0           0
5            1           1           1           1           0           0           1           1           0           0
6            1           1           1           1           0           0           0           1           0           0
7            1           0           1           0           1           0           1           1           0           0
8            1           1           1           0           1           0           1           1           1           1
9            1           0           0           1           1           0           0           0           1           0
10           1           0           1           0           1           0           0           0           0           0

so I can know how many times every individual was captured (0 = not captured, 1 = captured):

    persistence<-apply(mortacap2,1,sum)

I also want to know when the first and the last capture were for every individual. If I use:

    which(mortacap2[1,]==1)
    X18.10.2004 X26.10.2004 X28.10.2004 X30.10.2004
              1           5           6           7

I can work it out manually row by row, but I don't see how to get the first and the last capture for all individuals in the database at the same time.

I tried

    d<-numeric(368)
    for (i in 1:368) {d[i]<-which(mortacap2[1:368,]==1}

but it didn't work.

Any help would be appreciated. Thanks in advance!!

Roberto Munguia Steyer
Departamento Biologia Evolutiva
Instituto de Ecologia, A.C.
Xalapa, Veracruz. MEXICO

Windows XP
R 2.10
RE: [R] data manipulation
You just need to try harder in reading the documentation. Try: data <- matrix(scan("file-name"), ncol=29, byrow=TRUE) Andy > From: Yoko Nakajima > > Hello, > > may I ask a further question? > > I have realized that "data <- > matrix(scan("file-name"), ncol=29)" will read the data > differently than I > thought, i.e., (4,1) is the first column, (17,1) is the > second column, and > (1,1) is the third and so on by this code - please see the data below. > Therefore, the data set I have would not be in order if I > used this code. > > It needed to be read as: (4.4) first column, (1,1) the second > column, and > (17, 17) is the third and so on (i.e., from 4 to 0.5611 makes > the first row > and another 4 to 0.5611 makes the second row and so on). So, > > V1 V2 V3 ... V29 > 4117 ... 0.5611 > 4117 ... 0.5611 > > was needed. > > (Now I have , > V1 V2 V3 V29 > 417 1 ... 0.6578 > 11 -5.1536 ... 0.5611) > > > [The data set I have may have around 1000 sets of them (29 > variables times > around 1000 sets of these 29 variables). I only paste here two sets of > them.] > 4 1 17 1 1 > -5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 > -0.5081 -0.2227 > 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 > -0.1033 -0.0796 > -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611 > > 4 1 17 2 1 > -5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 > -0.5081 -0.2227 > 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 > -0.1033 -0.0796 > -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611 > > > > I need 29 columns. This is true. But the data was read differently by > "ncol=29". Is there any way I can handle this problem by R? > > I would very appreciate it if you could let me know. My guess > is that I > should probably rearrange the data set by excel etc.. I have used > "data.entry(data)" and found this. I can not analyze this data set. > > Thank you very much, in advance. > Sincerely, > Yoko. 
Re: [R] data manipulation
Hello, may I ask a further question? I have realized that "data <- matrix(scan("file-name"), ncol=29)" will read the data differently than I thought, i.e., (4,1) is the first column, (17,1) is the second column, and (1,1) is the third and so on by this code - please see the data below. Therefore, the data set I have would not be in order if I used this code. It needed to be read as: (4.4) first column, (1,1) the second column, and (17, 17) is the third and so on (i.e., from 4 to 0.5611 makes the first row and another 4 to 0.5611 makes the second row and so on). So, V1 V2 V3 ... V29 4117 ... 0.5611 4117 ... 0.5611 was needed. (Now I have , V1 V2 V3 V29 417 1 ... 0.6578 11 -5.1536 ... 0.5611) [The data set I have may have around 1000 sets of them (29 variables times around 1000 sets of these 29 variables). I only paste here two sets of them.] 4 1 17 1 1 -5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796 -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611 4 1 17 2 1 -5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796 -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611 I need 29 columns. This is true. But the data was read differently by "ncol=29". Is there any way I can handle this problem by R? I would very appreciate it if you could let me know. My guess is that I should probably rearrange the data set by excel etc.. I have used "data.entry(data)" and found this. I can not analyze this data set. Thank you very much, in advance. Sincerely, Yoko. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] data manipulation
On Wed, 2005-04-13 at 20:56 -0400, Yoko Nakajima wrote:
> Hello,
> my question is about the data handling.
>
> I have a data set that is lined as:
>
> 4 1 17 1 1
> -5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227
> 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796
> -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611
> 4 1 17 2 1
> -5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227
> 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796
> -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611
>
> This means that 29 variables are together as a set. You saw two sets
> of them in example. I have about 1000 sets (of 29 variables) in my
> data. When I "scan" this data set, the result comes with 7 columns and
> it is not possible, so far, to read the table by column wise, and thus
> it is not possible to analyze the data. I would like to know whether
> there is a way to solve this problem, say, by arranging columns or
> increasing the number of columns of data matrix by R.
>
> Also, I would like to know how you could name each column of the data
> so that you could use the individual column separately.

You probably changed some default setting in scan(). By default it treats 'white space' as field delimiters.

Using your data above, which I saved in a file called 'test.dat':

> mat <- matrix(scan("test.dat"), ncol = 29)
Read 58 items
> dim(mat)
[1]  2 29
> colnames(mat) <- paste("Col", 1:29, sep = "")
> mat
     Col1 Col2    Col3    Col4    Col5   Col6    Col7    Col8    Col9
[1,]    4   17  1.0000 -0.1668 -0.5062 0.3640 -0.5081  0.8142 -0.0445
[2,]    1    1 -5.1536 -2.3412  0.9621 0.3678 -0.2227 -0.0389 -0.0578
       Col10   Col11   Col12   Col13   Col14  Col15 Col16 Col17   Col18
[1,] -0.1175  0.8673 -0.0796 -0.1716 -0.7014 0.5611     1     2 -5.1536
[2,] -0.1232 -0.1033 -0.0341 -0.1801  0.6578 4.0000    17     1 -0.1668
       Col19  Col20   Col21   Col22   Col23   Col24   Col25   Col26
[1,] -2.3412 0.9621  0.3678 -0.2227 -0.0389 -0.0578 -0.1232 -0.1033
[2,] -0.5062 0.3640 -0.5081  0.8142 -0.0445 -0.1175  0.8673 -0.0796
       Col27   Col28  Col29
[1,] -0.0341 -0.1801 0.6578
[2,] -0.1716 -0.7014 0.5611

In this case, 'mat' is a matrix with 2 rows and 29 columns. You can restructure this differently as per your requirements.

HTH,

Marc Schwartz
RE: [R] data manipulation
Dear Yoko, If you're sure that the data are complete, then data <- matrix(scan("file-name"), ncol=29) should do the trick. Then to name the columns of the data matrix, colnames(data) <- c("one", "two", etc.). [Of course, you'd substitute meaningful names.] I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Yoko Nakajima > Sent: Wednesday, April 13, 2005 7:56 PM > To: r-help@stat.math.ethz.ch > Subject: [R] data manipulation > > Hello, > my question is about the data handling. > > I have a data set that is lined as: > > 4 1 17 1 1 > -5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 > -0.5081 -0.2227 > 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 > -0.1033 -0.0796 > -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611 > 4 1 17 2 1 > -5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 > -0.5081 -0.2227 > 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 > -0.1033 -0.0796 > -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611 > > This means that 29 variables are together as a set. You saw > two sets of them in example. I have about 1000 sets (of 29 > variables) in my data. When I "scan" this data set, the > result comes with 7 columns and it is not possible, so far, > to read the table by column wise, and thus it is not possible > to analyze the data. I would like to know whether there is a > way to solve this problem, say, by arranging columns or > increasing the number of columns of data matrix by R. > > Also, I would like to know how you could name each column of > the data so that you could use the individual column separately. > > Sincerely. > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! 
[R] data manipulation
Hello, my question is about the data handling. I have a data set that is lined as: 4 1 17 1 1 -5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796 -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611 4 1 17 2 1 -5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796 -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611 This means that 29 variables are together as a set. You saw two sets of them in example. I have about 1000 sets (of 29 variables) in my data. When I "scan" this data set, the result comes with 7 columns and it is not possible, so far, to read the table by column wise, and thus it is not possible to analyze the data. I would like to know whether there is a way to solve this problem, say, by arranging columns or increasing the number of columns of data matrix by R. Also, I would like to know how you could name each column of the data so that you could use the individual column separately. Sincerely. [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Data manipulation
thanks a lot for the information, reshape did the job > datars <-reshape(data, timevar="TERRCODE", idvar="BID", direction="wide") greetings helli BID TERRCODEANMCODE 200310413120660 22 0 200310413120660 273 0 200310413120660 280 0 200310413120660 467 0 200310413120660 468 0 200310413127001 5 0 200310413127001 50 0 200310413127001 53 13 200310413127001 54 11 200310413127001 72 0 200310413127001 89 0 200310413127001 671 0 200310413225032 1 0 200310413225032 3 0 200310413225032 6 0 200310413225032 51 0 200310413225032 52 21 200310413225032 53 21 200310413225032 54 21 200310413225032 55 13 200310413225032 57 11 200310413225032 72 0 result: BID ANMCODE.1 ANMCODE.2 ANMCODE.3 ANMCODE.4 ANMCODE.5 ANMCODE.6 ANMCODE.7 200310413120660 NA NA NA NA NA NA NA NA NA NA NA NA 200310413127001 NA NA NA NA 0 NA NA NA NA NA NA NA 200310413225032 0 NA 0 NA NA 0 NA NA NA NA NA NA 200310413225033 0 NA 0 NA NA 0 NA NA NA NA NA NA 200310413225072 0 NA 0 NA NA NA NA NA 0 NA NA 0 200310413225073 0 NA 0 NA NA 0 NA NA NA NA 0 NA 200310413225074 0 NA 0 NA NA 0 NA NA NA NA NA 0 ... Eric Lecoutre <[EMAIL PROTECTED]> schrieb am 08.02.05 08:55:46: Hi, Have a look at: ? aggregate ? 
reshape Eric At 07:39 8/02/2005, you wrote: >Content-Type: text/plain; charset="iso-8859-1" >Received-SPF: none (hypatia: domain of [EMAIL PROTECTED] does not designate >permitted sender hosts) >X-Virus-Scanned: by amavisd-new at stat.math.ethz.ch >Content-Transfer-Encoding: 8bit >X-MIME-Autoconverted: from quoted-printable to 8bit by >hypatia.math.ethz.ch id j186djX0017423 >X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on >hypatia.math.ethz.ch >X-Spam-Level: >X-Spam-Status: No, score=-1.0 required=5.0 tests=AWL,BAYES_50 autolearn=no >version=3.0.2 > >Hi R-friends, > >i have large dataset in the following structure: > >BID;TERRCODE;ANMCODE >200310413290002;4;0 >200310413290002;80;0 >200310413290002;2;0 >200310413290002;5;0 >200310413290003;3;0 >200310413290003;1;0 >200310413290003;11;0 >200310413290003;26;0 >200310413290003;141;21 >200310413290003;472;0 >200310413290004;3;0 >200310413290004;1;0 >200310413290004;7;0 >200310413290004;18;0 >200310413290004;51;0 >200310413290004;56;0 >200310413290004;57;0 >200310413290004;76;0 >200310413290004;89;0 >200310413290004;97;0 >200310413290004;98;0 >200310413290004;72;0 >200310413290004;456;0 >200310413290004;141;0 >200310413290004;640;0 >200310413290004;201;0 >200310413290004;764;20 >200310413290005;273;22 >200310413290005;456;0 >200310413290005;22;0 >200310413290005;23;0 >200310413290005;21;21 >200310413290005;141;0 >200310413290005;640;0 >200310413290005;201;0 >200310413290005;43;0 >200310413290005;650;0 >200310413290005;472;0 >200310413290006;456;0 >200310413290006;22;25 >200310413290006;23;25 >200310413290006;21;25 >200310413290006;640;0 >200310413290006;201;0 >200310413290006;43;0 >200310413290006;651;1 >. >. >. > >BID is the code of my sample-area >TERRCODE is the code for landscape characteristic for example: 640 ... sun >exposed, . >ANMCODE ist the value of the TERRCODE: for example 0 means occuring, 1 >means often occuring, .. 
> >Now my question: is it possible to get a table with the folllowing structure: > > >BID (TERRCODE)4 (TERRCODE)21 .. >200310413290002 (ANMCODE)0 (ANMCODE)0 ... >200310413290003 0 0 .. >200310413290004 0 0 .. >200310413290005 0 21 .. >200310413290006 0 . 25 .. >. >. > >in this example (TERRCODE) and (ANMCODE) is only for explanation and not >necessary for further analysis > > >greetings from the snowy tyrol > >helli > >platform i386-pc-mingw32 >arch i386 >os mingw32 >system i386, mingw32 >status >major 2 >minor 0.0 >year 2004 >month 10 >day 04 >language R > >__ >Verschicken Sie romantische, coole und witzige Bilder per SMS! > >__ >R-help@stat.math.ethz.ch mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Eric Lecoutre UCL / Institut de Statistique Voie du Roman Pays, 20 1348 Louvain-la-Neuve Belgium tel: (+32)(0)10473050 [EMAIL PROTECTED] http://www.stat.ucl.ac.be/ISpersonnel/lecoutre If the statistics are boring, then you've got the wrong numbers. -Edward Tufte _
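The reshape() call reported at the top of this message can be tried on a handful of rows. A minimal sketch, assuming a made-up subset of the data (BID shortened to 1 and 2, and only three TERRCODE values); the object names long and wide are chosen for illustration.

```r
# Long format: one row per (BID, TERRCODE) pair, as in the posted file.
long <- data.frame(
  BID      = c(1, 1, 1, 2, 2),
  TERRCODE = c(4, 21, 640, 4, 21),
  ANMCODE  = c(0, 21, 0, 0, 25)
)

# Wide format: one row per BID, one ANMCODE column per TERRCODE value.
# Combinations that do not occur in the long data are filled with NA.
wide <- reshape(long, timevar = "TERRCODE", idvar = "BID", direction = "wide")
print(wide)
```

The new columns are named after the value column and the TERRCODE value (ANMCODE.4, ANMCODE.21, ...), which matches the column names in the result shown above. The many NA entries there are TERRCODE values that were not recorded for a given BID.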
Re: [R] Data manipulation
Helmut Kudrnovsky wrote: Content-Type: text/plain; charset="iso-8859-1" Received-SPF: none (hypatia: domain of [EMAIL PROTECTED] does not designate permitted sender hosts) X-Virus-Scanned: by amavisd-new at stat.math.ethz.ch Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by hypatia.math.ethz.ch id j186djX0017423 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on hypatia.math.ethz.ch X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=AWL,BAYES_50 autolearn=no version=3.0.2 Hi R-friends, i have large dataset in the following structure: BID;TERRCODE;ANMCODE 200310413290002;4;0 200310413290002;80;0 200310413290002;2;0 200310413290002;5;0 200310413290003;3;0 200310413290003;1;0 200310413290003;11;0 200310413290003;26;0 200310413290003;141;21 200310413290003;472;0 200310413290004;3;0 200310413290004;1;0 200310413290004;7;0 200310413290004;18;0 200310413290004;51;0 200310413290004;56;0 200310413290004;57;0 200310413290004;76;0 200310413290004;89;0 200310413290004;97;0 200310413290004;98;0 200310413290004;72;0 200310413290004;456;0 200310413290004;141;0 200310413290004;640;0 200310413290004;201;0 200310413290004;764;20 200310413290005;273;22 200310413290005;456;0 200310413290005;22;0 200310413290005;23;0 200310413290005;21;21 200310413290005;141;0 200310413290005;640;0 200310413290005;201;0 200310413290005;43;0 200310413290005;650;0 200310413290005;472;0 200310413290006;456;0 200310413290006;22;25 200310413290006;23;25 200310413290006;21;25 200310413290006;640;0 200310413290006;201;0 200310413290006;43;0 200310413290006;651;1 . . . BID is the code of my sample-area TERRCODE is the code for landscape characteristic for example: 640 ... sun exposed, . ANMCODE ist the value of the TERRCODE: for example 0 means „occuring“, 1 means „often occuring“, .. Now my question: is it possible to get a table with the folllowing structure: BID (TERRCODE)4 (TERRCODE)21 .. 200310413290002 (ANMCODE)0 (ANMCODE)0 ... 
200310413290003 0 0 .. 200310413290004 0 0 .. 200310413290005 0 21 .. 200310413290006 0 . 25 .. Perhaps, if you explain us the formula you derive those lines from the data above. At least I don't understand it. Uwe Ligges . in this example (TERRCODE) and (ANMCODE) is only for explanation and not necessary for further analysis greetings from the snowy tyrol helli platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 0.0 year 2004 month 10 day 04 language R __ Verschicken Sie romantische, coole und witzige Bilder per SMS! __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Data manipulation
Hi R-friends,

I have a large dataset with the following structure:

BID;TERRCODE;ANMCODE
200310413290002;4;0
200310413290002;80;0
200310413290002;2;0
200310413290002;5;0
200310413290003;3;0
200310413290003;1;0
200310413290003;11;0
200310413290003;26;0
200310413290003;141;21
200310413290003;472;0
200310413290004;3;0
200310413290004;1;0
200310413290004;7;0
200310413290004;18;0
200310413290004;51;0
200310413290004;56;0
200310413290004;57;0
200310413290004;76;0
200310413290004;89;0
200310413290004;97;0
200310413290004;98;0
200310413290004;72;0
200310413290004;456;0
200310413290004;141;0
200310413290004;640;0
200310413290004;201;0
200310413290004;764;20
200310413290005;273;22
200310413290005;456;0
200310413290005;22;0
200310413290005;23;0
200310413290005;21;21
200310413290005;141;0
200310413290005;640;0
200310413290005;201;0
200310413290005;43;0
200310413290005;650;0
200310413290005;472;0
200310413290006;456;0
200310413290006;22;25
200310413290006;23;25
200310413290006;21;25
200310413290006;640;0
200310413290006;201;0
200310413290006;43;0
200310413290006;651;1
.
.
.

BID is the code of my sample area.
TERRCODE is the code for a landscape characteristic, for example: 640 ... sun exposed, ...
ANMCODE is the value of the TERRCODE: for example, 0 means occurring, 1 means often occurring, ...

Now my question: is it possible to get a table with the following structure?

BID              (TERRCODE)4  (TERRCODE)21  ..
200310413290002  (ANMCODE)0   (ANMCODE)0    ...
200310413290003  0            0             ..
200310413290004  0            0             ..
200310413290005  0            21            ..
200310413290006  0 .          25            ..
.
.

In this example (TERRCODE) and (ANMCODE) are only there for explanation and are not necessary for further analysis.

Greetings from the snowy Tyrol,

helli

platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    0.0
year     2004
month    10
day      04
language R
[R] Data manipulation query
Hi,

see ?quantile to obtain the deciles of variable X4,
see ?cut to divide the range of 'x' into intervals and code the values in 'x' according to the interval they fall in,
see ?table to use the cross-classifying factors to build a contingency table of the counts at each combination of factor levels.

Best,
Vito

Hi,

Not sure if I am making a simple problem complex but still here we go:

I have a data frame with four columns say, X1 X2 X3 and X4. I want to break X4 into deciles and for each deciles obtained, I want to see corresponding elements of X1.

Ideally, the output should be in a tabular fashion as shown below:

Deciles 1   Deciles 2   Deciles 10
X1-1        X1-2        X1-99
X1-5        X1-3        X1-10

Where X1-1...X1-100 are elements of column X1 that categorized as per deciles

Any pointers to help get the right structure would be greatly appreciated!!

TIA.

Manoj
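The three hints can be combined roughly as follows. This is a sketch, not a tested recipe for the poster's data: the data frame df, the way X4 is generated, and the use of split() to collect the X1 values are assumptions made for illustration.

```r
# Hypothetical data: 100 rows, X4 holds the values 1..100 in scrambled order.
df <- data.frame(X1 = 1:100)
df$X4 <- (df$X1 * 37) %% 101

# Decile boundaries of X4: the 0%, 10%, ..., 100% quantiles.
breaks <- quantile(df$X4, probs = seq(0, 1, by = 0.1))

# Code every X4 value by the decile interval it falls into.
decile <- cut(df$X4, breaks = breaks, include.lowest = TRUE,
              labels = paste("Decile", 1:10))

# Counts per decile.
table(decile)

# The X1 values belonging to each decile, as a list with one element per decile.
by.decile <- split(df$X1, decile)

# Here every decile has the same size, so the list can be bound into a table
# with one column per decile; with ties in X4 the sizes may differ.
tab <- do.call(cbind, by.decile)
tab
```

include.lowest = TRUE keeps the minimum of X4 inside the first interval; without it that value would be coded as NA.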
[R] Data manipulation query
Hi, Not sure if I am making a simple problem complex but still here we go: I have a data frame with four columns say, X1 X2 X3 and X4. I want to break X4 into deciles and for each deciles obtained, I want to see corresponding elements of X1. Ideally, the output should be in a tabular fashion as shown below: Deciles 1 Deciles 2 Deciles 10 X1-1 X1-2 X1-99 X1-5 X1-3 X1-10 Where X1-1...X1-100 are elements of column X1 that categorized as per deciles Any pointers to help get the right structure would be greatly appreciated!! TIA. Manoj __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] data manipulation
And here is a simplification I just noticed:

date.grouping <- function(d) {
  # for ea date in d calculate date beginning 6 month period which contains it
  POSIXct.dates <- as.POSIXct(paste(as.character(d),"01",sep="-"))
  breaks <- c(seq(from=min(POSIXct.dates), to=max(POSIXct.dates), by="6 mo"), Inf)
  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=T )), "%Y-%m" )
}

patients <- read.table("clipboard",header=T)
patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) )
patients2 <- as.data.frame( patients2 )

summary(patients2)

boxplot(patients2)

--- Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
>Sorry but there was an error in the seq statement. Here it is again.
>
>date.grouping <- function(d) {
>  # for ea date in d calculate date beginning 6 month period which contains it
>  mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nr=2)
>  f <- function(x) do.call( "ISOdate", as.list(x) )
>  POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1)
>  breaks <- c(seq(from=min(POSIXct.dates), to=max(POSIXct.dates), by="6 mo"), Inf)
>  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=T )), "%Y-%m" )
>}
>
>patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) )
>patients2 <- as.data.frame( patients2 )
>
>summary(patients2)
>
>boxplot(patients2)
>
>--- Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
>>
>>Try this. The function takes a vector of dates of the form yyyy-mm and produces a
>>new character vector of dates of the same form except the
>>output date is the beginning of the 6 month period in which the input date lies.
>>The 6 month intervals are measured from the minimum date.
>>
>>date.grouping <- function(d) {
>>  # for ea date in d calculate date beginning 6 month period which contains it
>>  mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nr=2)
>>  f <- function(x) do.call( "ISOdate", as.list(x) )
>>  POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1)
>>  breaks <- c(seq(from=min(POSIXct.dates), along=POSIXct.dates, by="6 mo"), Inf)
>>  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=T )), "%Y-%m" )
>>}
>>
>>patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) )
>>patients2 <- as.data.frame( patients2 )
>>
>>summary(patients2)
>>
>>boxplot(patients2)
>>
>>--- Ricardo Pietrobon <[EMAIL PROTECTED]> wrote:
>>>Hi,
>>>
>>>I am new to R, coming from a few years using Stata. I've been twisting my
>>>brain and checking several R and S references over the last few days to
>>>try to solve this data management problem: I have a data set with a unique
>>>patient identifier that is repeated along multiple rows, a variable with
>>>month of patient encounter, and a continous variable for cost of
>>>individual encounters. The data looks like this:
>>>
>>>ID  date       cost
>>>1   "2001-01"  200.00
>>>1   "2001-01"  123.94
>>>1   "2001-03"  100.23
>>>1   "2001-04"  150.34
>>>2   "2001-03"  296.34
>>>2   "2002-05"  156.36
>>>
>>>I would like to obtain the median costs and boxplots for the sum of
>>>encounters happening in the first six months after the index encounter
>>>(first patient encounter) for each patient, then the mean and median costs
>>>for the costs happening from 6 to 12 months after the index encounter, and
>>>so on. Notice that the first ID has two encounters during the index date,
>>>making it more difficult to define a single row with the index encounter.
>>>
>>>Any help would be appreciated,
>>>
>>>Ricardo
>>>
>>>Ricardo Pietrobon, MD
>>>Assistant Professor of Surgery
>>>Duke University Medical Center
>>>Durham, NC 27710 US
Re: [R] data manipulation
Sorry but there was an error in the seq statement. Here it is again. date.grouping <- function(d) { # for ea date in d calculate date beginning 6 month period which contains it mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nr=2) f <- function(x) do.call( "ISOdate", as.list(x) ) POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1) breaks <- c(seq(from=min(POSIXct.dates), to=max(POSIXct.dates), by="6 mo"), Inf) format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=T )), "%Y-%m" ) } patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) ) patients2 <- as.data.frame( patients2 ) summary(patients2) boxplot(patients2) --- Gabor Grothendieck <[EMAIL PROTECTED]> wrote: > >Try this. The function takes a vector of dates of the form -mm and produces a >new character vector of dates of the same form except the >output date is the beginning of the 6 month period in which the input date lies. The >6 month intervals are measured from the minimum date. > >date.grouping <- function(d) { > # for ea date in d calculate date beginning 6 month period which contains it > mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nr=2) > f <- function(x) do.call( "ISOdate", as.list(x) ) > POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1) > breaks <- c(seq(from=min(POSIXct.dates), along=POSIXct.dates, by="6 mo"), Inf) > format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=T )), "%Y-%m" ) >} > >patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) ) >patients2 <- as.data.frame( patients2 ) > >summary(patients2) > >boxplot(patients2) > > > >--- Ricardo Pietrobon <[EMAIL PROTECTED]> wrote: >>Hi, >> >> >>I am new to R, coming from a few years using Stata. 
I've been twisting my >>brain and checking several R and S references over the last few days to >>try to solve this data management problem: I have a data set with a unique >>patient identifier that is repeated along multiple rows, a variable with >>month of patient encounter, and a continous variable for cost of >>individual encounters. The data looks like this: >> >>IDdatecost >>1 "2001-01" 200.00 >>1 "2001-01" 123.94 >>1 "2001-03" 100.23 >>1 "2001-04" 150.34 >>2 "2001-03" 296.34 >>2 "2002-05" 156.36 >> >> >>I would like to obtain the median costs and boxplots for the sum of >>encounters happening in the first six months after the index encounter >>(first patient encounter) for each patient, then the mean and median costs >>for the costs happening from 6 to 12 months after the index encounter, and >>so on. Notice that the first ID has two encounters during the index date, >>making it more difficult to define a single row with the index encounter. >> >>Any help would be appreciated, >> >> >>Ricardo >> >> >>Ricardo Pietrobon, MD >>Assistant Professor of Surgery >>Duke University Medical Center >>Durham, NC 27710 US >> >>__ >>[EMAIL PROTECTED] mailing list >>https://www.stat.math.ethz.ch/mailman/listinfo/r-help > >__ >[EMAIL PROTECTED] mailing list >https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] data manipulation
Try this. The function takes a vector of dates of the form yyyy-mm and produces a new character vector of dates of the same form except the output date is the beginning of the 6 month period in which the input date lies. The 6 month intervals are measured from the minimum date.

date.grouping <- function(d) {
  # for ea date in d calculate date beginning 6 month period which contains it
  mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nr=2)
  f <- function(x) do.call( "ISOdate", as.list(x) )
  POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1)
  breaks <- c(seq(from=min(POSIXct.dates), along=POSIXct.dates, by="6 mo"), Inf)
  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=T )), "%Y-%m" )
}

patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) )
patients2 <- as.data.frame( patients2 )

summary(patients2)

boxplot(patients2)

--- Ricardo Pietrobon <[EMAIL PROTECTED]> wrote:
>Hi,
>
>I am new to R, coming from a few years using Stata. I've been twisting my
>brain and checking several R and S references over the last few days to
>try to solve this data management problem: I have a data set with a unique
>patient identifier that is repeated along multiple rows, a variable with
>month of patient encounter, and a continous variable for cost of
>individual encounters. The data looks like this:
>
>ID  date       cost
>1   "2001-01"  200.00
>1   "2001-01"  123.94
>1   "2001-03"  100.23
>1   "2001-04"  150.34
>2   "2001-03"  296.34
>2   "2002-05"  156.36
>
>I would like to obtain the median costs and boxplots for the sum of
>encounters happening in the first six months after the index encounter
>(first patient encounter) for each patient, then the mean and median costs
>for the costs happening from 6 to 12 months after the index encounter, and
>so on. Notice that the first ID has two encounters during the index date,
>making it more difficult to define a single row with the index encounter.
>
>Any help would be appreciated,
>
>Ricardo
>
>Ricardo Pietrobon, MD
>Assistant Professor of Surgery
>Duke University Medical Center
>Durham, NC 27710 US
Re: [R] data manipulation
Ricardo Pietrobon <[EMAIL PROTECTED]> writes:

> ID  date       cost
> 1   "2001-01"  200.00
> 1   "2001-01"  123.94
> 1   "2001-03"  100.23
> 1   "2001-04"  150.34
> 2   "2001-03"  296.34
> 2   "2002-05"  156.36
>
> I would like to obtain the median costs and boxplots for the sum of
> encounters happening in the first six months after the index encounter
> (first patient encounter) for each patient, then the mean and median costs
> for the costs happening from 6 to 12 months after the index encounter, and
> so on. Notice that the first ID has two encounters during the index date,
> making it more difficult to define a single row with the index encounter.
>
> Any help would be appreciated,

Let's see... You're going to need a bit of slight ugliness to convert the date to a numeric month number. Something like (NB: That's a code that means "I didn't actually try this"...)

attach(yourdata)
monthnum <- sapply(strsplit(date,"-"),function(x)sum(as.numeric(x)*c(12,1)))

Then we need a table of the index dates for each person

tbl <- tapply(monthnum, ID, min)

Now subtract the index date from monthnum

months.post.index <- monthnum - tbl[ID]

then you probably want to look at the subset of your original data frame and do the sums

total.cost.6mo <- with(subset(yourdata,months.post.index < 6), tapply(cost,ID,sum))

and finally

boxplot(total.cost.6mo)
median(total.cost.6mo)

(You could elaborate by converting months.post.index with cut() and use lapply(names(period),.) to give you a list of tables, which boxplot() might actually know how to plot directly.)

--
   O__       Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~         - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
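The steps outlined in this reply can be put together and run on the six rows from the question. This is a sketch under stated assumptions, not the poster's code: the data frame is typed in by hand, the index table is looked up by as.character(ID) so that the lookup goes by name rather than by position, and the period variable (integer division by 6) is an addition that extends the idea to the 6 to 12 month window and beyond.

```r
# The example rows from the question, entered by hand.
patients <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  date = c("2001-01", "2001-01", "2001-03", "2001-04", "2001-03", "2002-05"),
  cost = c(200.00, 123.94, 100.23, 150.34, 296.34, 156.36),
  stringsAsFactors = FALSE
)

# Month number on a common scale: 12 * year + month.
monthnum <- sapply(strsplit(patients$date, "-"),
                   function(x) sum(as.numeric(x) * c(12, 1)))

# Index (first) month of every patient, looked up by ID name.
index <- tapply(monthnum, patients$ID, min)
months.post.index <- monthnum - as.vector(index[as.character(patients$ID)])

# 0 = months 0 to 5 after the index encounter, 1 = months 6 to 11, and so on.
period <- months.post.index %/% 6

# Total cost per patient and period; NA where a patient has no encounter.
total.cost <- tapply(patients$cost, list(ID = patients$ID, period = period), sum)
print(total.cost)

# Median of the per-patient totals in the first six months.
median(total.cost[, "0"], na.rm = TRUE)
```

Both encounters in a patient's index month fall into period 0, so the duplicated index date needs no special handling. boxplot(as.data.frame(total.cost)) would then draw one box per period.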
[R] data manipulation
Hi,

I am new to R, coming from a few years using Stata. I've been twisting my brain and checking several R and S references over the last few days to try to solve this data management problem: I have a data set with a unique patient identifier that is repeated along multiple rows, a variable with the month of the patient encounter, and a continuous variable for the cost of individual encounters. The data looks like this:

ID  date       cost
1   "2001-01"  200.00
1   "2001-01"  123.94
1   "2001-03"  100.23
1   "2001-04"  150.34
2   "2001-03"  296.34
2   "2002-05"  156.36

I would like to obtain the median costs and boxplots for the sum of encounters happening in the first six months after the index encounter (first patient encounter) for each patient, then the mean and median costs for the costs happening from 6 to 12 months after the index encounter, and so on. Notice that the first ID has two encounters during the index date, making it more difficult to define a single row with the index encounter.

Any help would be appreciated,

Ricardo

Ricardo Pietrobon, MD
Assistant Professor of Surgery
Duke University Medical Center
Durham, NC 27710 US
Re: [R] data manipulation: getting mean value every 5 rows
Dear All, thanks for exceptional and speedy help. In particular, thanks to J. R. Lockwood, Sue Paul, Spencer Graves, Dennis J. Murphy and Tony Plate. regards, Federico Calboli = Federico C.F. Calboli Department of Biology University College London Room 327 Darwin Building Gower Street London WC1E 6BT Tel: (+44) 020 7679 4395 Fax (+44) 020 7679 7096 [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] data manipulation: getting mean value every 5 rows
> x <- read.table(file("clipboard"), header=T)
> # add an extra field to define groups of 5 sequential rows
> x[,"code"] <- rep(seq(len=nrow(x)/5), each=5)
> x
   temp line cage   number code
1    18   18    1 6678.630    1
2    18   18    1 7774.458    1
3    18   18    1 7845.902    1
4    18   18    1 9483.578    1
5    18   18    1 8983.555    1
6    18   18    1 9181.052    2
7    18   18    1 9458.696    2
8    18   18    1 8138.616    2
9    18   18    1 7981.994    2
10   18   18    1 7556.491    2
11   18   18    1 7672.137    3
12   18   18    1 6607.776    3
13   18   18    1 8383.650    3
14   18   18    1 7129.852    3
15   18   18    1 8536.667    3
16   18   18    2 8287.800    4
17   18   18    2 7924.470    4
18   18   18    2 7928.474    4
19   18   18    2 7363.157    4
20   18   18    2 7952.593    4
> aggregate(x[,"number",drop=F], x[,c("temp", "line", "cage", "code")], mean)
  temp line cage code   number
1   18   18    1    1 8153.225
2   18   18    1    2 8463.370
3   18   18    1    3 7666.016
4   18   18    2    4 7891.299
> # result has an additional column named "code" -- easily eliminated

At Monday 10:47 PM 7/28/2003 +0100, you wrote:

Dear All, I would like to ask you how to accomplish a little tricky data manipulation. I have a large dataset, looking something like:

temp  line  cage  number
18    18    1     6678.63
18    18    1     7774.458
18    18    1     7845.902
18    18    1     9483.578
18    18    1     8983.555
18    18    1     9181.052
18    18    1     9458.696
18    18    1     8138.616
18    18    1     7981.994
18    18    1     7556.491
18    18    1     7672.137
18    18    1     6607.776
18    18    1     8383.65
18    18    1     7129.852
18    18    1     8536.667
18    18    2     8287.8
18    18    2     7924.47
18    18    2     7928.474
18    18    2     7363.157
18    18    2     7952.593
.

I would like to create a dataframe where I get the mean values, 5 rows at a time, of columns "number", while keeping the value in the other columns fixed to the values found in the first of the 5 rows (or whatever, it's the same for the 5 rows) so that the above would be "shrunk" to:

temp  line  cage  number
18    18    1     8153.2246
18    18    1     8463.3698
18    18    1     7666.0164
18    18    2     7891.2988

Any hints?

Regards, Federico Calboli

Federico C.F. Calboli
Department of Biology
University College London
Room 327 Darwin Building
Gower Street
London WC1E 6BT
Tel: (+44) 020 7679 4395
Fax (+44) 020 7679 7096
[EMAIL PROTECTED]

Tony Plate [EMAIL PROTECTED]
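A self-contained variant of the session above may be easier to try than the clipboard-based one. The sketch below rebuilds the 20 example rows and, as an editorial assumption rather than anything from the thread, computes the block index by integer division, so it does not require nrow(x) to be an exact multiple of 5. The column name block and the object name res are chosen here.

```r
# The 20 example rows from the question; cage changes after row 15.
x <- data.frame(
  temp   = 18,
  line   = 18,
  cage   = rep(c(1, 2), times = c(15, 5)),
  number = c(6678.63, 7774.458, 7845.902, 9483.578, 8983.555,
             9181.052, 9458.696, 8138.616, 7981.994, 7556.491,
             7672.137, 6607.776, 8383.65, 7129.852, 8536.667,
             8287.8, 7924.47, 7928.474, 7363.157, 7952.593)
)

# Block index 1,1,1,1,1,2,2,...; integer division also copes with a
# final block that has fewer than 5 rows.
x$block <- (seq_len(nrow(x)) - 1) %/% 5 + 1

# Mean of `number` per block, carrying the other columns along as
# grouping variables, then dropping the helper column.
res <- aggregate(number ~ temp + line + cage + block, data = x, FUN = mean)
res$block <- NULL
```

One caveat: because temp, line and cage are used as grouping variables, a block of five rows that straddles a change in one of them would be split into two output rows instead of being averaged together.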
Re: [R] data manipulation: getting mean value every 5 rows
Have you considered "aggregate" [documented in help(aggregate) or "www.r-project.org" -> search -> "R site search" or Venables and Ripley, Modern Applied Statistics with S]?

hope this helps. spencer graves

Federico Calboli wrote:

Dear All, I would like to ask you how to accomplish a little tricky data manipulation. I have a large dataset, looking something like:

temp  line  cage  number
18    18    1     6678.63
18    18    1     7774.458
18    18    1     7845.902
18    18    1     9483.578
18    18    1     8983.555
18    18    1     9181.052
18    18    1     9458.696
18    18    1     8138.616
18    18    1     7981.994
18    18    1     7556.491
18    18    1     7672.137
18    18    1     6607.776
18    18    1     8383.65
18    18    1     7129.852
18    18    1     8536.667
18    18    2     8287.8
18    18    2     7924.47
18    18    2     7928.474
18    18    2     7363.157
18    18    2     7952.593
.

I would like to create a dataframe where I get the mean values, 5 rows at a time, of columns "number", while keeping the value in the other columns fixed to the values found in the first of the 5 rows (or whatever, it's the same for the 5 rows) so that the above would be "shrunk" to:

temp  line  cage  number
18    18    1     8153.2246
18    18    1     8463.3698
18    18    1     7666.0164
18    18    2     7891.2988

Any hints?

Regards, Federico Calboli

Federico C.F. Calboli
Department of Biology
University College London
Room 327 Darwin Building
Gower Street
London WC1E 6BT
Tel: (+44) 020 7679 4395
Fax (+44) 020 7679 7096
[EMAIL PROTECTED]
[R] data manipulation: getting mean value every 5 rows
Dear All, I would like to ask you how to accomplish a little tricky data manipulation. I have a large dataset, looking something like:

temp  line  cage  number
18    18    1     6678.63
18    18    1     7774.458
18    18    1     7845.902
18    18    1     9483.578
18    18    1     8983.555
18    18    1     9181.052
18    18    1     9458.696
18    18    1     8138.616
18    18    1     7981.994
18    18    1     7556.491
18    18    1     7672.137
18    18    1     6607.776
18    18    1     8383.65
18    18    1     7129.852
18    18    1     8536.667
18    18    2     8287.8
18    18    2     7924.47
18    18    2     7928.474
18    18    2     7363.157
18    18    2     7952.593
.

I would like to create a dataframe where I get the mean values, 5 rows at a time, of columns "number", while keeping the value in the other columns fixed to the values found in the first of the 5 rows (or whatever, it's the same for the 5 rows) so that the above would be "shrunk" to:

temp  line  cage  number
18    18    1     8153.2246
18    18    1     8463.3698
18    18    1     7666.0164
18    18    2     7891.2988

Any hints?

Regards, Federico Calboli

Federico C.F. Calboli
Department of Biology
University College London
Room 327 Darwin Building
Gower Street
London WC1E 6BT
Tel: (+44) 020 7679 4395
Fax (+44) 020 7679 7096
[EMAIL PROTECTED]
Re: [R] data manipulation function descriptions
On Fri, 14 Feb 2003 [EMAIL PROTECTED] wrote:

> On Thu, 13 Feb 2003, kjetil brinchmann halvorsen wrote:
>
> > On 13 Feb 2003 at 17:09, Jason Bond wrote:
> >
> > case switch
> > [R-core : switch should be better announced. It is for instance not
> > mentioned in "An introduction to R"]
>
> Well, that is an *introduction*, not a programmer's guide. You will find
> switch() is rarely used in R: it is a bit peculiar in its semantics, and
> something definitely not to be considered introductory.
>
> On the original question, I think it would be a mistake to translate what
> you know. R is a vector language, not a pairlist language, and I see
> quite a bit of evidence of convoluted solutions in its internals dating
> from when R was the second. Chapter 2 of Venables & Ripley (2002) (as in
> the R FAQ) is devoted to using S/R for data manipulation.

As someone reasonably familiar with both languages I have to disagree with several points here.

First and foremost, despite differences in surface syntax, as languages xlispstat and R are much more alike than they are different. xlispstat is much closer to R than S-plus because both xlispstat and R use lexical scope, a feature of R that is still not used as much as it could be. The main language differences are the limited form of lazy evaluation used in R, which you can usually ignore, and the fact that R does not provide mutable data structures, which is also rarely an issue. There are other differences, but these are the main ones that affect coding practices I think.

The basic xlispstat data handling functions mentioned in the original post are quite similar to corresponding basic functions in R. This is not by accident: the choice of functions included in xlispstat was heavily influenced by what was then called the "New S" language. As a result, if you want to create an R version of an xlispstat function you can often do far worse than start with a fairly direct transliteration.
In my view at least, good coding practices in xlispstat are good coding practices for any high level mostly functional language and carry over quite well to R.

I am sorry if the following seems a bit harsh, but I, and many others who have worked with lisp, find it extremely frustrating to read statements about lisp like the one above that suggest that lisp is a pairlist language only, especially when these statements come from people I thought knew better. Lisp dates back to the 1950's. The only other language of any consequence still in use from that era is FORTRAN. No one would now claim that a major flaw in FORTRAN is the lack of an if-then-else construct. That was true in the early days but has not been for several decades. But for some reason many people seem very happy to very authoritatively make statements about lisp that, if they were ever true at all, have not been so for a very long time indeed.

Pairlists are a very useful data structure for expressing many algorithms in a functional style. That is why they were one of the first data structures in Lisp, and that is why they are available in virtually all other high level functional languages (ML, Haskell, Miranda, Clean, ...).

Pairlists are NOT the only data structure in Lisp. For many years Lisp has also supported vectors and arrays, both generic and typed (and other data structures). Vectors and pairlists are collectively referred to as sequences, and, if I remember correctly, all the functions listed in the original post except mapcar are designed to work on all kinds of sequences (the sequence version of mapcar is map). Code written in xlispstat in terms of sequence functions can often be translated quite easily to R, and the resulting code will be quite consistent with good R coding practices.

R does not provide a pairlist data structure.
This creates a dilemma when translating some list-based xlispstat code, or, more importantly, when implementing an algorithm for which pairlists are the natural data structure to use. There are two choices: use a vector based algorithm that may be a bit less natural but fits better with the basic R data structures, or build your own pairlist abstraction for this particular problem and write the algorithm the more natural way. I have used both approaches on different occasions. I usually prefer to write an algorithm in the most natural way for the algorithm, since that usually maximizes the probability that my code is actually correct. If this approach requires some additional abstract data types, be they pairlists or anything else, then I develop and test them separately and write the main code in terms of these abstractions. Occasionally, but not all that often, this results in code th
Re: [R] data manipulation function descriptions
On Thu, 13 Feb 2003, kjetil brinchmann halvorsen wrote: > On 13 Feb 2003 at 17:09, Jason Bond wrote: > > case switch > [R-core : switch should be better >announced. It is for > instance not > mentioned in "An > introduction to R"] Well, that is an *introduction*, not a programmer's guide. You will find switch() is rarely used in R: it is a bit peculiar in its semantics, and something definitely not to be considered introductory. On the original question, I think it would be a mistake to translate what you know. R is a vector language, not a pairlist language, and I see quite a bit of evidence of convoluted solutions in its internals dating from when R was the second. Chapter 2 of Venables & Ripley (2002) (as in the R FAQ) is devoted to using S/R for data manipulation. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] data manipulation function descriptions
On 13 Feb 2003 at 17:09, Jason Bond wrote:

As a lisp-stat user, I tried to compile a short dictionary within your answer below:

> Hello,
>
> I'm a recovering xlispstat user, and am trying to become a good R
> user. I've looked around on the CRAN doc website and have found quite a
> few sets of documentation with various level of data manipulation function
> descriptions (of what I've seen, most relatively low levels), and many with
> examples of Rs use in statistical analyses. Although I don't expect to get
> my wish, ideally, it would be nice to have some sort of data manipulation
> function guide for programmers. I guess I'm somewhat of a different case,
> as I know which functions that I want to use...I just don't know their
> names...for example, all those great xlispstat functions like:
>
> remove-duplicates
      more or less unique()
> sort-data
      more or less sort()
> combine
      c()
> remove
      x <- c(1,2,3,5,7,9,12,15, 18, 22)
      x[-which(x==15)]
> reverse
      rev
> butlast
      n <- length(x)
      x[-n]
> first
      x[1] or for a list x[[1]]
> case
      switch
      [R-core : switch should be better announced. It is for instance not
      mentioned in "An introduction to R"]
> which
      which
> mapcar
      apply, lapply, sapply
> map-elements
      nothing better than the ones above
> all the string functions
      paste, strwidth, strwrap, substr, toString
> and many many more,\

Kjetil Halvorsen

> descriptions of a few of which are spread out in various documents. Part
> of my problem is clinging to that which I know. Anyway, any general advice
> would be greatly appreciated.
>
> Jason
>
> At 03:57 PM 2/13/03 -0600, you wrote:
>
> >?which
> >
> >On Thursday 13 February 2003 03:40 pm, Jason Bond wrote:
> > > Hello. Sorry for the elementary post. I've looked through the
> > > documentation, but can't seem to find a function which allows one to
> > > extract the position of an element within a list...for example the position
> > > of the element 4 in the vector c(1,2,4,3,6) is 3. Thanks much for any
> > > help.
> > >
> > > Jason
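The dictionary above can be tried directly at the prompt. The following sketch is an editorial addition that exercises each pairing on small vectors; the result names (no.dups, but.last and so on) are chosen here, and two entries use slightly different idioms from the ones in the message: head(x, -1) for butlast, and logical indexing for remove.

```r
x <- c(1, 2, 3, 5, 7, 9, 12, 15, 18, 22)

no.dups    <- unique(c(1, 2, 2, 3))         # remove-duplicates
sorted     <- sort(c(3, 1, 2))              # sort-data
combined   <- c(x, 30)                      # combine
removed    <- x[x != 15]                    # remove
reversed   <- rev(x)                        # reverse
but.last   <- head(x, -1)                   # butlast (same result as x[-length(x)])
first.elt  <- x[1]                          # first; use x[[1]] for a list
chosen     <- switch("b", a = "first", b = "second", "other")  # case
position   <- which(c(1, 2, 4, 3, 6) == 4)  # position of the element 4
squares    <- sapply(x, function(v) v^2)    # mapcar
joined     <- paste("data", "frame", sep = ".")  # one of the string functions
```

The logical form x[x != 15] is used for remove because x[-which(x == 15)] returns an empty vector when nothing matches: which() then yields integer(0), and negating it selects no elements at all. match(4, c(1, 2, 4, 3, 6)) is an alternative to which() when only the first position is wanted.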
[R] data manipulation function descriptions
Hello, I'm a recovering xlispstat user, and am trying to become a good R user. I've looked around on the CRAN doc website and have found quite a few sets of documentation with various levels of data manipulation function descriptions (of what I've seen, most relatively low levels), and many with examples of R's use in statistical analyses. Although I don't expect to get my wish, ideally, it would be nice to have some sort of data manipulation function guide for programmers. I guess I'm somewhat of a different case, as I know which functions I want to use...I just don't know their names...for example, all those great xlispstat functions like:

remove-duplicates
sort-data
combine
remove
reverse
butlast
first
case
which
mapcar
map-elements
all the string functions

and many many more, descriptions of a few of which are spread out in various documents. Part of my problem is clinging to that which I know. Anyway, any general advice would be greatly appreciated.

Jason

At 03:57 PM 2/13/03 -0600, you wrote:

?which

On Thursday 13 February 2003 03:40 pm, Jason Bond wrote:
> Hello. Sorry for the elementary post. I've looked through the
> documentation, but can't seem to find a function which allows one to
> extract the position of an element within a list...for example the position
> of the element 4 in the vector c(1,2,4,3,6) is 3. Thanks much for any
> help.
>
> Jason
Re: [R] Data manipulation
Dear Lew,

You could use the subset argument to lm:

  knap.fit1 <- lm(Kweed ~ TREAT, data=knap, subset=c(41:60,81:100,101:120,121:140))

(You could alternatively subscript both Kweed and TREAT, rather than just TREAT, but this is unnecessarily complicated; as well, you'd need to use c() within the subscript, as in Kweed[c(41:60,81:100,101:120,121:140)].)

John

At 03:36 PM 2/7/2003 -0700, Lew wrote:

I am interested in building a model with a subset of data from a column. The first 6 lines of my data look like this:

   QUAD YEAR SITE TREAT HERB TILL PLANT SEED Kweed
1    A4 2002    s     1    N    N     N    N 55.00
2   A10 2002    s     1    N    N     N    N 60.00
3    B2 2002    s     1    N    N     N    N 35.00
4    C2 2002    s     1    N    N     N    N 23.00
5    C9 2002    s     1    N    N     N    N 70.00
6    11 2002    m     1    N    N     N    N 22.00

I tried this command to get the subset I want:

> knap.fit1<-(lm(Kweed~TREAT[41:60,81:100,101:120,121:140], data=knap))

No luck. Can anyone tell me how to code for this subset.

-
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: [EMAIL PROTECTED]
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-
Re: [R] Data manipulation
You might want to try subsetting the data frame first, and then fit the model. Something like

  knap.sub <- knap[c(41:60,81:100,101:120,121:140), ]
  knap.fit1 <- lm(Kweed ~ TREAT, data = knap.sub)

might work for you.

-roger
___
UCLA Department of Statistics
[EMAIL PROTECTED]
http://www.stat.ucla.edu/~rpeng

On Fri, 7 Feb 2003, Lew wrote:

> I am interested in building a model with a subset of data from a column.
>
> The first 6 lines of my data look like this:
>
>    QUAD YEAR SITE TREAT HERB TILL PLANT SEED Kweed
> 1    A4 2002    s     1    N    N     N    N 55.00
> 2   A10 2002    s     1    N    N     N    N 60.00
> 3    B2 2002    s     1    N    N     N    N 35.00
> 4    C2 2002    s     1    N    N     N    N 23.00
> 5    C9 2002    s     1    N    N     N    N 70.00
> 6    11 2002    m     1    N    N     N    N 22.00
>
> I tried this command to get the subset I want:
>
> > knap.fit1<-(lm(Kweed~TREAT[41:60,81:100,101:120,121:140], data=knap))
>
> No luck.
>
> Can anyone tell me how to code for this subset.
>
> Thanks
>
> Lew Stringer
> M.S. Student- Land Rehabilitation
> Dept. of Land Resources and Environmental Sciences
> Montana State University
> 822 Leon Johnson Hall
> Bozeman, MT 59717
> Lab:(406)994-6811
> Fax:(406)994-3933
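The two suggestions in this thread (the subset argument to lm, and subsetting the data frame first) should give the same fit. The sketch below checks that on made-up data. The knap data frame here is a hypothetical stand-in with 140 rows and only the two columns the model uses, and TREAT is assumed to be a factor with 20 rows per level; neither detail comes from the original post.

```r
set.seed(1)

# Hypothetical stand-in for the poster's `knap` data frame.
knap <- data.frame(
  TREAT = factor(rep(1:7, each = 20)),
  Kweed = round(runif(140, 0, 100))
)

# Row indices from the thread; 81:100, 101:120, 121:140 is simply 81:140.
rows <- c(41:60, 81:140)

# Approach 1: let lm() do the subsetting.
fit.subset <- lm(Kweed ~ TREAT, data = knap, subset = rows)

# Approach 2: subset the data frame first, then fit.
knap.sub   <- knap[rows, ]
fit.presub <- lm(Kweed ~ TREAT, data = knap.sub)

# Both fits use the same 80 rows and should agree on the coefficients.
same.coef <- isTRUE(all.equal(coef(fit.subset), coef(fit.presub)))
```

The original attempt failed because TREAT[41:60,81:100,101:120,121:140] is read as a four-dimensional index into a one-dimensional vector; the ranges have to be combined with c() into a single index vector.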
[R] Data manipulation
I am interested in building a model with a subset of data from a column. The first 6 lines of my data look like this:

   QUAD YEAR SITE TREAT HERB TILL PLANT SEED Kweed
1    A4 2002    s     1    N    N     N    N 55.00
2   A10 2002    s     1    N    N     N    N 60.00
3    B2 2002    s     1    N    N     N    N 35.00
4    C2 2002    s     1    N    N     N    N 23.00
5    C9 2002    s     1    N    N     N    N 70.00
6    11 2002    m     1    N    N     N    N 22.00

I tried this command to get the subset I want:

> knap.fit1<-(lm(Kweed~TREAT[41:60,81:100,101:120,121:140], data=knap))

No luck. Can anyone tell me how to code for this subset.

Thanks

Lew Stringer
M.S. Student- Land Rehabilitation
Dept. of Land Resources and Environmental Sciences
Montana State University
822 Leon Johnson Hall
Bozeman, MT 59717
Lab:(406)994-6811
Fax:(406)994-3933