Re: [R] data$ID -> I always get a NULL
This is my result:

> class(data)
[1] "data.frame"
> str(data)
'data.frame': 2193 obs. of 83 variables:
 $ X.ID.   : Factor w/ 2193 levels "'18201'",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ X.kod.  : Factor w/ 20 levels "'01'","'02'",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ X.wiel. : int 7 7 7 7 7 7 7 8 8 8 ...
 $ X.piech.: num 1 99.9 4 0.5 4 2 99.9 2 2 99.9 ...
 $ X.rodz. : int NA 2 4 NA 4 2 2 3 2 NA ...

David Winsemius wrote:
> On Apr 19, 2009, at 6:45 PM, Grześ wrote:
>
>> I have a database saved as a .csv file.
>
> The external storage format is not likely to be relevant. What might be
> informative would be to produce the code that reads this file.
>
>> When I want to get something from my database I get NULL, but I know
>> that there is something there! For example:
>>
>>> data$ID
>> NULL
>>> data$kod
>> NULL
>>
>> but a command like the one below is always recognized by R:
>>> data[2,3]
>> [1] '082'
>
> Tell us what happens when you enter:
>
> str(data)
> class(data)
>
> Perhaps the third column is not named "ID" or "kod" or the object is
> not a data.frame, but is rather a matrix.
>
> --
> David Winsemius
>
>> In my opinion this problem is also connected with my attempt to create
>> a tree. I always get errors.
>>
>>> t.tree0=rpart(ID~.,t.train)
>> Error in eval(expr, envir, enclos) : object "ID" not found
>>
>>> t.tree0=rpart(kod~.,t.train)
>> Error in eval(expr, envir, enclos) : object "kod" not found
>>
>> What should I do to create my simple trees?
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT

--
View this message in context: http://www.nabble.com/data%24ID--%3E-I-always-get-a-NULL-tp23128214p23132506.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
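A possible explanation for the names in the str() output, offered as a sketch rather than a diagnosis: names such as X.ID. and levels such as "'18201'" are what one would expect if the .csv file wraps its fields in single quotes and was read with read.csv(), which by default treats only the double quote as a quote character. The code that read the file was never posted, so the small inline file below (csv_text) and its values are invented for illustration.

```r
# Hypothetical three-column file that uses single quotes around its fields.
csv_text <- "'ID','kod','wiel'\n'18201','01',7\n'18202','02',8\n"

# read.csv() only recognises the double quote by default, so the single
# quotes stay in the header and make.names() turns 'ID' into X.ID.
bad <- read.csv(text = csv_text)
names(bad)
bad$ID            # NULL, because no column name starts with "ID"

# Declaring the single quote as the quote character gives the expected names.
good <- read.csv(text = csv_text, quote = "'")
names(good)
good$ID
```

If this is indeed the cause, the columns can also be reached under the names R actually assigned, for example data$X.ID., and a formula would have to use those names as well. Note that with quote = "'" a field such as '01' is converted to a number unless colClasses says otherwise.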
Re: [R] data$ID -> I always get a NULL
On Apr 19, 2009, at 6:45 PM, Grześ wrote:

> I have a database saved as a .csv file.

The external storage format is not likely to be relevant. What might be informative would be to produce the code that reads this file.

> When I want to get something from my database I get NULL, but I know
> that there is something there! For example:
>
> > data$ID
> NULL
> > data$kod
> NULL
>
> but a command like the one below is always recognized by R:
> > data[2,3]
> [1] '082'

Tell us what happens when you enter:

str(data)
class(data)

Perhaps the third column is not named "ID" or "kod" or the object is not a data.frame, but is rather a matrix.

--
David Winsemius

> In my opinion this problem is also connected with my attempt to create
> a tree. I always get errors.
>
> > t.tree0=rpart(ID~.,t.train)
> Error in eval(expr, envir, enclos) : object "ID" not found
>
> > t.tree0=rpart(kod~.,t.train)
> Error in eval(expr, envir, enclos) : object "kod" not found
>
> What should I do to create my simple trees?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
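The checks suggested in this message can be tried on a small stand-in object. This is only a sketch: df and m are invented here, and the matrix behaviour shown is that of current R versions.

```r
# Stand-in data frame whose column names differ from what the user expects.
df <- data.frame(X.ID.  = c("'18201'", "'18202'"),
                 X.kod. = c("'01'", "'02'"))

class(df)
names(df)          # the names that `$` will actually match against
is.null(df$ID)     # TRUE: `$` returns NULL for a name that is not there

# Positional indexing works whatever the names are, as data[2,3] did.
df[2, 1]

# On a matrix, `$` is an error in current R rather than a silent NULL.
m <- as.matrix(df)
res <- tryCatch(m$ID, error = function(e) conditionMessage(e))
res
```

A NULL from data$ID therefore points towards a data frame (or list) whose names are not what was expected, which names(data) or str(data) will show directly.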
[R] data$ID -> I always get a NULL
I have a database saved as a .csv file.
When I want to get something from my database I get NULL, but I know that there is something there!
For example:

> data$ID
NULL
> data$kod
NULL

but a command like the one below is always recognized by R:

> data[2,3]
[1] '082'

In my opinion this problem is also connected with my attempt to create a tree. I always get errors.

> t.tree0=rpart(ID~.,t.train)
Error in eval(expr, envir, enclos) : object "ID" not found

> t.tree0=rpart(kod~.,t.train)
Error in eval(expr, envir, enclos) : object "kod" not found

What should I do to create my simple trees?

--
View this message in context: http://www.nabble.com/data%24ID--%3E-I-always-get-a-NULL-tp23128214p23128214.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] data frame display
Also consider the "View" function for looking at the data frame.

On Wed, Apr 15, 2009 at 10:13 AM, Vladan Arsenijevic wrote:
> Hi all,
>
> I am dealing with a big data frame. When printing something like
>
>> allData[[3]]
>
> 1 625.364  38.223 21.014 0.216 1.241411 V 1050o 58.38065 -0.06178768
> 2 383.709  55.811 21.435 0.296 1.241411 V 1050o 58.38308 -0.03328282
> 3 434.669  58.597 21.207 0.233 1.241411 V 1050o 58.38334 -0.03930350
> 4 687.306  69.418 20.873 0.171 1.241411 V 1050o 58.38425 -0.06914694
> 5 759.522 104.019 22.473 0.685 1.241411 V 1050o 58.38824 -0.07772423
>
> 1 58.43595 -0.04950218
> 2 58.43595 -0.04950218
> 3 58.43595 -0.04950218
> 4 58.43595 -0.04950218
> 5 58.43595 -0.04950218
>
> I get the following. Oddly, the output looks like a word wrap was performed
> and is the same whether I run R from emacs or terminal. Since I want to
> print the whole data frame, I need some tips to solve this format problem.
>
> Cheers!

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Re: [R] data frame display
Have a look at the width argument in ?options

HTH,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Vladan Arsenijevic
Sent: Wednesday 15 April 2009 16:13
To: r-help@r-project.org
Subject: [R] data frame display

Hi all,

I am dealing with a big data frame. When printing something like

> allData[[3]]

1 625.364  38.223 21.014 0.216 1.241411 V 1050o 58.38065 -0.06178768
2 383.709  55.811 21.435 0.296 1.241411 V 1050o 58.38308 -0.03328282
3 434.669  58.597 21.207 0.233 1.241411 V 1050o 58.38334 -0.03930350
4 687.306  69.418 20.873 0.171 1.241411 V 1050o 58.38425 -0.06914694
5 759.522 104.019 22.473 0.685 1.241411 V 1050o 58.38824 -0.07772423

1 58.43595 -0.04950218
2 58.43595 -0.04950218
3 58.43595 -0.04950218
4 58.43595 -0.04950218
5 58.43595 -0.04950218

I get the following. Oddly, the output looks like a word wrap was performed and is the same whether I run R from emacs or terminal. Since I want to print the whole data frame, I need some tips to solve this format problem.

Cheers!
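As a rough illustration of the width option mentioned in this reply: print() wraps a data frame into blocks of columns once a row no longer fits within getOption("width") characters. The data frame wide below is made up, and the widths 60 and 200 are arbitrary choices for the sketch.

```r
# Hypothetical wide data frame: 5 rows, 11 numeric columns.
wide <- data.frame(matrix(seq(100.123, by = 1.001, length.out = 55), nrow = 5))

old <- options(width = 60)        # narrow console: columns wrap into blocks
narrow_out <- capture.output(print(wide))

options(width = 200)              # wide console: each row fits on one line
wide_out <- capture.output(print(wide))

options(old)                      # restore the previous setting

length(narrow_out)                # header and rows repeated for every block
length(wide_out)                  # one header line plus five rows
cat(wide_out, sep = "\n")
```

Whether the wider output is readable still depends on the terminal or editor window being at least that wide.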
[R] data frame display
Hi all,

I am dealing with a big data frame. When printing something like

> allData[[3]]

1 625.364  38.223 21.014 0.216 1.241411 V 1050o 58.38065 -0.06178768
2 383.709  55.811 21.435 0.296 1.241411 V 1050o 58.38308 -0.03328282
3 434.669  58.597 21.207 0.233 1.241411 V 1050o 58.38334 -0.03930350
4 687.306  69.418 20.873 0.171 1.241411 V 1050o 58.38425 -0.06914694
5 759.522 104.019 22.473 0.685 1.241411 V 1050o 58.38824 -0.07772423

1 58.43595 -0.04950218
2 58.43595 -0.04950218
3 58.43595 -0.04950218
4 58.43595 -0.04950218
5 58.43595 -0.04950218

I get the following. Oddly, the output looks like a word wrap was performed and is the same whether I run R from emacs or terminal. Since I want to print the whole data frame, I need some tips to solve this format problem.

Cheers!
Re: [R] Data decomposition
It's not clear what it means for the sum of the weeks in a month to equal the month, since the weeks don't evenly divide a month, but if we apportion them pro rata then here is a possibility.

We create the input monthly series z and then produce a series of Date class days d covering the period. Merging the input and d leaves a series with the input values at the first of the month and NAs on each other day. We use ave to distribute that first value among all days of the month and then, defining weeks to end on Friday (or on the last day of the input even if it's not a Friday), we use the nextfri function given in the zoo vignette and aggregate over the days in each week.

library(zoo)

# input
z <- zoo(1:12, as.yearmon(2000) + 0:11/12)
tt <- time(z)

# calculate sequence of Dates
rng <- c(as.Date(tt[1]), as.Date(tt[length(z)], frac = 1))
d <- seq(rng[1], rng[2], by = "day")

# place input into 1st of each month with NAs for other days
m <- merge(aggregate(z, as.Date, force), zoo(, d))

# distribute 1st of month across entire month
m0 <- ave(m, as.yearmon(time(m)), FUN = function(x) x[1]/length(x))

# calculate the next fri for each day
# This one line function can be found in the zoo-quickref vignette.
nextfri <- function(x) 7 * ceiling(as.numeric(x - 5 + 4)/7) + as.Date(5 - 4)
w <- pmin(nextfri(time(m0)), rng[2])

# sum all days in each week
aggregate(m0, w, sum)

On Sun, Mar 29, 2009 at 9:28 AM, Pele wrote:
>
> Hi R users,
>
> I have a time series variable that is only available at a monthly level for
> 1 years that I need to decompose to a weekly time series level - can
> anyone recommend an R function that I can use to decompose this series?
>
> e.g. if month1 = 1200 I would like to decompose so that the sum of the
> weeks for month1 equals 1200, etc.
>
> Many thanks in advance for any help.
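For readers without zoo at hand, the same pro rata idea can be sketched in base R. This is not the code from the message but a swapped-in illustration of the steps it describes: the monthly totals are invented, and the names monthly, daily and week_end are hypothetical.

```r
# Made-up monthly totals, keyed by the first day of each month.
monthly <- c("2000-01-01" = 1200, "2000-02-01" = 900)
first <- as.Date(names(monthly))

# Every day from the first day of the first month to the last day of the last.
last_day <- seq(first[length(first)], by = "month", length.out = 2)[2] - 1
days <- seq(first[1], last_day, by = "day")

# Spread each monthly total evenly over the days of its month.
month_id <- format(days, "%Y-%m")
days_in_month <- ave(rep(1, length(days)), month_id, FUN = sum)
daily <- unname(monthly[paste0(month_id, "-01")]) / days_in_month

# Label each day with the Friday ending its week (wday: 0 = Sunday, 5 = Friday),
# capping the final label at the last day of the data.
wday <- as.POSIXlt(days)$wday
week_end <- pmin(days + (5 - wday) %% 7, last_day)

# Sum the daily values within each week.
weekly <- sapply(split(daily, week_end), sum)
weekly
sum(weekly)    # same total as sum(monthly)
```

A week that straddles two months simply receives daily shares from both, so the weekly values add up to the overall total even though no whole number of weeks fits a month.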
[R] Data decomposition
Hi R users,

I have a time series variable that is only available at a monthly level for 1 years that I need to decompose to a weekly time series level - can anyone recommend an R function that I can use to decompose this series?

e.g. if month1 = 1200 I would like to decompose so that the sum of the weeks for month1 equals 1200, etc.

Many thanks in advance for any help.

--
View this message in context: http://www.nabble.com/Data-decomposition-tp22767614p22767614.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Data manipulation - multiplicate cases
Here is a suggestion. Let your data frame be 'dat':

> dat
  X Y Z
123 3 1
234 3 1
345 4 2
456 3 2

Try this:

bigData <- data.frame(with(dat,
    rbind(cbind(X = rep(X, Z),     Y = rep(Y, Z),     Z = 1),
          cbind(X = rep(X, Y - Z), Y = rep(Y, Y - Z), Z = 0))))
bigData <- with(bigData, bigData[order(X, -Y), ])

Bill Venables.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of MarcioRibeiro [mes...@pop.com.br]
Sent: 27 March 2009 06:26
To: r-help@r-project.org
Subject: [R] Data manipulation - multiplicate cases

Hi listers,
I am trying to arrange my data and I didn't find any information on how to do it!
I have data with 3 variables: X Y Z
1-I would like to replicate the information of X according to the number I have for my Y variable...
2-Then I want to identify, with a dichotomous variable, the number of rows given by my variable Z for each X...
I can do the first part by...
z<-rep(x,y)
But I don't know how to set a dichotomous variable according to Z...
Example...
I have...
  X Y Z
123 3 1
234 3 1
345 4 2
456 3 2
I want to get...
  X Y Z
123 3 1
123 3 0
123 3 0
234 3 1
234 3 0
234 3 0
345 4 1
345 4 1
345 4 0
345 4 0
456 3 1
456 3 1
456 3 0

Thanks in advance...
Re: [R] Data manipulation - multiplicate cases
Is this what you are looking for:

> x
    X Y Z
1 123 3 1
2 234 3 1
3 345 4 2
4 456 3 2
> new.x <- x[rep(seq(nrow(x)), times=x$Y),]
> new.x
      X Y Z
1   123 3 1
1.1 123 3 1
1.2 123 3 1
2   234 3 1
2.1 234 3 1
2.2 234 3 1
3   345 4 2
3.1 345 4 2
3.2 345 4 2
3.3 345 4 2
4   456 3 2
4.1 456 3 2
4.2 456 3 2
> new.x$Z <- ave(new.x$Z, new.x$X, FUN=function(z) c(rep(1,z[1]), rep(0, length(z) - z[1])))
> new.x
      X Y Z
1   123 3 1
1.1 123 3 0
1.2 123 3 0
2   234 3 1
2.1 234 3 0
2.2 234 3 0
3   345 4 1
3.1 345 4 1
3.2 345 4 0
3.3 345 4 0
4   456 3 1
4.1 456 3 1
4.2 456 3 0
>

On Thu, Mar 26, 2009 at 4:26 PM, MarcioRibeiro wrote:
>
> Hi listers,
> I am trying to arrange my data and I didn't find any information on how to
> do it!
> I have data with 3 variables: X Y Z
> 1-I would like to replicate the information of X according to the number I
> have for my Y variable...
> 2-Then I want to identify, with a dichotomous variable, the number of rows
> given by my variable Z for each X...
> I can do the first part by...
> z<-rep(x,y)
> But I don't know how to set a dichotomous variable according to Z...
> Example...
> I have...
>   X Y Z
> 123 3 1
> 234 3 1
> 345 4 2
> 456 3 2
> I want to get...
>   X Y Z
> 123 3 1
> 123 3 0
> 123 3 0
> 234 3 1
> 234 3 0
> 234 3 0
> 345 4 1
> 345 4 1
> 345 4 0
> 345 4 0
> 456 3 1
> 456 3 1
> 456 3 0
>
> Thanks in advance...

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
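One more way to get the same result, shown only as a sketch: replicate the rows with rep() and then compare each row's position within its group, from sequence(), against that group's Z. The object name expanded is made up; the data are the four rows from the question.

```r
x <- data.frame(X = c(123, 234, 345, 456),
                Y = c(3, 3, 4, 3),
                Z = c(1, 1, 2, 2))

# Repeat row i of x exactly Y[i] times.
expanded <- x[rep(seq_len(nrow(x)), times = x$Y), ]

# sequence(x$Y) counts 1..Y within each group; the first Z rows get a 1.
expanded$Z <- as.integer(sequence(x$Y) <= rep(x$Z, times = x$Y))
rownames(expanded) <- NULL
expanded
```

This avoids the grouped call to ave() and relies only on the rows of each X being adjacent, which rep() guarantees here.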
[R] Data manipulation - multiplicate cases
Hi listers,
I am trying to arrange my data and I didn't find any information on how to do it!
I have data with 3 variables: X Y Z
1-I would like to replicate the information of X according to the number I have for my Y variable...
2-Then I want to identify, with a dichotomous variable, the number of rows given by my variable Z for each X...
I can do the first part by...
z<-rep(x,y)
But I don't know how to set a dichotomous variable according to Z...
Example...
I have...
  X Y Z
123 3 1
234 3 1
345 4 2
456 3 2
I want to get...
  X Y Z
123 3 1
123 3 0
123 3 0
234 3 1
234 3 0
234 3 0
345 4 1
345 4 1
345 4 0
345 4 0
456 3 1
456 3 1
456 3 0

Thanks in advance...

--
View this message in context: http://www.nabble.com/Data-manipulation---multiplicate-cases-tp22730453p22730453.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] data frame to array
Here is a possibility

Dat <- cbind(expand.grid(f1 = letters[1:5],
                         f2 = LETTERS[1:5],
                         f3 = as.character(1:5)),
             x = rnorm(125))

M <- with(Dat, {
    f23 <- f2:f3
    m <- matrix(0, length(levels(f1)), length(levels(f23)))
    i <- match(f1, levels(f1))
    j <- match(f23, levels(f23))
    m[cbind(i,j)] <- x
    dimnames(m) <- list(levels(f1), levels(f23))
    m
})

> M[1:5, 1:5]
          A:1        A:2        A:3        A:4        A:5
a  1.72686085 -2.0605242  1.0989119  0.8096139  1.0146972
b -0.34512446 -0.1709805  0.3401842  0.5815685 -1.4862872
c  1.14489491 -0.3959085  0.3222197 -1.1108793  0.3676764
d  0.02520386 -1.0018102 -0.7232067 -0.6142914  0.6694813
e -1.23366653  0.3826862 -0.6797035  0.6536055  0.8865669

This should work provided you have one entry per f1 x f2 x f3 cell. The rows of the data frame may be in arbitrary order.

Bill Venables
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Thomas S. Dye
Sent: Monday, 23 March 2009 7:36 AM
To: r-help@r-project.org
Subject: [R] data frame to array

Aloha all,

I have a data frame with 4 columns. The first three are factors (f1, f2, f3) and the fourth is numeric. I'd like to explore these data using median polish. To do that I plan to use medpolish() on the matrix[f1,f2xf3], then medpolish on the resulting matrix[f2,f3]. This approach is described by Cook on page 141 of Exploring Data Tables, Trends, and Shapes.

split() gets me close to where I want to be, but results in a list, rather than a matrix. How do I construct the matrix[f1,f2xf3] from my data frame?

Also, any pointers to existing code that performs multi-way median polish will be appreciated.

Sorry for the newbie-type query, but manipulating data prior to analysis is really hard for me in R.

All the best,
Tom

Thomas S. Dye, Ph.D.
T. S. Dye & Colleagues, Archaeologists, Inc.
Phone: (808) 529-0866 Fax: (808) 529-0884
http://www.tsdye.com
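An alternative construction, sketched here with tapply() and interaction() instead of the matrix indexing used in the message, followed by a first medpolish() call. The set.seed() value and the choice of sum as the cell summary are assumptions made for the sketch; with one entry per cell, sum just returns that entry.

```r
set.seed(1)   # only so that the sketch is repeatable
Dat <- cbind(expand.grid(f1 = letters[1:5],
                         f2 = LETTERS[1:5],
                         f3 = as.character(1:5)),
             x = rnorm(125))

# Combine f2 and f3 into one factor with levels A:1, A:2, ..., E:5.
f23 <- interaction(Dat$f2, Dat$f3, sep = ":", lex.order = TRUE)

# Rows are the levels of f1, columns the f2:f3 combinations.
M <- tapply(Dat$x, list(Dat$f1, f23), sum)
dim(M)

# First-stage median polish of the f1 by (f2 x f3) table.
fit <- medpolish(M, trace.iter = FALSE)
fit$overall
fit$row
```

The column effects in fit$col could then be reshaped into an f2 by f3 matrix for the second polish described in the question.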
[R] data frame to array
Aloha all,

I have a data frame with 4 columns. The first three are factors (f1, f2, f3) and the fourth is numeric. I'd like to explore these data using median polish. To do that I plan to use medpolish() on the matrix[f1,f2xf3], then medpolish on the resulting matrix[f2,f3]. This approach is described by Cook on page 141 of Exploring Data Tables, Trends, and Shapes.

split() gets me close to where I want to be, but results in a list, rather than a matrix. How do I construct the matrix[f1,f2xf3] from my data frame?

Also, any pointers to existing code that performs multi-way median polish will be appreciated.

Sorry for the newbie-type query, but manipulating data prior to analysis is really hard for me in R.

All the best,
Tom

Thomas S. Dye, Ph.D.
T. S. Dye & Colleagues, Archaeologists, Inc.
Phone: (808) 529-0866 Fax: (808) 529-0884
http://www.tsdye.com
Re: [R] data analysis. R
Thanks for your fast response. Sorry for asking something stupid, but I am a real beginner with R (I have been trying it out for less than 3 months, and I am taking my first course on it).

To tackle this question I was told to use the "nested design" method. Could you show me how you would attempt this problem?

(a) Determine if insulation in the house affects the average gas consumption.
(b) How much extra gas is used when there is no insulation? Provide an interval estimate as well as a point estimate.

I just got confused by the background information: "We are interested in looking at the effect of insulation on gas consumption. The average outside temperature (degrees Celsius) was also measured."

So what should my model look like? I don't even know what my explanatory and response variables should be.

Thanks in advance.

--
View this message in context: http://www.nabble.com/data-analysis.-R-tp22641912p22643290.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] data analysis. R
On Sat, Mar 21, 2009 at 5:13 PM, UBC wrote:
>
> so i am having this question
> what should i do if the given data file (.txt) has 4 columns, but different
> lengths?
> how can i read them in R?
> any idea for the following problem?
>
> Gas consumption (1000 cubic feet) was measured before and after insulation
> was put into a house. We are interested in looking at the effect of
> insulation on gas consumption. The average outside temperature (degrees
> Celsius) was also measured. The data are included in the file
> "insulation.txt".
>
> (a) Determine if insulation in the house affects the average gas
> consumption.
> (b) How much extra gas is used when there is no insulation? Provide an
> interval estimate as well as a point estimate.
>
> here's the content of "insulation.txt" (you can just copy and paste it into
> notepad so it can be read in R)
>
> Before insul After insul.
> temp gas temp gas
> -0.8 7.2 -0.7 4.8
> -0.7 6.9 0.8 4.6
> 0.4 6.4 1.0 4.7
> 2.5 6.0 1.4 4.0
> 2.9 5.8 1.5 4.2
> 3.2 5.8 1.6 4.2
> 3.6 5.6 2.3 4.1
> 3.9 4.7 2.5 4.0
> 4.2 5.8 2.5 3.5
> 4.3 5.2 3.1 3.2
> 5.4 4.9 3.9 3.9
> 6.0 4.9 4.0 3.5
> 6.0 4.3 4.0 3.7
> 6.0 4.4 4.2 3.5
> 6.2 4.5 4.3 3.5
> 6.3 4.6 4.6 3.7
> 6.9 3.7 4.7 3.5
> 7.0 3.9 4.9 3.4
> 7.4 4.2 4.9 3.7
> 7.5 4.0 4.9 4.0
> 7.5 3.9 5.0 3.6
> 7.6 3.5 5.3 3.7
> 8.0 4.0 6.2 2.8
> 8.5 3.6 7.1 3.0
> 9.1 3.1 7.2 2.8
> 10.2 2.6 7.5 2.6
> 8.0 2.7
> 8.7 2.8
> 8.8 1.3
> 9.7 1.5
>
> thx and any ideas would help.

Dude- really? This is just a funky-format version of the whiteside data found in the MASS package:

library(MASS)
whiteside

See the posting guide (http://www.r-project.org/posting-guide.html), especially the section on homework questions.
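For orientation only, here is one way the whiteside data could be examined once loaded. The additive model below is an assumption chosen for the sketch; it is not the "nested design" mentioned elsewhere in the thread and not necessarily what the assignment expects.

```r
library(MASS)   # recommended package shipped with standard R installations

str(whiteside)  # columns: Insul (Before/After), Temp, Gas

# Assumed model: gas use depends on outside temperature, with a constant
# shift once the house is insulated.
fit <- lm(Gas ~ Temp + Insul, data = whiteside)

coef(fit)["InsulAfter"]        # point estimate of the shift after insulation
confint(fit)["InsulAfter", ]   # 95% interval for that shift
```

Here Gas is the response and Temp and Insul are the explanatory variables; whether the temperature slope should also differ before and after insulation is a modelling question the sketch leaves open.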
Re: [R] data analysis. R
This works with the example. If the real data is different it may not work. To run the example below just copy and paste it into R. To run with the real data replace textConnection(Lines) with "insulation.txt" everywhere.

Lines <- "Before insul After insul.
temp gas temp gas
-0.8 7.2 -0.7 4.8
-0.7 6.9 0.8 4.6
0.4 6.4 1.0 4.7
2.5 6.0 1.4 4.0
2.9 5.8 1.5 4.2
3.2 5.8 1.6 4.2
3.6 5.6 2.3 4.1
3.9 4.7 2.5 4.0
4.2 5.8 2.5 3.5
4.3 5.2 3.1 3.2
5.4 4.9 3.9 3.9
6.0 4.9 4.0 3.5
6.0 4.3 4.0 3.7
6.0 4.4 4.2 3.5
6.2 4.5 4.3 3.5
6.3 4.6 4.6 3.7
6.9 3.7 4.7 3.5
7.0 3.9 4.9 3.4
7.4 4.2 4.9 3.7
7.5 4.0 4.9 4.0
7.5 3.9 5.0 3.6
7.6 3.5 5.3 3.7
8.0 4.0 6.2 2.8
8.5 3.6 7.1 3.0
9.1 3.1 7.2 2.8
10.2 2.6 7.5 2.6
8.0 2.7
8.7 2.8
8.8 1.3
9.7 1.5"

nfld <- count.fields(textConnection(Lines))
data.lines <- readLines(textConnection(Lines))
data.lines <- ifelse(nfld == 2, paste("NA NA", data.lines), data.lines)
my.data <- read.table(textConnection(data.lines), header = TRUE, skip = 1)

On Sat, Mar 21, 2009 at 8:13 PM, UBC wrote:
>
> so i am having this question
> what should i do if the given data file (.txt) has 4 columns, but different
> lengths?
> how can i read them in R?
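Once the file is read this way, the four columns come out named temp, gas, temp.1 and gas.1, because read.table() makes the duplicated header names unique. A possible next step, sketched on a three-row stand-in for my.data, is to stack the two halves into one long table with an insulation factor. The names long, Insul, Temp and Gas are made up for this illustration.

```r
# Three-row stand-in for my.data; the last row has no "before" measurement.
my.data <- data.frame(temp   = c(-0.8, -0.7, NA),
                      gas    = c(7.2, 6.9, NA),
                      temp.1 = c(-0.7, 0.8, 8.0),
                      gas.1  = c(4.8, 4.6, 2.7))

# Stack the before and after halves, then drop the padded NA rows.
long <- rbind(data.frame(Insul = "Before", Temp = my.data$temp,   Gas = my.data$gas),
              data.frame(Insul = "After",  Temp = my.data$temp.1, Gas = my.data$gas.1))
long <- long[complete.cases(long), ]
long$Insul <- factor(long$Insul, levels = c("Before", "After"))
long
```

In this shape each row is one observation, which is the layout that model-fitting functions such as lm() expect.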
Re: [R] data analysis. R
If the input file has a separator other than a space (e.g., tabs or commas) then you can read it is and the missing data will be NAs and you can decide how to handle it. If it does not have a separator, then maybe you can read it in with read.fwf. Otherwise when you read it in, you can tell the system to 'fill' the missing data, but you don't really know what columns that might be in. So you have some choices; you are able to read in data that may have different lengths in the columns, but if it is ill-structured, it may be difficult to determine how to handle the missing data. On Sat, Mar 21, 2009 at 8:13 PM, UBC wrote: > > so i am having this question > what should i do if the give data file (.txt) has 4 columns, but different > lengths? > how can i read them in R? > any idea for the following problem? > > > Gas consumption (1000 cubic feet) was measured before and after insulation > was put into > a house. We are interested in looking at the effect of insulation on gas > consumption. The > average outside temperature (degrees celcius) was also measured. The data > are included in > the file "insulation.txt". > > (a) Determine if insulation in the house effects the average gas > consumption. > (b) How much extra gas is used when there is no insulation? Provide an > interval estimate > as well as a point estimate. > > heres the content in "insulation.txt" (u can just copy and paste it to the > notepad so can be read in R) > > Before insul After insul. 
> temp gas temp gas > -0.8 7.2 -0.7 4.8 > -0.7 6.9 0.8 4.6 > 0.4 6.4 1.0 4.7 > 2.5 6.0 1.4 4.0 > 2.9 5.8 1.5 4.2 > 3.2 5.8 1.6 4.2 > 3.6 5.6 2.3 4.1 > 3.9 4.7 2.5 4.0 > 4.2 5.8 2.5 3.5 > 4.3 5.2 3.1 3.2 > 5.4 4.9 3.9 3.9 > 6.0 4.9 4.0 3.5 > 6.0 4.3 4.0 3.7 > 6.0 4.4 4.2 3.5 > 6.2 4.5 4.3 3.5 > 6.3 4.6 4.6 3.7 > 6.9 3.7 4.7 3.5 > 7.0 3.9 4.9 3.4 > 7.4 4.2 4.9 3.7 > 7.5 4.0 4.9 4.0 > 7.5 3.9 5.0 3.6 > 7.6 3.5 5.3 3.7 > 8.0 4.0 6.2 2.8 > 8.5 3.6 7.1 3.0 > 9.1 3.1 7.2 2.8 > 10.2 2.6 7.5 2.6 > 8.0 2.7 > 8.7 2.8 > 8.8 1.3 > 9.7 1.5 > > > > thx and any ideas would help. > -- > View this message in context: > http://www.nabble.com/data-analysis.-R-tp22641912p22641912.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
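As a rough sketch of the two options mentioned in the reply above (read.fwf versus 'fill'), the following assumes, hypothetically, that insulation.txt is laid out in fixed-width fields of five characters each and that the trailing short rows belong to the "after insulation" columns; neither assumption is confirmed by the original post. The small excerpt is written to a temporary file only to keep the example self-contained.

```r
# Hypothetical excerpt: the last rows of the data, laid out in four
# fixed-width fields of 5 characters each (an assumed layout).
rows <- c(
  sprintf("%5.1f%5.1f%5.1f%5.1f",  9.1, 3.1, 7.2, 2.8),
  sprintf("%5.1f%5.1f%5.1f%5.1f", 10.2, 2.6, 7.5, 2.6),
  sprintf("%10s%5.1f%5.1f", "", 8.0, 2.7),   # "before" fields are blank
  sprintf("%10s%5.1f%5.1f", "", 8.7, 2.8)
)
tmp <- tempfile(fileext = ".txt")
writeLines(rows, tmp)

# Option 1: fixed-width read; blank fields become NA in the correct columns.
fixed <- read.fwf(tmp, widths = c(5, 5, 5, 5),
                  col.names = c("temp_before", "gas_before",
                                "temp_after", "gas_after"))

# Option 2: whitespace-separated read with fill = TRUE; short rows are
# padded on the right, so their values end up in the first two columns.
filled <- read.table(tmp, fill = TRUE)

unlink(tmp)
print(fixed)
print(filled)
```

Comparing the two printed data frames shows why 'fill' alone can be misleading for this kind of file: the values are all read, but the short rows are not necessarily assigned to the columns they came from.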
[R] data analysis. R
so i am having this question: what should i do if the given data file (.txt) has 4 columns, but of different lengths? how can i read them in R? any idea for the following problem?

Gas consumption (1000 cubic feet) was measured before and after insulation was put into a house. We are interested in looking at the effect of insulation on gas consumption. The average outside temperature (degrees Celsius) was also measured. The data are included in the file "insulation.txt".

(a) Determine if insulation in the house affects the average gas consumption.
(b) How much extra gas is used when there is no insulation? Provide an interval estimate as well as a point estimate.

here's the content of "insulation.txt" (you can just copy and paste it into Notepad so it can be read in R)

Before insul    After insul.
temp   gas      temp   gas
-0.8   7.2      -0.7   4.8
-0.7   6.9       0.8   4.6
 0.4   6.4       1.0   4.7
 2.5   6.0       1.4   4.0
 2.9   5.8       1.5   4.2
 3.2   5.8       1.6   4.2
 3.6   5.6       2.3   4.1
 3.9   4.7       2.5   4.0
 4.2   5.8       2.5   3.5
 4.3   5.2       3.1   3.2
 5.4   4.9       3.9   3.9
 6.0   4.9       4.0   3.5
 6.0   4.3       4.0   3.7
 6.0   4.4       4.2   3.5
 6.2   4.5       4.3   3.5
 6.3   4.6       4.6   3.7
 6.9   3.7       4.7   3.5
 7.0   3.9       4.9   3.4
 7.4   4.2       4.9   3.7
 7.5   4.0       4.9   4.0
 7.5   3.9       5.0   3.6
 7.6   3.5       5.3   3.7
 8.0   4.0       6.2   2.8
 8.5   3.6       7.1   3.0
 9.1   3.1       7.2   2.8
10.2   2.6       7.5   2.6
                 8.0   2.7
                 8.7   2.8
                 8.8   1.3
                 9.7   1.5

thx and any ideas would help. -- View this message in context: http://www.nabble.com/data-analysis.-R-tp22641912p22641912.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Data Restructuring Question
Stephen - Thanks a bunch for the suggestion to use "reshape". That is exactly what I needed. Here is the test code that I put together so far. I can probably get rid of the "time" and "id" columns from the resulting reshaped_test_data data.frame, but this will allow me to move forward.

test_data1_df<-data.frame(Variables=c("Stall","Stall","Stall","Stall","Stall"),
                          Run.Age=c(10, 20, 30, 40, 50),
                          Run.1=c(1,2,3,4,5),
                          Run.2=c(10,20,30,40,50),
                          Run.3=c(11,21,31,41,51),
                          Location=c("HSV", "ATH","HSV", "ATH","FLO"))

test_data2_df<-data.frame(Variables=c("Stall","Stall","Stall","Stall","Stall",
                                      "Stall","Stall","Stall","Stall","Stall",
                                      "Stall","Stall","Stall","Stall","Stall"),
                          Run.Age=c(10, 20, 30, 40, 50, 15, 25, 35, 45, 55, 18, 28, 38, 48, 58),
                          Run.1=c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 11, 21, 31, 41, 51),
                          Location=c("HSV", "ATH", "HSV", "ATH", "FLO",
                                     "HSV", "ATH", "HSV", "ATH", "FLO",
                                     "HSV", "ATH", "HSV", "ATH", "FLO"))

test_data_df<-test_data1_df

# counts every column starting with "Run", including Run.Age
length_test_data<-length(grep('^Run',names(test_data_df)))

if(length_test_data==2) {
  reshaped_test_data<-reshape(test_data_df, varying=list(c('Run.1')),
                              idvar='Location',direction='long')
} else if (length_test_data==4){
  reshaped_test_data<-reshape(test_data_df,
                              varying=list(c('Run.1','Run.2','Run.3')),
                              #idvar=c('Location','Run.Age'), direction='long')
                              direction='long')
}

Thanks also to Phil Spector who also provided similar advice. --- On Mon, 3/9/09, stephen sefick wrote: From: stephen sefick Subject: Re: [R] Data Restructuring Question To: jasonkrup...@yahoo.com Cc: R-help@r-project.org Date: Monday, March 9, 2009, 9:00 PM look at package reshape there is a cool little function input that once you get the hang of is handy. 
On Mon, Mar 9, 2009 at 5:50 PM, Jason Rupert wrote: > I think I am overlooking a call or concept in R to help me easily and quickly restructure my data.frame: > > Sometimes the data I receive looks like: > VariableName, Run1, Run2, Run3, Location > temp, 15.0, 16.0, 17.0, There > > And other times it looks like: > VariableName, Run, Location > temp, 17.0, There > > I would like to use the header information in order to be able to restructure the first data set to have a similar look as the second, i.e. > > VariableName, Run, Location, > temp, 15.0, There # Really Run1 > temp, 16.0, There # Really Run2 > temp, 17.0, There # Really Run3 > > Right now I am manually recombining: > tmp_1<-data.frame(data$VariableName, data$ Run1, data$Location) > tmp_2<-data.frame(data$VariableName, data$ Run2, data$Location) > tmp_3<-data.frame(data$VariableName, data$ Run3, data$Location) > > combine_1<-rbind(tmp_1, tmp_2) > combine_1<-rbind(combine_1, tmp_3) > > Is there an easier way that is more flexible? I would like to make it flexible enough to handle the case where I have two or four runs. > > Thank you for any feedback. > > > > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Stephen Sefick Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
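For the "two or four runs" case raised in the quoted question, one possible sketch with base R's reshape() is to pick up the run columns from the header instead of listing them by hand. The data frame, the Run1/Run2/Run3 naming pattern, and the new column name RunIndex below are illustrative assumptions based on the example in the question, not the poster's real data.

```r
# Hypothetical wide data in the shape described in the question.
wide <- data.frame(
  VariableName = c("temp", "press"),
  Run1 = c(15, 1.0),
  Run2 = c(16, 1.1),
  Run3 = c(17, 1.2),
  Location = c("There", "Here"),
  stringsAsFactors = FALSE
)

# Find the run columns from the header; the pattern requires digits after
# "Run" so that a column such as Run.Age would not be picked up.
run_cols <- grep("^Run[0-9]+$", names(wide), value = TRUE)

# Stack all run columns into a single "Run" column.
long <- reshape(wide,
                direction = "long",
                varying   = list(run_cols),
                v.names   = "Run",
                timevar   = "RunIndex",
                times     = seq_along(run_cols),
                idvar     = c("VariableName", "Location"))
rownames(long) <- NULL
print(long)
```

Because run_cols is computed from names(wide), the same call should not need editing when a file arrives with a different number of run columns, provided the naming pattern holds.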
Re: [R] Data Restructuring Question
look at package reshape there is a cool little function input that once you get the hang of is handy. On Mon, Mar 9, 2009 at 5:50 PM, Jason Rupert wrote: > I think I am overlooking a call or concept in R to help me easily and quickly > restructure my data.frame: > > Sometimes the data I receive looks like: > VariableName, Run1, Run2, Run3, Location > temp, 15.0, 16.0, 17.0, There > > And other times it looks like: > VariableName, Run, Location > temp, 17.0, There > > I would like to use the header information in order to be able to restructure > the first data set to have a similar look as the second, i.e. > > VariableName, Run, Location, > temp, 15.0, There # Really Run1 > temp, 16.0, There # Really Run2 > temp, 17.0, There # Really Run3 > > Right now I am manually recombining: > tmp_1<-data.frame(data$VariableName, data$ Run1, data$Location) > tmp_2<-data.frame(data$VariableName, data$ Run2, data$Location) > tmp_3<-data.frame(data$VariableName, data$ Run3, data$Location) > > combine_1<-rbind(tmp_1, tmp_2) > combine_1<-rbind(combine_1, tmp_3) > > Is there an easier way that is more flexible? I would like to make it > flexible enough to handle the case where I have two or four runs. > > Thank you for any feedback. > > > > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Stephen Sefick Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. 
Mullis
[R] Data Restructuring Question
I think I am overlooking a call or concept in R to help me easily and quickly restructure my data.frame: Sometimes the data I receive looks like: VariableName, Run1, Run2, Run3, Location temp, 15.0, 16.0, 17.0, There And other times it looks like: VariableName, Run, Location temp, 17.0, There I would like to use the header information in order to be able to restructure the first data set to have a similar look as the second, i.e. VariableName, Run, Location, temp, 15.0, There # Really Run1 temp, 16.0, There # Really Run2 temp, 17.0, There # Really Run3 Right now I am manually recombining: tmp_1<-data.frame(data$VariableName, data$ Run1, data$Location) tmp_2<-data.frame(data$VariableName, data$ Run2, data$Location) tmp_3<-data.frame(data$VariableName, data$ Run3, data$Location) combine_1<-rbind(tmp_1, tmp_2) combine_1<-rbind(combine_1, tmp_3) Is there an easier way that is more flexible? I would like to make it flexible enough to handle the case where I have two or four runs. Thank you for any feedback. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Data Envelopment Analysis in R
How do I do a data envelopment analysis in R? Please provide a step-by-step procedure. Thanks in advance. Arup -- View this message in context: http://www.nabble.com/Data-Envelopment-Analysis-in-R-tp22199360p22199360.html Sent from the R help mailing list archive at Nabble.com.
[R] data file - function write.fwf - library gdata
Dear R-list members, I have a data file with thousands of lines (cases), where each line contains the values of several variables. I would like to separate these lines in small groups, with each group followed by a blank line, to ease the visual inspection of the data in some situations. I am writing the output files with function write.fwf in library gdata, for correct column alignment. Below is a small-scale example, which requires library gdata. The results are shown just below the code. In the example, output file 1 gives the correct result I am looking for. But the real production code should be, I think, something like the one that produces output file 2. But that code is not working properly. My questions are: in output file 2, why data that should belong to the same line were written out in different lines? Why wasn't each line written out a whole by the inner for command? What causes the splitting of each line? How to go around this problem? I am using R 2.8.1 running on Windows XP. 
###---
### small-scale example
file.1 <- 'test-1.txt'
file.2 <- 'test-2.txt'

### this is just to construct a small dataframe x
a <- c(1,2,3,4,5,6)
b <- c(111,222,333,444,555,666)
x <- data.frame(a,b)
names(x) <- c('aaa','bbb')
space <- data.frame(' ')

library(gdata)

### build output file 'test-1.txt'
write.fwf(x[0,],file=file.1) # just the header
write.fwf(space,file=file.1,colnames=FALSE,append=TRUE) # a blank line
write.fwf(x[1:3,],file=file.1,colnames=FALSE,append=TRUE) # three lines
write.fwf(space,file=file.1,colnames=FALSE,append=TRUE) # a blank line
write.fwf(x[4:6,],file=file.1,colnames=FALSE,append=TRUE) # three lines

### build output file 'test-2.txt'
write.fwf(x[0,],file=file.2) # just the header
write.fwf(space,file=file.2,colnames=FALSE,append=TRUE) # a blank line
for (k in 1:2) { # two groups
  for (j in 1:3) { # with three lines each
    write.fwf(x[3*(k-1)+j,],file=file.2,colnames=FALSE,append=TRUE)
  } # for j
  write.fwf(space,file=file.2,colnames=FALSE,append=TRUE) # a blank line
} # for k
###---

These are the results:

Output file test-1.txt (the correct results): aaa bbb 1 111 2 222 3 333 4 444 5 555 6 666
---
Output file test-2.txt (each line has been split in two lines): aaa bbb 1 111 2 222 3 333 4 444 5 555 6 666
---

Thank you very much.

Paulo Barata

Paulo Barata Fundacao Oswaldo Cruz - Oswaldo Cruz Foundation Rua Leopoldo Bulhoes 1480 - 8A 21041-210 Rio de Janeiro - RJ Brazil E-mail: pbar...@infolink.com.br Alternative e-mail: paulo.bar...@ensp.fiocruz.br
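This is not a diagnosis of the write.fwf behaviour asked about above, only a possible workaround sketch in base R: format the whole data frame once so that all rows share the same column widths, then insert the blank lines into the resulting character vector before writing it. The group size of 3 and the temporary output file are assumptions taken from the small-scale example.

```r
# Same small data frame as in the example above.
x <- data.frame(aaa = c(1, 2, 3, 4, 5, 6),
                bbb = c(111, 222, 333, 444, 555, 666))

# Format everything in one pass: first element is the header line,
# the remaining elements are the aligned data rows.
out    <- capture.output(print(x, row.names = FALSE))
header <- out[1]
body   <- out[-1]

# Split the data rows into groups of 3 and put a blank line before each group.
group_size <- 3
groups <- split(body, ceiling(seq_along(body) / group_size))
res <- c(header,
         unlist(lapply(groups, function(g) c("", g)), use.names = FALSE))

# Write the result; a temporary file stands in for 'test-2.txt'.
out_file <- tempfile(fileext = ".txt")
writeLines(res, out_file)
cat(res, sep = "\n")
```

Formatting once and splitting afterwards keeps the alignment identical across groups, which a row-by-row write cannot guarantee when values have different widths.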
[R] Data format question
Hello all,

I have a *.csv file that looks like this (actual file is orders of magnitude larger):

site    taxa  no.ind
meadow  LMA   2
meadow  LCY   1
meadow  MSA   2
forest  LMA   1
forest  LCY   1
forest  MSA   1
forest  MSX   1

I am interested in, but have failed to create, code that efficiently converts it to a site-by-taxa matrix or data frame that looks like this:

        LMA  LCY  MSA  MSX
Meadow  2    1    2    0
Forest  1    1    1    1

with no repeating taxa names and zeros where a taxon is not listed for a site. Any help would be greatly appreciated.

Regards, Drew Garey Aquatic Ecology Lab Manager Virginia Commonwealth University [[alternative HTML version deleted]]
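One way to get the site-by-taxa layout asked for above is a cross-tabulation with xtabs(), sketched here on the seven example rows typed in by hand (in practice the data frame would come from read.csv on the real file, whose name is not given in the post). Combinations that do not occur are filled with zero.

```r
# The example rows from the question, entered by hand.
d <- data.frame(
  site   = c("meadow", "meadow", "meadow",
             "forest", "forest", "forest", "forest"),
  taxa   = c("LMA", "LCY", "MSA", "LMA", "LCY", "MSA", "MSX"),
  no.ind = c(2, 1, 2, 1, 1, 1, 1)
)

# Sum no.ind for every site/taxa combination; absent pairs become 0.
tab <- xtabs(no.ind ~ site + taxa, data = d)

# Optional: convert the contingency table to an ordinary data frame
# with sites as row names and taxa as columns.
site_by_taxa <- as.data.frame.matrix(tab)
print(site_by_taxa)
```

Note that rows and columns come out in alphabetical order of the site and taxa names rather than in order of appearance in the file.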
Re: [R] data editor in R- could it be improved?
There are three different data editors, so you will need to start by telling us your OS (and the other details asked for in the posting guide). But this is really an R-devel question, and that is where offers to work on this (or to sponsor such work) should be posted. As the author of one version and of a major revision of another, I have no interest in touching the code again. On Wed, 4 Feb 2009, Simon Pickett wrote: Hi all, I've used R for basic programming and data management for a few years now. One of the things that I think could be improved is the data editor. Its a great feature and I use it alot by calling edit(data.frame); very useful to see if what you tried to do actually worked. However, one of the annoying things about it is that when you scroll down the window it doesnt show you all the data (for a large data frame), just subsets of it. It would also be quite useful if the width of the columns could be adjusted or didnt default to the size of the name of the column. (since the names might often be very big if the data frame was created using a function). Side-ways scrolling is quite "jerky" too. Just wondered if this was on anyone else's wish list? Simon. Dr. Simon Pickett Research Ecologist Land Use Department Terrestrial Unit British Trust for Ornithology The Nunnery Thetford Norfolk IP242PU 01842750050 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. 
Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] data editor in R- could it be improved?
Hi! I am using the "Tinn R" data editor. It is wonderful and also a lightweight one. Try it. I guess you will find what you are looking for. Regards, Suresh Simon Pickett-4 wrote: > > Hi all, > > I've used R for basic programming and data management for a few years now. > One of the things that I think could be improved is the data editor. > > Its a great feature and I use it alot by calling edit(data.frame); very > useful to see if what you tried to do actually worked. > > However, one of the annoying things about it is that when you scroll down > the window it doesnt show you all the data (for a large data frame), just > subsets of it. > > It would also be quite useful if the width of the columns could be > adjusted or didnt default to the size of the name of the column. (since > the names might often be very big if the data frame was created using a > function). Side-ways scrolling is quite "jerky" too. > > Just wondered if this was on anyone else's wish list? > > Simon. > > > Dr. Simon Pickett > Research Ecologist > Land Use Department > Terrestrial Unit > British Trust for Ornithology > The Nunnery > Thetford > Norfolk > IP242PU > 01842750050 > > [[alternative HTML version deleted]] > -- View this message in context: http://www.nabble.com/data-editor-in-R--could-it-be-improved--tp21831077p21834015.html Sent from the R help mailing list archive at Nabble.com.
[R] data editor in R- could it be improved?
Hi all, I've used R for basic programming and data management for a few years now. One of the things that I think could be improved is the data editor. Its a great feature and I use it alot by calling edit(data.frame); very useful to see if what you tried to do actually worked. However, one of the annoying things about it is that when you scroll down the window it doesnt show you all the data (for a large data frame), just subsets of it. It would also be quite useful if the width of the columns could be adjusted or didnt default to the size of the name of the column. (since the names might often be very big if the data frame was created using a function). Side-ways scrolling is quite "jerky" too. Just wondered if this was on anyone else's wish list? Simon. Dr. Simon Pickett Research Ecologist Land Use Department Terrestrial Unit British Trust for Ornithology The Nunnery Thetford Norfolk IP242PU 01842750050 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data Frame Manipulation: Time Series
Hello Jim: Yes, that's exactly what I needed! Thank you! Josip - Original Message - From: "jim holtman" To: "Josip Dasovic" Cc: r-help@r-project.org Sent: Tuesday, January 27, 2009 4:45:31 PM GMT -08:00 US/Canada Pacific Subject: Re: [R] Data Frame Manipulation: Time Series Is the what you are after: > df<-data.frame(cbind("country"=c(rep("Angola", 9), rep("Burundi", 7), + rep("Chad", 13)), "year"=c(1975:1983, 1989:1995, 1965:1977)), + "war"=c(rep(1,2), rep(0,5), rep(1,2), rep(1,2), rep(0,2), rep(1,3), + rep(1,4), rep(0,6), rep(1,3))) > x <- split(df, df$country) > do.call(rbind, lapply(x, function(.cty){ + # create where the war starts + .start <- diff(c(0, .cty$war)) + .cty[(.start == 1) & (.cty$war == 1),] + })) country year war Angola.1Angola 1975 1 Angola.8Angola 1982 1 Burundi.10 Burundi 1989 1 Burundi.14 Burundi 1993 1 Chad.17 Chad 1965 1 Chad.27 Chad 1975 1 On Tue, Jan 27, 2009 at 5:45 PM, Josip Dasovic wrote: > Dear R Helpers: > > I have a data set where the unit of observation is country-year. I would like > to generate a new data set based on some inclusionary (exclusionary) > criteria. Here is an example of the type of data that I have. 
> > df<-data.frame(cbind("country"=c(rep("Angola", 9), rep("Burundi", 7), > rep("Chad", 13)), "year"=c(1975:1983, 1989:1995, 1965:1977)), > "war"=c(rep(1,2), rep(0,5), rep(1,2), rep(1,2), rep(0,2), rep(1,3), rep(1,4), > rep(0,6), rep(1,3))) >> df > country year war > 1 Angola 1975 1 > 2 Angola 1976 1 > 3 Angola 1977 0 > 4 Angola 1978 0 > 5 Angola 1979 0 > 6 Angola 1980 0 > 7 Angola 1981 0 > 8 Angola 1982 1 > 9 Angola 1983 1 > 10 Burundi 1989 1 > 11 Burundi 1990 1 > 12 Burundi 1991 0 > 13 Burundi 1992 0 > 14 Burundi 1993 1 > 15 Burundi 1994 1 > 16 Burundi 1995 1 > 17Chad 1965 1 > 18Chad 1966 1 > 19Chad 1967 1 > 20Chad 1968 1 > 21Chad 1969 0 > 22Chad 1970 0 > 23Chad 1971 0 > 24Chad 1972 0 > 25Chad 1973 0 > 26Chad 1974 0 > 27Chad 1975 1 > 28Chad 1976 1 > 29Chad 1977 1 > > What I would like to do is to create a new data frame with only those > observations for which a) the "war" variable value is 1, (this ie easy > enough) and 2) it is the first (in time) instance of war for that country for > that war "episode" (each of the countries above has two war episodes). Thus, > the new data frame should look like this: > > country year war > 1 Angola 1975 1 > 8 Angola 1982 1 > 10 Burundi 1989 1 > 14 Burundi 1993 1 > 17Chad 1965 1 > 27Chad 1975 1 > > Any suggestions as to how this can be done? > > Thanks in advance, > Josip > > R version 2.7.2 Patched (2008-09-20 r47259) > Mac OSX 10.5.5 > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data Frame Manipulation: Time Series
Is this what you are after:

> df<-data.frame(cbind("country"=c(rep("Angola", 9), rep("Burundi", 7),
+ rep("Chad", 13)), "year"=c(1975:1983, 1989:1995, 1965:1977)),
+ "war"=c(rep(1,2), rep(0,5), rep(1,2), rep(1,2), rep(0,2), rep(1,3),
+ rep(1,4), rep(0,6), rep(1,3)))
> x <- split(df, df$country)
> do.call(rbind, lapply(x, function(.cty){
+ # flag where a war starts (a 0 -> 1 step, or war in the first year)
+ .start <- diff(c(0, .cty$war))
+ .cty[(.start == 1) & (.cty$war == 1),]
+ }))
           country year war
Angola.1    Angola 1975   1
Angola.8    Angola 1982   1
Burundi.10 Burundi 1989   1
Burundi.14 Burundi 1993   1
Chad.17       Chad 1965   1
Chad.27       Chad 1975   1

On Tue, Jan 27, 2009 at 5:45 PM, Josip Dasovic wrote: > Dear R Helpers: > > I have a data set where the unit of observation is country-year. I would like > to generate a new data set based on some inclusionary (exclusionary) > criteria. Here is an example of the type of data that I have. > > df<-data.frame(cbind("country"=c(rep("Angola", 9), rep("Burundi", 7), > rep("Chad", 13)), "year"=c(1975:1983, 1989:1995, 1965:1977)), > "war"=c(rep(1,2), rep(0,5), rep(1,2), rep(1,2), rep(0,2), rep(1,3), rep(1,4), > rep(0,6), rep(1,3))) >> df > country year war > 1 Angola 1975 1 > 2 Angola 1976 1 > 3 Angola 1977 0 > 4 Angola 1978 0 > 5 Angola 1979 0 > 6 Angola 1980 0 > 7 Angola 1981 0 > 8 Angola 1982 1 > 9 Angola 1983 1 > 10 Burundi 1989 1 > 11 Burundi 1990 1 > 12 Burundi 1991 0 > 13 Burundi 1992 0 > 14 Burundi 1993 1 > 15 Burundi 1994 1 > 16 Burundi 1995 1 > 17Chad 1965 1 > 18Chad 1966 1 > 19Chad 1967 1 > 20Chad 1968 1 > 21Chad 1969 0 > 22Chad 1970 0 > 23Chad 1971 0 > 24Chad 1972 0 > 25Chad 1973 0 > 26Chad 1974 0 > 27Chad 1975 1 > 28Chad 1976 1 > 29Chad 1977 1 > > What I would like to do is to create a new data frame with only those > observations for which a) the "war" variable value is 1, (this ie easy > enough) and 2) it is the first (in time) instance of war for that country for > that war "episode" (each of the countries above has two war episodes). 
Thus, > the new data frame should look like this: > > country year war > 1 Angola 1975 1 > 8 Angola 1982 1 > 10 Burundi 1989 1 > 14 Burundi 1993 1 > 17Chad 1965 1 > 27Chad 1975 1 > > Any suggestions as to how this can be done? > > Thanks in advance, > Josip > > R version 2.7.2 Patched (2008-09-20 r47259) > Mac OSX 10.5.5 > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
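As a variation on the split/lapply approach in the reply above (not a replacement for it), the same "first year of each war episode" selection can be sketched with ave(), which applies a function within each country and returns the result in the original row order. The data frame is rebuilt here directly rather than through cbind(), and the sketch assumes the rows are already sorted by year within each country.

```r
# Example data from the thread, built without cbind() so that
# year and war stay numeric.
df <- data.frame(
  country = rep(c("Angola", "Burundi", "Chad"), times = c(9, 7, 13)),
  year    = c(1975:1983, 1989:1995, 1965:1977),
  war     = c(rep(1, 2), rep(0, 5), rep(1, 2),
              rep(1, 2), rep(0, 2), rep(1, 3),
              rep(1, 4), rep(0, 6), rep(1, 3))
)

# Within each country, mark rows where war switches from 0 to 1.
# Prepending 0 makes a war in the first observed year count as an onset.
onset <- ave(df$war, df$country,
             FUN = function(w) as.numeric(diff(c(0, w)) == 1))

# Keep only the onset rows; original row numbers are preserved as row names.
first_years <- df[onset == 1, ]
print(first_years)
```

Keeping the original row numbers makes it easy to compare the result with the table requested in the question.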
[R] Data Frame Manipulation: Time Series
Dear R Helpers: I have a data set where the unit of observation is country-year. I would like to generate a new data set based on some inclusionary (exclusionary) criteria. Here is an example of the type of data that I have. df<-data.frame(cbind("country"=c(rep("Angola", 9), rep("Burundi", 7), rep("Chad", 13)), "year"=c(1975:1983, 1989:1995, 1965:1977)), "war"=c(rep(1,2), rep(0,5), rep(1,2), rep(1,2), rep(0,2), rep(1,3), rep(1,4), rep(0,6), rep(1,3))) > df country year war 1 Angola 1975 1 2 Angola 1976 1 3 Angola 1977 0 4 Angola 1978 0 5 Angola 1979 0 6 Angola 1980 0 7 Angola 1981 0 8 Angola 1982 1 9 Angola 1983 1 10 Burundi 1989 1 11 Burundi 1990 1 12 Burundi 1991 0 13 Burundi 1992 0 14 Burundi 1993 1 15 Burundi 1994 1 16 Burundi 1995 1 17Chad 1965 1 18Chad 1966 1 19Chad 1967 1 20Chad 1968 1 21Chad 1969 0 22Chad 1970 0 23Chad 1971 0 24Chad 1972 0 25Chad 1973 0 26Chad 1974 0 27Chad 1975 1 28Chad 1976 1 29Chad 1977 1 What I would like to do is to create a new data frame with only those observations for which a) the "war" variable value is 1, (this ie easy enough) and 2) it is the first (in time) instance of war for that country for that war "episode" (each of the countries above has two war episodes). Thus, the new data frame should look like this: country year war 1 Angola 1975 1 8 Angola 1982 1 10 Burundi 1989 1 14 Burundi 1993 1 17Chad 1965 1 27Chad 1975 1 Any suggestions as to how this can be done? Thanks in advance, Josip R version 2.7.2 Patched (2008-09-20 r47259) Mac OSX 10.5.5 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data management
Although the message is rather unreadable, I guess you want to look at ?reshape. Uwe Ligges oscar linares wrote: Dear Rxperts, I would like to convert the following: StudyStudy.NameParameterDestSrcFormValueMin MaxFSD 1NT_1-0BFK(03)03A0.128510.0E+001. 0.41670E-01 1NT_1-0BFL(00,03)0003D0.36577 1NT_1-0BFL(00,02)0002D0.9 1NT_1-0BFL(00,04)0004D0.9 1NT_1-0BFP(01)01A0.365770.0E+00100.00 0.36880E-01 1NT_1-0BFP(02)02A28.2690.0E+00100.00 0.58489E-01 1NT_1-0BFP(03)03A68.14410.0001000.0 0.27806E-01 1NT_1-0BFP(05)05D0.9 1NT_1-0BFP(31)31D26.316 1NT_1-0BFP(32)32D29.483 1NT_1-0BFP(22)22D7.7813 StudyStudy.NameParameterDestSrcFormValueMin MaxFSD 1NT_1-1BFK(03)03A0.128520.0E+001. 0.39727E-01 1NT_1-1BFL(00,03)0003D0.36577 1NT_1-1BFL(00,02)0002D0.9 1NT_1-1BFL(00,04)0004D0.9 1NT_1-1BFP(01)01A0.365770.0E+00100.00 0.35166E-01 1NT_1-1BFP(02)02A28.2800.0E+00100.00 0.55760E-01 1NT_1-1BFP(03)03A68.13410.0001000.0 0.26508E-01 1NT_1-1BFP(05)05D0.9 1NT_1-1BFP(22)22D7.7811 StudyStudy.NameParameterDestSrcFormValueMin MaxFSD 1NT_1-2BFK(03)03A0.128510.0E+001. 0.90167E-01 1NT_1-2BFL(00,03)0003D0.36575 1NT_1-2BFL(00,02)0002D0.9 1NT_1-2BFL(00,04)0004D0.9 1NT_1-2BFP(01)01A0.365750.0E+00100.00 0.79794E-01 1NT_1-2BFP(02)02A23.8900.0E+00100.00 0.13385 1NT_1-2BFP(03)03A76.29710.0001000.0 0.68931E-01 1NT_1-2BFP(05)05D0.9 1NT_1-2BFP(22)22D7.7815 To look like the following stata output | study studyn~e K3P1 P2 P3 P5P11 P23 P31 P32 P33 | |--| 1. | 1 NT_16 .125 .35 35.903 8.6815 .83195 58 .13793 26.316 4.7181 13.211 | 2. | 2 NT_1 .125 .35 23.173 9.4882 .75125 66.7 .11994 26.316 4.042711.32 | 3. | 3 NT_2 .125 .35 48.229 7.1296 .68354 66.7 .11994 26.316 4.9101 13.748 | 4. | 4 NT_3 .125 .35 8.0027 15.967 1.1438 80.1 26.316 .37137 1.0398 | 5. | 5 NT_4 .125 .35 24.468 4.4256 .65408 40.2 26.316 2.1901 6.1322 | |--| Any suggestions for doing this in R? Many thanks in advance for your help. 
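To make the ?reshape pointer above a little more concrete, here is a minimal long-to-wide sketch. The posted data are too garbled to reproduce reliably, so the data frame below is a hypothetical simplification: one row per study and parameter, with illustrative short parameter names (K3, P1, P2) and a handful of values that appear to match the posted listing.

```r
# Hypothetical long-format data: one row per (study, parameter).
long <- data.frame(
  Study.Name = rep(c("NT_1-0", "NT_1-1"), each = 3),
  Parameter  = rep(c("K3", "P1", "P2"), times = 2),
  Value      = c(0.12851, 0.36577, 28.269,
                 0.12852, 0.36577, 28.280),
  stringsAsFactors = FALSE
)

# One row per study, one column per parameter.
wide <- reshape(long,
                idvar     = "Study.Name",
                timevar   = "Parameter",
                v.names   = "Value",
                direction = "wide")

# reshape() names the new columns Value.K3, Value.P1, ...; drop the prefix.
names(wide) <- sub("^Value\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

Parameters that are missing for a given study would show up as NA in that study's row, which seems to correspond to the blank cells in the Stata-style output shown in the question.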
[R] data management
Dear Rxperts, I would like to convert the following: StudyStudy.NameParameterDestSrcFormValueMin MaxFSD 1NT_1-0BFK(03)03A0.128510.0E+001. 0.41670E-01 1NT_1-0BFL(00,03)0003D0.36577 1NT_1-0BFL(00,02)0002D0.9 1NT_1-0BFL(00,04)0004D0.9 1NT_1-0BFP(01)01A0.365770.0E+00100.00 0.36880E-01 1NT_1-0BFP(02)02A28.2690.0E+00100.00 0.58489E-01 1NT_1-0BFP(03)03A68.14410.0001000.0 0.27806E-01 1NT_1-0BFP(05)05D0.9 1NT_1-0BFP(31)31D26.316 1NT_1-0BFP(32)32D29.483 1NT_1-0BFP(22)22D7.7813 StudyStudy.NameParameterDestSrcFormValueMin MaxFSD 1NT_1-1BFK(03)03A0.128520.0E+001. 0.39727E-01 1NT_1-1BFL(00,03)0003D0.36577 1NT_1-1BFL(00,02)0002D0.9 1NT_1-1BFL(00,04)0004D0.9 1NT_1-1BFP(01)01A0.365770.0E+00100.00 0.35166E-01 1NT_1-1BFP(02)02A28.2800.0E+00100.00 0.55760E-01 1NT_1-1BFP(03)03A68.13410.0001000.0 0.26508E-01 1NT_1-1BFP(05)05D0.9 1NT_1-1BFP(22)22D7.7811 StudyStudy.NameParameterDestSrcFormValueMin MaxFSD 1NT_1-2BFK(03)03A0.128510.0E+001. 0.90167E-01 1NT_1-2BFL(00,03)0003D0.36575 1NT_1-2BFL(00,02)0002D0.9 1NT_1-2BFL(00,04)0004D0.9 1NT_1-2BFP(01)01A0.365750.0E+00100.00 0.79794E-01 1NT_1-2BFP(02)02A23.8900.0E+00100.00 0.13385 1NT_1-2BFP(03)03A76.29710.0001000.0 0.68931E-01 1NT_1-2BFP(05)05D0.9 1NT_1-2BFP(22)22D7.7815 To look like the following stata output | study studyn~e K3P1 P2 P3 P5P11 P23 P31 P32 P33 | |--| 1. | 1 NT_16 .125 .35 35.903 8.6815 .83195 58 .13793 26.316 4.7181 13.211 | 2. | 2 NT_1 .125 .35 23.173 9.4882 .75125 66.7 .11994 26.316 4.042711.32 | 3. | 3 NT_2 .125 .35 48.229 7.1296 .68354 66.7 .11994 26.316 4.9101 13.748 | 4. | 4 NT_3 .125 .35 8.0027 15.967 1.1438 80.1 26.316 .37137 1.0398 | 5. | 5 NT_4 .125 .35 24.468 4.4256 .65408 40.2 26.316 2.1901 6.1322 | |--| Any suggestions for doing this in R? Many thanks in advance for your help. -- Oscar Oscar A. 
Linares Molecular Medicine Unit Bolles Harbor Monroe, Michigan [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data management
?gsub

> gsub("\\(|\\)", "", var)

You can then read.table on a textConnection.

> read.table(textConnection(gsub("\\(|\\)", "", var) ))
  V1 V2
1 p1 10
2 p1  3
3 p1  4
4 p2 20
5 p2 30
6 p2 40
7 p3  4
8 p3  1
9 p1  2

On Jan 18, 2009, at 12:13 PM, oscar linares wrote: Dear Rxperts, I have a varaibles data file that looks like this p(1) 10 p(1) 3 p(1) 4 p(2) 20 p(2) 30 p(2) 40 p(3) 4 p(3) 1 p(1) 2 I cannot process these data with R because it does not like the parentheses. How can I get these to look like: p1 10 p1 3 p1 4 p2 20 p2 30 p2 40 p3 4 p3 1 p3 2 The data is in a tab delimited text file and I want to get it into a data.frame(). Many thanks in advance. OAL p1 p1 -- Oscar Oscar A. Linares Molecular Medicine Unit Bolles Harbor Monroe, Michigan [[alternative HTML version deleted]]
Re: [R] data management
Does this give you what you want:

> x <- read.table(textConnection("p(1) 10
+ p(1) 3
+ p(1) 4
+ p(2) 20
+ p(2) 30
+ p(2) 40
+ p(3) 4
+ p(3) 1
+ p(1) 2"), as.is=TRUE)
> # remove parenthesis
> x$V1 <- gsub("[()]", "", x$V1)
> x
  V1 V2
1 p1 10
2 p1  3
3 p1  4
4 p2 20
5 p2 30
6 p2 40
7 p3  4
8 p3  1
9 p1  2

On Sun, Jan 18, 2009 at 12:13 PM, oscar linares wrote:
> Dear Rxperts,
>
> I have a variables data file that looks like this
>
> p(1) 10
> p(1) 3
> p(1) 4
> p(2) 20
> p(2) 30
> p(2) 40
> p(3) 4
> p(3) 1
> p(1) 2
>
> I cannot process these data with R because it does not like the
> parentheses. How can I get these to look like:
>
> p1 10
> p1 3
> p1 4
> p2 20
> p2 30
> p2 40
> p3 4
> p3 1
> p3 2
>
> The data is in a tab delimited text file and I want to get it into a
> data.frame().
>
> Many thanks in advance.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
[R] data management
Dear Rxperts,

I have a variables data file that looks like this

p(1) 10
p(1) 3
p(1) 4
p(2) 20
p(2) 30
p(2) 40
p(3) 4
p(3) 1
p(1) 2

I cannot process these data with R because it does not like the parentheses. How can I get these to look like:

p1 10
p1 3
p1 4
p2 20
p2 30
p2 40
p3 4
p3 1
p3 2

The data is in a tab delimited text file and I want to get it into a data.frame().

Many thanks in advance.

OAL

--
Oscar
Oscar A. Linares
Molecular Medicine Unit
Bolles Harbor
Monroe, Michigan
Re: [R] data frames with å, ä, and ö (=non-ASCII-characters) from windows to mac os x
Hi, On my system (see below), it works fine (inputing the code below at the R prompt). Make sure that the encoding of the input file is encoded UTF-8. Rgds, Ivan > sessionInfo() R version 2.8.1 Patched (2009-01-14 r47602) i386-apple-darwin9.6.0 locale: en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base > structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L,7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L,14L, 12L), .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G","H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"), class = "factor"), Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L,8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L,19L, 11L), .Label = c("Blekinge län", "Dalarnas län", "Gotlands län","Gävleborgs län","Hallands län", "Jämtlands län", "Jönköpings län","Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län","Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län","Västerbottens län", "Västernorrlands län", "Västmanlands län","Västra Götalands län", "Örebro län", "Östergötlands län"), class ="factor")), .Names = c("LANKOD","Län"), class = "data.frame", row.names = c("0", "1", "2", "3","4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15","16", "17", "18", "19", "20")) LANKOD Län 0 K Blekinge län 1 X Gävleborgs län 2 I Gotlands län 3 N Hallands län 4 ZJämtlands län 5 F Jönköpings län 6 H Kalmar län 7 W Dalarnas län 8 G Kronobergs län 9 BD Norrbottens län 10 T Örebro län 11 EÖstergötlands län 12 DSödermanlands län 13 C Uppsala län 14 SVärmlands län 15 ACVästerbottens län 16 Y Västernorrlands län 17 U Västmanlands län 18 AB Stockholms län 19 O Västra Götalands län 20 MSkåne län > Länkarta <- structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L,7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L,14L, 12L), .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G","H", "I", "K", "M", "N", "O", "S", 
"T", "U", "W", "X", "Y", "Z"), class = "factor"), Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L,8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L,19L, 11L), .Label = c("Blekinge län", "Dalarnas län", "Gotlands län","Gävleborgs län","Hallands län", "Jämtlands län", "Jönköpings län","Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län","Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län","Västerbottens län", "Västernorrlands län", "Västmanlands län","Västra Götalands län", "Örebro län", "Östergötlands län"), class ="factor")), .Names = c("LANKOD","Län"), class = "data.frame", row.names = c("0", "1", "2", "3","4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15","16", "17", "18", "19", "20")) > ls() [1] "Länkarta" > On 16 Jan 2009, at 14:13, Gustaf Rydevik wrote: Hi, I ran into this issue previously and managed to solve it, but I've forgotten how and am getting frustrated... I have a data frame (see below) with scandinavian characters in R (2.7.1) running on a Win Xp-computer. I save the data frame in an RData-file on a usb stick, and load() it in R (2.8.0) running on OS X 10.5. Now the name of the data frame and all factor labels with scandinavian characters are scrambled. How do I make R in OS X read my data frame? From what I've managed to find in the list archives and the FAQ I either 1) run Sys.setlocale("LC_ALL","en_US.UTF-8") ### Doesn't change anything or 2) run defaults write org.R-project.R force.LANG en_US.UTF-8 in the terminal, which doesn't help either. I must admit that I couldn't quite follow what documentation i found on locales, so I might have messed up somewhere along the line. Many thanks in advance for your help! 
Regards, Gustaf Länkarta <- structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L, 7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L, 14L, 12L), .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G", "H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z" ), class = "factor"), Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L, 8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L, 19L, 11L), .Label = c("Blekinge län", "Dalarnas län", "Gotlands län", "Gävleborgs län", "Hallands län", "Jämtlands län", "Jönköpings län", "Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län", "Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län", "Västerbottens län", "Västernorrlands län", "Västmanlands län", "Västra Götalands län", "Örebro län", "Östergötlands län"), class = "factor")), .Names = c("LANKOD", "Län"), class = "data.frame", row.names = c("0", "1", "2", "3", "4", "5", "6", "7",
[R] data frames with å, ä, and ö (=non-ASCII-characters) from windows to mac os x
Hi,

I ran into this issue previously and managed to solve it, but I've forgotten how and am getting frustrated...

I have a data frame (see below) with Scandinavian characters in R (2.7.1) running on a Windows XP computer. I save the data frame in an RData file on a USB stick, and load() it in R (2.8.0) running on OS X 10.5. Now the name of the data frame and all factor labels with Scandinavian characters are scrambled.

How do I make R in OS X read my data frame?

From what I've managed to find in the list archives and the FAQ, I should either

1) run
Sys.setlocale("LC_ALL","en_US.UTF-8") ### Doesn't change anything

or 2) run
defaults write org.R-project.R force.LANG en_US.UTF-8
in the terminal, which doesn't help either.

I must admit that I couldn't quite follow what documentation I found on locales, so I might have messed up somewhere along the line.

Many thanks in advance for your help!

Regards,
Gustaf

Länkarta <- structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L,
7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L,
14L, 12L), .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G",
"H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"
), class = "factor"), Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L,
8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L,
19L, 11L), .Label = c("Blekinge län", "Dalarnas län", "Gotlands län",
"Gävleborgs län", "Hallands län", "Jämtlands län", "Jönköpings län",
"Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län",
"Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län",
"Västerbottens län", "Västernorrlands län", "Västmanlands län",
"Västra Götalands län", "Örebro län", "Östergötlands län"), class = "factor")),
.Names = c("LANKOD", "Län"), class = "data.frame", row.names = c("0", "1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
"15", "16", "17", "18", "19", "20"))

--
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik
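A possible workaround, sketched here under the assumption (not confirmed in the thread) that the strings were saved on Windows in a Latin-1/CP1252 locale and are being shown on the Mac as if they were UTF-8: convert the column names and factor labels with iconv(). The helper name `to_utf8` and the tiny stand-in data frame are made up for illustration.

```r
# Hypothetical helper: re-encode column names and factor/character columns
to_utf8 <- function(df, from = "latin1") {
  names(df) <- iconv(names(df), from = from, to = "UTF-8")
  for (j in seq_along(df)) {
    if (is.factor(df[[j]])) {
      levels(df[[j]]) <- iconv(levels(df[[j]]), from = from, to = "UTF-8")
    } else if (is.character(df[[j]])) {
      df[[j]] <- iconv(df[[j]], from = from, to = "UTF-8")
    }
  }
  df
}

# Stand-in for the loaded object: "Skåne län" stored as raw Latin-1 bytes
latin1_label <- rawToChar(as.raw(c(0x53, 0x6b, 0xe5, 0x6e, 0x65, 0x20,
                                   0x6c, 0xe4, 0x6e)))
df <- data.frame(LANKOD = "M",
                 Lan = factor(latin1_label, levels = latin1_label))

fixed <- to_utf8(df)
levels(fixed$Lan)
```

If the file came from a machine using a different code page, the `from` argument would need to be changed accordingly (for example "CP1252").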
Re: [R] data format issue
Use read.zoo and aggregate.zoo from zoo and months, hours and as.chron from chron. Note that we must read in col 1 as character to ensure leading zeros don't get dropped. There are two mph columns and it is assumed you want both:

Lines <- "LST inch mph Deg DegF DegF % volts Deg mph w/m2
0506010000 0.00 13.6 218.1 36.8 -999 65.1 -999 -999 18.2 0.2
0506010005 0.00 12.9 214.3 36.8 -999 65.5 -999 -999 16.9 0.2
0506010010 0.00 14.4 215.7 36.9 -999 65.4 -999 -999 20.4 0.2
0506010015 0.00 13.8 215.8 36.8 -999 65.7 -999 -999 19.7 0.3
0506010020 0.00 11.9 213.4 36.8 -999 65.6 -999 -999 14.6 0.2
0506010025 0.00 12.7 212.4 36.8 -999 65.4 -999 -999 16.9 0.2
0506010030 0.00 14.1 215.8 36.8 -999 65.9 -999 -999 19.1 0.2
0506010035 0.00 14.8 217.2 36.7 -999 66.2 -999 -999 20.4 0.2
0506010040 0.00 16.2 222.0 36.8 -999 66.6 -999 -999 20.2 0.2
0506010045 0.00 13.6 219.5 36.7 -999 66.6 -999 -999 18.4 0.2
0506010050 0.00 14.8 217.6 36.7 -999 66.2 -999 -999 20.0 0.2
0506010055 0.00 13.1 214.8 36.7 -999 65.9 -999 -999 20.2 0.2
0506010100 0.00 12.2 214.3 36.7 -999 65.2 -999 -999 15.6 0.2
0506010105 0.00 14.2 207.8 36.7 -999 65.0 -999 -999 19.9 0.2
0506010110 0.00 15.4 207.0 36.7 -999 64.4 -999 -999 20.2 0.2
0506010115 0.00 17.2 205.9 36.7 -999 64.5 -999 -999 22.1 0.2
0506010120 0.00 16.8 208.9 36.8 -999 65.0 -999 -999 21.9 0.2
0506010125 0.00 18.4 214.0 36.9 -999 65.1 -999 -999 26.4 0.2
0506010130 0.00 17.3 214.7 37.0 -999 65.5 -999 -999 24.0 0.2
0506010135 0.00 18.4 214.3 37.1 -999 65.2 -999 -999 24.9 0.2
0506010140 0.00 19.6 216.6 37.3 -999 65.3 -999 -999 26.7 0.2
0506010145 0.00 19.7 220.5 37.5 -999 65.1 -999 -999 27.5 0.2
0506010150 0.00 19.6 215.5 37.6 -999 64.6 -999 -999 26.4 0.2
0506010155 0.00 21.8 220.1 37.8 -999 64.1 -999 -999 31.2 0.2
0506010200 0.00 23.4 222.9 37.9 -999 63.8 -999 -999 31.8 0.2
0506010205 0.00 24.0 221.7 37.9 -999 63.7 -999 -999 30.3 0.2
0506010210 0.00 24.2 223.4 38.0 -999 63.5 -999 -999 28.2 0.2
0506010215 0.00 23.8 224.9 38.0 -999 63.4 -999 -999 30.3 0.2
0506010220 0.00 23.9 225.1 38.1 -999 63.5 -999 -999 29.5 0.2
0506010225 0.00 23.9 227.4 38.1 -999 63.5 -999 -999 30.3 0.2
0506010230 0.00 23.9 226.0 38.0 -999 63.6 -999 -999 27.5 0.2
0506010235 0.00 21.5 221.4 38.0 -999 63.7 -999 -999 28.4 0.2
0506010240 0.00 22.3 222.6 37.9 -999 63.8 -999 -999 27.9 0.2
0506010245 0.00 21.5 223.9 37.9 -999 64.0 -999 -999 28.4 0.2
0506010250 0.00 22.2 226.7 37.8 -999 64.2 -999 -999 27.7 0.2
0506010255 0.00 21.9 223.5 37.8 -999 64.3 -999 -999 26.9 0.2
0506010300 0.00 22.0 223.2 37.7 -999 64.3 -999 -999 28.0 0.2"

library(zoo)
library(chron)

z <- read.zoo(textConnection(Lines), header = TRUE, na.strings = -999,
   format = "%y%m%d%H%M", FUN = as.chron,
   colClasses = c("character", rep("numeric", 10)))

mph <- z[months(time(z)) %in% c("Jun", "Jul", "Aug"), grep("mph", colnames(z))]
aggregate(mph, hours, mean)

On Sat, Dec 20, 2008 at 9:09 PM, Sherri Heck wrote:
> Dear all-
>
> I have a dataset (see a sample below - but the whole dataset is June 2005 -
> June 2008). The "LST" format is "YYMMDDHHmm" and I would like to get the
> hourly average of the "mph" for the summer months (spanning all years). I
> have been trying to use "aggregate" but am not having much success at all!
> any thoughts would be greatly appreciated.
>
> thanks-
>
> sherri
Re: [R] data format issue
Does this do it for you:

> # quick and dirty -- remove the 'mm' from the data and then aggregate
> x$hours <- (x$LST %/% 100) * 100
> aggregate(x$mph, list(x$hours), mean)
    Group.1        x
1 506010000 13.82500
2 506010100 17.55000
3 506010200 23.04167
4 506010300 22.00000

You can also 'filter' out the months for only the summer

On Sat, Dec 20, 2008 at 9:09 PM, Sherri Heck wrote:
> Dear all-
>
> I have a dataset (see a sample below - but the whole dataset is June 2005 -
> June 2008). The "LST" format is "YYMMDDHHmm" and I would like to get the
> hourly average of the "mph" for the summer months (spanning all years). I
> have been trying to use "aggregate" but am not having much success at all!
> any thoughts would be greatly appreciated.
>
> thanks-
>
> sherri

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
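Following up on the remark about filtering for the summer months, here is a base-R sketch of one way it might be done. It assumes, as in the reply above, a data frame `x` with a numeric `LST` column in YYMMDDHHmm form and an `mph` column. The helper columns `month` and `hour` are introduced here, and the December row is made up so that the filter has something to drop.

```r
x <- data.frame(
  LST = c(506010000, 506010005, 506010100, 512010000),
  mph = c(13.6, 12.9, 12.2, 5.0)   # last value is invented for illustration
)

# Pull the month (MM) and hour (HH) out of YYMMDDHHmm with integer arithmetic
x$month <- (x$LST %/% 1e6) %% 100
x$hour  <- (x$LST %/% 100) %% 100

# Keep June, July and August, then average mph by hour of day
summer <- x[x$month %in% 6:8, ]
hourly <- aggregate(mph ~ hour, data = summer, FUN = mean)
print(hourly)
```

Unlike the quick version above, which produces one average per date and hour, grouping on `hour` alone pools the same hour of day across all summer days and years.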
[R] data format issue
Dear all-

I have a dataset (see a sample below - but the whole dataset is June 2005 - June 2008). The "LST" format is "YYMMDDHHmm" and I would like to get the hourly average of the "mph" for the summer months (spanning all years). I have been trying to use "aggregate" but am not having much success at all! Any thoughts would be greatly appreciated.

thanks-

sherri

LST inch mph Deg DegF DegF % volts Deg mph w/m2
0506010000 0.00 13.6 218.1 36.8 -999 65.1 -999 -999 18.2 0.2
0506010005 0.00 12.9 214.3 36.8 -999 65.5 -999 -999 16.9 0.2
0506010010 0.00 14.4 215.7 36.9 -999 65.4 -999 -999 20.4 0.2
0506010015 0.00 13.8 215.8 36.8 -999 65.7 -999 -999 19.7 0.3
0506010020 0.00 11.9 213.4 36.8 -999 65.6 -999 -999 14.6 0.2
0506010025 0.00 12.7 212.4 36.8 -999 65.4 -999 -999 16.9 0.2
0506010030 0.00 14.1 215.8 36.8 -999 65.9 -999 -999 19.1 0.2
0506010035 0.00 14.8 217.2 36.7 -999 66.2 -999 -999 20.4 0.2
0506010040 0.00 16.2 222.0 36.8 -999 66.6 -999 -999 20.2 0.2
0506010045 0.00 13.6 219.5 36.7 -999 66.6 -999 -999 18.4 0.2
0506010050 0.00 14.8 217.6 36.7 -999 66.2 -999 -999 20.0 0.2
0506010055 0.00 13.1 214.8 36.7 -999 65.9 -999 -999 20.2 0.2
0506010100 0.00 12.2 214.3 36.7 -999 65.2 -999 -999 15.6 0.2
0506010105 0.00 14.2 207.8 36.7 -999 65.0 -999 -999 19.9 0.2
0506010110 0.00 15.4 207.0 36.7 -999 64.4 -999 -999 20.2 0.2
0506010115 0.00 17.2 205.9 36.7 -999 64.5 -999 -999 22.1 0.2
0506010120 0.00 16.8 208.9 36.8 -999 65.0 -999 -999 21.9 0.2
0506010125 0.00 18.4 214.0 36.9 -999 65.1 -999 -999 26.4 0.2
0506010130 0.00 17.3 214.7 37.0 -999 65.5 -999 -999 24.0 0.2
0506010135 0.00 18.4 214.3 37.1 -999 65.2 -999 -999 24.9 0.2
0506010140 0.00 19.6 216.6 37.3 -999 65.3 -999 -999 26.7 0.2
0506010145 0.00 19.7 220.5 37.5 -999 65.1 -999 -999 27.5 0.2
0506010150 0.00 19.6 215.5 37.6 -999 64.6 -999 -999 26.4 0.2
0506010155 0.00 21.8 220.1 37.8 -999 64.1 -999 -999 31.2 0.2
0506010200 0.00 23.4 222.9 37.9 -999 63.8 -999 -999 31.8 0.2
0506010205 0.00 24.0 221.7 37.9 -999 63.7 -999 -999 30.3 0.2
0506010210 0.00 24.2 223.4 38.0 -999 63.5 -999 -999 28.2 0.2
0506010215 0.00 23.8 224.9 38.0 -999 63.4 -999 -999 30.3 0.2
0506010220 0.00 23.9 225.1 38.1 -999 63.5 -999 -999 29.5 0.2
0506010225 0.00 23.9 227.4 38.1 -999 63.5 -999 -999 30.3 0.2
0506010230 0.00 23.9 226.0 38.0 -999 63.6 -999 -999 27.5 0.2
0506010235 0.00 21.5 221.4 38.0 -999 63.7 -999 -999 28.4 0.2
0506010240 0.00 22.3 222.6 37.9 -999 63.8 -999 -999 27.9 0.2
0506010245 0.00 21.5 223.9 37.9 -999 64.0 -999 -999 28.4 0.2
0506010250 0.00 22.2 226.7 37.8 -999 64.2 -999 -999 27.7 0.2
0506010255 0.00 21.9 223.5 37.8 -999 64.3 -999 -999 26.9 0.2
0506010300 0.00 22.0 223.2 37.7 -999 64.3 -999 -999 28.0 0.2
Re: [R] Data Analysis Functions in R
On Mon, Dec 08, 2008 at 09:34:35PM -0800, Feanor22 wrote:
>
> Hi experts of R,
>
> Are there any functions in R to test a univariate series for long memory
> effects, structural breaks and time reversability?
> I've found for ARCH effects(ArchTest), for normal (Shapiro.test,
> KS.test(comparing with randn) and lillie.test) but not for the above
> mentioned.
> Where can I find a comprehensive list of functions available by type?

Please try the CRAN Task views for EmpiricalFinance, Econometrics and TimeSeries.

Dirk

--
Three out of two people have difficulties with fractions.
[R] Data Analysis Functions in R
Hi experts of R,

Are there any functions in R to test a univariate series for long memory effects, structural breaks and time reversibility?

I have found tests for ARCH effects (ArchTest) and for normality (shapiro.test, ks.test (comparing with randn) and lillie.test), but none for the properties mentioned above.

Where can I find a comprehensive list of functions available by type?

Thank you
Renato Costa
[R] Data frame help
Hi there,

I have a data frame:

abc 123 234
abc 234 456
def 567 234
elm 123 456
klm 234 678
klm 465 678

I want the unique values of the first column along with the values in columns 2 and 3. For a repeated value, the first matching row should be kept, so my output should be:

abc 123 234
def 567 234
elm 123 456
klm 234 678

I tried something like

cbind(unique(DF1[,1],DF1[unique(DF1[,1],c(2,3)]

It didn't work. Kindly give me some suggestions.

Regards
Ramya
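One way this is commonly done in base R, sketched under the assumption that the data frame is called `DF1` (as in the attempt above) and that its columns are named `V1`, `V2` and `V3`, which the post does not state: duplicated() flags every repeat of a value after its first appearance, so negating it keeps the first row for each value.

```r
DF1 <- data.frame(
  V1 = c("abc", "abc", "def", "elm", "klm", "klm"),
  V2 = c(123, 234, 567, 123, 234, 465),
  V3 = c(234, 456, 234, 456, 678, 678),
  stringsAsFactors = FALSE
)

# Keep only the first row for each distinct value in column 1
first_rows <- DF1[!duplicated(DF1[, 1]), ]
rownames(first_rows) <- NULL
print(first_rows)
```

Indexing whole rows this way keeps columns 2 and 3 aligned with the first column, which combining separate unique() calls with cbind() would not guarantee.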
Re: [R] data which has different size of elements in each row
This was the scenario that Phoebe posted about. Her data was:

2.93290e-06 1.17772e-06 -0.645205 rs2282755
3.07521e-06 3.14000e-04 0.412997 rs1336838
4.84017e-06 2.18311e-01 0.188669 rs2660664 rs967785
9.77861e-06 7.04740e-02 0.294653 rs2660664
1.22767e-05 1.56325e-05 0.569826 rs6870519
2.27205e-05 1.89000e-04 -0.472862 rs10488345

where the first row has one less column than the third row.

Be sure that you are specifying the function arguments correctly for your data.

Importantly, note the following from the Details section of ?read.table:

  The number of data columns is determined by looking at the first five
  lines of input (or the whole file if it has less than five lines), or
  from the length of col.names if it is specified and is longer. This
  could conceivably be wrong if fill or blank.lines.skip are true, so
  specify col.names if necessary.

Marc

on 11/12/2008 09:49 AM yanniliu wrote:
> I have the same problem. If the first row has more columns than later rows,
> the fill=TRUE works; however when the first row has LESS columns than later
> rows, R won't read in more columns than the length of the first row. Anyone
> has solutions?
>
> Yanni
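The advice to specify col.names can be sketched as follows, assuming a whitespace-separated file in which a row with an extra field appears only after the first five lines. The file contents are made up for illustration. count.fields() reports the number of fields on every line, and its maximum gives the number of column names to supply.

```r
# Illustrative file: five lines with 4 fields, then one line with 5
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3 a",
             "4 5 6 b",
             "7 8 9 c",
             "10 11 12 d",
             "13 14 15 e",
             "16 17 18 f g"), tmp)

# Number of fields on each line of the file
n <- count.fields(tmp, sep = "")

# Supply enough column names for the widest line, and pad short rows
dat <- read.table(tmp, header = FALSE, fill = TRUE,
                  col.names = paste0("V", seq_len(max(n))))
print(dat)
unlink(tmp)
```

Without col.names, read.table would settle on four columns from the first five lines, and the extra field on the sixth line would not get a column of its own.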
Re: [R] data which has different size of elements in each row
I have the same problem. If the first row has more columns than later rows, fill=TRUE works; however, when the first row has FEWER columns than later rows, R won't read in more columns than the length of the first row. Does anyone have a solution?

Yanni
Re: [R] data which has different size of elements in each row
on 11/11/2008 03:39 PM phoebe kong wrote:
> Hi all,
>
> I have problem reading in a text file as follow.
>
> The following data has not column header. I tried the following
> command but failed,
>
> temp<-read.table("data.txt",header=F)
>
> An error message stated that line 3 did not have 4 elements.
>
> 0.293290E-05 0.117772E-05 -0.645205 rs2282755
> 0.307521E-05 0.000314 0.412997 rs1336838
> 0.484017E-05 0.218311 0.188669 rs2660664 rs967785
> 0.977861E-05 0.070474 0.294653 rs2660664
> 0.122767E-04 0.156325E-04 0.569826 rs6870519
> 0.227205E-04 0.000189 -0.472862 rs10488345
>
> Does anyone know how to solve it?
>
> Thanks in advance for your help.
>
> Sit

See the 'fill' argument in ?read.table

> read.table("clipboard", header = FALSE, fill = TRUE)
           V1          V2        V3         V4       V5
1 2.93290e-06 1.17772e-06 -0.645205  rs2282755
2 3.07521e-06 3.14000e-04  0.412997  rs1336838
3 4.84017e-06 2.18311e-01  0.188669  rs2660664 rs967785
4 9.77861e-06 7.04740e-02  0.294653  rs2660664
5 1.22767e-05 1.56325e-05  0.569826  rs6870519
6 2.27205e-05 1.89000e-04 -0.472862 rs10488345

HTH,

Marc Schwartz
[R] data which has different size of elements in each row
Hi all,

I have a problem reading in a text file, as follows.

The following data has no column header. I tried the following command but it failed:

temp<-read.table("data.txt",header=F)

An error message stated that line 3 did not have 4 elements.

0.293290E-05 0.117772E-05 -0.645205 rs2282755
0.307521E-05 0.000314 0.412997 rs1336838
0.484017E-05 0.218311 0.188669 rs2660664 rs967785
0.977861E-05 0.070474 0.294653 rs2660664
0.122767E-04 0.156325E-04 0.569826 rs6870519
0.227205E-04 0.000189 -0.472862 rs10488345

Does anyone know how to solve it?

Thanks in advance for your help.

Sit
Re: [R] data type problem for vegan package
Hi Keun-Hyung,

Can you send the data (off list) to me or at least show what

str(gh1)

produces, and show us the output from

require(vegan)
sessionInfo()

Without that it is difficult to help.

G

On Tue, 2008-11-11 at 17:19 +0900, [EMAIL PROTECTED] wrote:
> Dear all,
>
> I'm using R2.8 version, and am trying to do NMDS and calculate other
> diversity indices in vegan package.
> The problem is that it works with a small set of data (43 X 23; row by
> column), but the following error message comes up with a larger data set (43
> X 104) (it seems not large to me at all). I made it sure that all data are
> of numeric type as required.
>
> >gh1.H=diversity(gh1)
> >FUN(newX[, i], ...) : invalid 'type' (character) of argument
>
> If someone has an idea, please...
>
> Keun-Hyung

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
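The error message suggests that at least one column of gh1 was read as character (or factor) rather than numeric. Alongside str(gh1), a check along the following lines might help locate it. This is only a sketch on a small made-up data frame standing in for gh1: the names `toy`, `sp1`, `sp2`, `sp3` and the stray "." entry are assumptions for illustration.

```r
toy <- data.frame(sp1 = c(12, 0, 19),
                  sp2 = c("0", ".", "38"),   # one non-numeric entry
                  sp3 = c(0, 115, 0),
                  stringsAsFactors = FALSE)

# Which columns are not numeric?
not_numeric <- names(toy)[!sapply(toy, is.numeric)]
print(not_numeric)

# Which entries in those columns fail to convert to a number?
bad <- lapply(toy[not_numeric], function(col) {
  col <- as.character(col)
  col[is.na(suppressWarnings(as.numeric(col)))]
})
print(bad)
```

A single stray entry like this is enough for read.table to treat the whole column as character, which would then make the column-wise calculation fail.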
[R] data type problem for vegan package
Dear all, I'm using R2.8 version, and am trying to do NMDS and calculate other diversity indices in vegan package. The problem is that it works with a small set of data (43 X 23; row by column), but the following error message comes up with a larger data set (43 X 104) (it seems not large to me at all). I made it sure that all data are of numeric type as required. >gh1.H=diversity(gh1) >FUN(newX[, i], ...) : invalid 'type' (character) of argument If someone has an idea, please... Keun-Hyung AmpharetearcticaAnaitideskoreanaArmandialanceolata Boccardiapolybranchia BoccardiaproboscideaBradavillosa Capitellacapitata Choneteres Cirratuluscirratus Cirriformiatentaculata Clymenellakoreana Clymenopsiscingulata Diopatrasugokai Eteonelonga Eumidasanguinea Glycerachirori GlyceraonomichiensisGlycerarouxii Glycerasubaenea Goniadajaponica HemipodusyenourensisHeteromastusfiliformis Lumbrinerisjaponica Lumbrinerislongifolia LumbrinerisnipponicaLycastopsisaugeneri MagelonajaponicaMicroclymenepropecaudataMicropodarkedubia Myriocheleoculata NeanthesjaponicaNeanthessuccinea NectoneanthesoxypodaNephtyspolybranchia Nepthyscaeca Nereisneoneanthes Nerinidesyamaguchii Notomastuslatericeus OphioglyceradistortaParalacydoniaparadoxa Pherusaplumosa Phylofelixasiaticus Phylofimbriatus Polydoraligni Prionospiojaponicus ScoloplosarmigerSigambratentaculata Spiofilicornis Sternaspisscutata TerebellidesstroemiiTravisiajaponica TravisiapupaHemigrapussinensis Ilyoplaxpingi Macrophthalmusjaponicus PhilyrapisumTritodynamiahorvathiEucopiaaustralis Hemisiriellaparva LophogasterpacificusSiriellaaequirems AlpheusdigitalisExpalaemoncarinicauda Leptochelagracilis LeptochelasydniensisPalaemongravieriAmpeliscabrevicornis Mandibulophoxusmai Melitarylovae Monoculodeskoreanus Adamnestiajaponica Epheriadecorata Glossaulaxdidymahayashii Leucorynchiacaledonica Leucotinadianae MitrellabicinctaRetusamatsusima Stenothyraedgawaensis Varicinassavaricifera Zeuxiscastus Zeuxissiquijorensis Coecellachinensis 
Dosinorbisjaponicus Gobraeuskazusensis Kelliaporculus Laternulamarilina Melliteryxpuncticulata Moerellairidescens Moerellarutila Musculussenhausia Nitidotellinanitidula Raetellopspulchella Ruditapesphilippinarum Ruditapesvariegatus Semelangulustokubei Solenstrictus Theorafragilis Amphipholissobrina Amphiuraaestuarii Phyllophorusordinatus Protankyrabidentata Temnopleurustoreumaticus Lingulaunguis 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 115 0 0 0 0 . 0 0 0 0 0 0 0 0 0 0 96 0 19 0 0 19 0 58 19 19 0 38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 38 0 0 0 0 0 77 0 192 962 0 0 0 0 0 0 0 0 0 0 0 192 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 19 0 0 0 0 0 154 0 0 19 0 38 0 0 19 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 115 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
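The error `invalid 'type' (character) of argument` usually means that at least one column was read as character (or factor) rather than numeric. The pasted rows above appear to contain a lone "." among the counts, which would be enough to cause it. What follows is a minimal base-R sketch for locating such cells, not a diagnosis of the actual file: the data frame `gh1_demo` and the helper `find_non_numeric` are hypothetical stand-ins.

```r
# Hypothetical stand-in for the community table; one cell holds a stray "."
gh1_demo <- data.frame(
  sp1 = c("12", "0", "19"),
  sp2 = c("0", ".", "38"),
  sp3 = c(0, 115, 0),
  stringsAsFactors = FALSE
)

# For each column, list the rows whose entries cannot be read as numbers
find_non_numeric <- function(df) {
  bad <- lapply(df, function(col) {
    as_num <- suppressWarnings(as.numeric(as.character(col)))
    which(is.na(as_num) & !is.na(col))
  })
  bad[lengths(bad) > 0]
}

problems <- find_non_numeric(gh1_demo)
print(problems)

# After inspecting the offending cells, convert every column to numeric;
# cells that still cannot be parsed become NA
gh1_clean <- as.data.frame(lapply(gh1_demo, function(col) {
  suppressWarnings(as.numeric(as.character(col)))
}))
str(gh1_clean)
```

Running `str()` on the imported object before calling `diversity()` gives the same information at a glance; the helper only narrows it down to individual cells.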
Re: [R] Data Manipulation, add frequency index
See ?ave and ?seq_along

    DF <- data.frame(Name = c("Mary", "Mary", "Mary", "Sam", "Sam",
                              "John", "John", "John", "John"),
                     stringsAsFactors = FALSE)
    DF$index <- ave(1:nrow(DF), DF$Name, FUN = seq_along)

On Sat, Nov 8, 2008 at 5:43 AM, jie feng <[EMAIL PROTECTED]> wrote:
> Hi, there,
>
> I have a simple data manipulation question for you. Thank you for your help!
>
> Suppose that I have this data about people appearing in a class
>
> Mary
> Mary
> Mary
> Sam
> Sam
> John
> John
> John
> John
>
> Then I want to find out which appearance of the student each row is, such as
>
> Mary 1
> Mary 2
> Mary 3
> Sam 1
> Sam 2
> John 1
> John 2
> John 3
> John 4
>
> The fifth row shows that Sam appears for the second time at that moment.
>
> How can I manipulate the data in this way? Suppose that I just have the
> "name" variable and want to add a column of frequency.
>
> Best
>
> Jie
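A short sketch of how the `ave()` call above behaves, including on names that are not stored in contiguous runs. The vector `mixed` is a made-up example, not data from the thread.

```r
# Data from the question: each row is one appearance of a student
DF <- data.frame(
  Name = c("Mary", "Mary", "Mary", "Sam", "Sam",
           "John", "John", "John", "John"),
  stringsAsFactors = FALSE
)

# ave() splits the row numbers by Name, applies seq_along() within each
# group, and puts the results back in the original row order
DF$index <- ave(seq_len(nrow(DF)), DF$Name, FUN = seq_along)
print(DF)

# The same idea works when the names are interleaved (hypothetical input)
mixed <- c("Mary", "Sam", "Mary", "John", "Sam", "Mary")
idx <- ave(seq_along(mixed), mixed, FUN = seq_along)
print(data.frame(mixed, idx))
```

Because the counter is computed per group and written back in place, the data do not need to be sorted by name first.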
Re: [R] Data Manipulation, add frequency index
One way is with ave(), e.g.,

    dat <- data.frame(name = rep(c("Mary", "Sam", "John"), c(3,2,4)))
    dat$freq <- ave(seq_along(dat$name), dat$name, FUN = seq_along)
    dat

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
[R] Data Manipulation, add frequency index
Hi, there,

I have a simple data manipulation question for you. Thank you for your help!

Suppose that I have this data about people appearing in a class

Mary
Mary
Mary
Sam
Sam
John
John
John
John

Then I want to find out which appearance of the student each row is, such as

Mary 1
Mary 2
Mary 3
Sam 1
Sam 2
John 1
John 2
John 3
John 4

The fifth row shows that Sam appears for the second time at that moment.

How can I manipulate the data in this way? Suppose that I just have the "name" variable and want to add a column of frequency.

Best

Jie
Re: [R] Data manipulation question
Thank you for your prompt assistance, cruz and Bart. Bart set me on the right track, and I modified his proposal to this:

    f <- function(data){
      m <- match(data$stop,data$start)
      n <- min(length(m),which(is.na(m)))
      data$stop[n]
    }
    by(data,data$id,f)

It also handles some special cases outside my small example dataset.

Thank you again!
Peter.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of bartjoosen
Sent: 6. november 2008 11:31
To: r-help@r-project.org
Subject: Re: [R] Data manipulation question

How about:

    id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
    start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
    stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
    data <- data.frame(id,start,stop)
    f <- function(data){
      m <- match(data$start,data$stop) + 1
      if (length(m)==1 && is.na(m)) m <- 1
      if (length(m) > 1 && is.na(m[2])) m <- 1
      data$stop[min(m,na.rm=T)]
    }
    by(data,data$id,f)

The if statements in the function are for some special cases; in all the other cases the first line will do the trick.
I would like to add that using data as an object name is somewhat bad practice, as it masks the built-in data function of R.
And I changed the way you made up the data.frame, as your method would convert everything to factors.

Good luck

Bart
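Peter's revised function can be checked against the example data. The sketch below reuses his logic unchanged but renames things for illustration: `admissions` and `first_discharge` are hypothetical names, chosen so that the built-in `data()` function is not masked, and `sapply(split(...))` stands in for `by()` only to get a plain named vector.

```r
# Example data from the thread, built with data.frame() so that
# start and stop stay numeric
admissions <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), rep("d", 1)),
  start = c(0, 6, 17, 20,  0, 1,   0, 5, 10, 11, 50,  0),
  stop  = c(6, 12, 20, 30, 1, 10,  3, 10, 11, 30, 55, 6)
)

# For one patient: find the first admission whose stop time is not the
# start time of another admission, i.e. the first real discharge
first_discharge <- function(d) {
  m <- match(d$stop, d$start)          # NA where no transfer follows
  n <- min(length(m), which(is.na(m))) # first row without a follow-up
  d$stop[n]
}

result <- sapply(split(admissions, admissions$id), first_discharge)
print(result)
```

For the four example patients this reproduces the discharge times requested in the original question (12, 10, 3 and 6).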
Re: [R] Data manipulation question
How about:

    id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
    start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
    stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
    data <- data.frame(id,start,stop)
    f <- function(data){
      m <- match(data$start,data$stop) + 1
      if (length(m)==1 && is.na(m)) m <- 1
      if (length(m) > 1 && is.na(m[2])) m <- 1
      data$stop[min(m,na.rm=T)]
    }
    by(data,data$id,f)

The if statements in the function are for some special cases; in all the other cases the first line will do the trick.
I would like to add that using data as an object name is somewhat bad practice, as it masks the built-in data function of R.
And I changed the way you made up the data.frame, as your method would convert everything to factors.

Good luck

Bart
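Bart's remark about how the data frame was built can be illustrated briefly. `cbind()` on a character vector and numeric vectors first produces a character matrix, so the numbers are no longer numeric by the time `as.data.frame()` sees them. The object names below are hypothetical.

```r
id    <- c("a", "a", "b")
start <- c(0, 6, 0)

# cbind() coerces everything to character before the data frame is made
via_cbind  <- as.data.frame(cbind(id, start))
# data.frame() keeps each column's own type
via_dframe <- data.frame(id, start)

print(class(via_cbind$start))   # not numeric (factor in older R, character in newer)
print(class(via_dframe$start))  # numeric

# If the damage is already done, convert back via character
starts_num <- as.numeric(as.character(via_cbind$start))
print(starts_num)
```

Going through `as.character()` first matters when the column is a factor, since `as.numeric()` on a factor returns the internal level codes rather than the printed values.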
Re: [R] Data manipulation question
On Thu, Nov 6, 2008 at 4:23 PM, Peter Jepsen <[EMAIL PROTECTED]> wrote:
>
> Here is an example:
>
> id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
> start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
> stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
> data <- as.data.frame(cbind(id,start,stop))
> data
> #    id start stop
> # 1   a     0    6
> # 2   a     6   12
> # 3   a    17   20
> # 4   a    20   30
> # 5   b     0    1
> # 6   b     1   10
> # 7   c     0    3
> # 8   c     5   10
> # 9   c    10   11
> # 10  c    11   30
> # 11  c    50   55
> # 12  d     0    6
>
> So, what I want to end up with is this:
>
> id start stop
> a  0     12   # This patient was transferred at time 6 and discharged at
>               # time 12. The admission starting at time 17 is therefore irrelevant.
> b  0     10
> c  0     3
> d  0     6
>

Try this:

    result <- list()
    num <- length(levels(factor(data$id)))
    length(result) <- 3*num
    dim(result) <- c(3,num)
    result <- data[data$start == 0,]
    Y <- as.integer(row.names(result))
    for (i in 1:num) {
      if (Y[i] == dim(data)[1])
        (result[i,3] <- data[dim(data)[1],3])
      else
        (result[i,3] <- data[Y[i]+1,3])
    }
    result

Sorry it is ugly cuz i am new too but hopefully it gives you some ideas.
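The loop above always takes the stop time of the row after each `start == 0` row, which for patient c would give 10 rather than the requested 3. A different loop-based sketch, not the poster's method, follows the chain of transfers only while the next start equals the current stop. It assumes the rows are already ordered by time within each patient; `admissions`, `first_stay_end` and `result2` are hypothetical names.

```r
admissions <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), rep("d", 1)),
  start = c(0, 6, 17, 20,  0, 1,   0, 5, 10, 11, 50,  0),
  stop  = c(6, 12, 20, 30, 1, 10,  3, 10, 11, 30, 55, 6),
  stringsAsFactors = FALSE
)

# Walk down one patient's rows while each admission is a transfer,
# i.e. while the next start time equals the current stop time
first_stay_end <- function(start, stop) {
  i <- 1
  while (i < length(start) && start[i + 1] == stop[i]) {
    i <- i + 1
  }
  stop[i]
}

pieces <- split(admissions, admissions$id)
result2 <- data.frame(
  id    = names(pieces),
  start = sapply(pieces, function(d) d$start[1]),
  stop  = sapply(pieces, function(d) first_stay_end(d$start, d$stop)),
  row.names = NULL
)
print(result2)
```

Unlike a lookup based on `match()`, this version only ever compares adjacent rows, so it depends on the sort order but cannot be confused by a later admission that happens to start at the same time as an earlier stop.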
[R] Data manipulation question
Dear R-listers,

I am a relatively inexperienced R-user currently migrating from Stata. I am deeply frustrated by this data manipulation question: I know how I could do it in Stata, but I cannot make it work in R.

I have a data frame of hospitalization data where each row represents an admission. I need to know when patients were first discharged, but the problem is that patients were sometimes transferred between hospital departments. In my data a transfer looks like a new admission, except that it has a 'start' date equal to the previous admission's 'stop' date.

Here is an example:

    id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
    start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
    stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
    data <- as.data.frame(cbind(id,start,stop))
    data
    #    id start stop
    # 1   a     0    6
    # 2   a     6   12
    # 3   a    17   20
    # 4   a    20   30
    # 5   b     0    1
    # 6   b     1   10
    # 7   c     0    3
    # 8   c     5   10
    # 9   c    10   11
    # 10  c    11   30
    # 11  c    50   55
    # 12  d     0    6

So, what I want to end up with is this:

    id start stop
    a  0     12   # This patient was transferred at time 6 and discharged at time 12.
                  # The admission starting at time 17 is therefore irrelevant.
    b  0     10
    c  0     3
    d  0     6

I have tried tons of variations over lapply, sapply, split, for etc., all to no avail.

Thank you in advance for any assistance.

Best regards,
Peter Jepsen, MD.
Re: [R] how to cite R data sets
The data sets should be documented...

    library(MASS)
    ?cats

and then the Source section should tell you how to cite it, and specifically

    ?airquality

On Tue, Nov 4, 2008 at 11:03 AM, dxc13 <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> As part of a project I am doing, I am using the "airquality" data set that
> comes built into R as a means of showing an example of how my analysis will
> be carried out. I know that citation("package") will produce a citation for
> any R package, but is there a proper way to cite this data set?
>
> Thanks,
> ~D

--
Stephen Sefick
Research Scientist
Southeastern Natural Sciences Academy

Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
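As a small sketch of the two pieces of information usually needed: the help page of the data set lists its original source, and `citation()` gives a citation for the software that ships it. `airquality` lives in the `datasets` package, which is part of R itself, so the call below returns the citation for R rather than a data-set-specific entry; whether to cite the original source, R, or both is a judgment call this sketch does not settle.

```r
# Confirm which package provides the data set
in_datasets <- "airquality" %in% data(package = "datasets")$results[, "Item"]
print(in_datasets)

# Citation for the package that ships it (for 'datasets' this is R itself)
cit <- citation("datasets")
print(cit)

# In an interactive session, the Source and References sections of the
# help page name the original publication of the data:
# help("airquality")
```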
[R] how to cite R data sets
Hi,

As part of a project I am doing, I am using the "airquality" data set that comes built into R as a means of showing an example of how my analysis will be carried out. I know that citation("package") will produce a citation for any R package, but is there a proper way to cite this data set?

Thanks,
~D
Re: [R] data frame column name as a function argument
First - you need to pass the data frame into the function.

    testing <- function (d, colname) {
      return (d[[colname]])
    }
    d <- data.frame(cbind(x=1, y=1:10))
    print (testing(d, 'x'))

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of eric lee
Sent: Friday, September 26, 2008 3:10 PM
To: r-help@r-project.org
Subject: [R] data frame column name as a function argument

Hello,

I'd like to pass a column name as the argument for a function, but I'm getting "NULL" as a return value. Any suggestions? Thanks.

    > d <- data.frame(cbind(x=1, y=1:10))
    > d
       x  y
    1  1  1
    2  1  2
    3  1  3
    4  1  4
    5  1  5
    6  1  6
    7  1  7
    8  1  8
    9  1  9
    10 1 10
    > testing <- function(var) {
    +   tst <- d$var[3]
    +   tst
    + }
    >
    > dummy <- testing(y)
    > dummy
    NULL
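A compact sketch of why the original attempt returned `NULL` and how the `[[` form fixes it. `d$var` looks for a column literally called "var", whereas `d[[var]]` uses the value stored in `var`. The function name `third_value` is hypothetical.

```r
d <- data.frame(x = 1, y = 1:10)

# Take the data frame and the column name (as a string) as arguments
third_value <- function(df, colname) {
  df[[colname]][3]
}
print(third_value(d, "y"))

# The difference between the two extraction operators
var <- "y"
print(d$var)     # NULL: there is no column named "var"
print(d[[var]])  # the y column, looked up by the value of var
```

Passing the data frame explicitly, as the reply suggests, also keeps the function from depending on a global object named `d`.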
[R] data frame column name as a function argument
Hello,

I'd like to pass a column name as the argument for a function, but I'm getting "NULL" as a return value. Any suggestions? Thanks.

    > d <- data.frame(cbind(x=1, y=1:10))
    > d
       x  y
    1  1  1
    2  1  2
    3  1  3
    4  1  4
    5  1  5
    6  1  6
    7  1  7
    8  1  8
    9  1  9
    10 1 10
    > testing <- function(var) {
    +   tst <- d$var[3]
    +   tst
    + }
    >
    > dummy <- testing(y)
    > dummy
    NULL
Re: [R] Data format for BiodiversityR
Hello,

maybe it is better if you copy an extract of your dataset file into the message, because the attached file didn't seem to get through.

Margherita

2008/9/14 Ndoh Innocent (Holy) <[EMAIL PROTECTED]>
> Greetings dear friends.
> Please, I am having real problems getting the program to read my datasets
> (here attached).
> I have converted the datasets to csv and imported them, but I still cannot
> get it to work.
> I would be very happy if someone out there could help me in time.
> Thanks
>
> Ndoh Mbue Innocent
> International corporation office
> China University of Geosciences
> 388 Lumo road
> 430074, Wuhan-China
> Tel: 0086 27 67885947/0086 15927262962
[R] Data format for BiodiversityR
Greetings dear friends.

Please, I am having real problems getting the program to read my datasets (here attached). I have converted the datasets to csv and imported them, but I still cannot get it to work. I would be very happy if someone out there could help me in time.

Thanks

Ndoh Mbue Innocent
International corporation office
China University of Geosciences
388 Lumo road
430074, Wuhan-China
Tel: 0086 27 67885947/0086 15927262962

A gentleman should be truly a moral person, a straightforward and reliable personality, in solidarity with the community and rooted in self respect
Re: [R] data import
You can use a "connection" and read a portion of the data in at a time and process it. Do you need all the data at once? If so, I would agree that you either need more memory (and possibly a 64-bit version of the system), or you need to come up with a different approach to your processing. You have not indicated what problem you are trying to solve with the data.

On Thu, Sep 11, 2008 at 3:32 AM, afshin fallah <[EMAIL PROTECTED]> wrote:
> Dear All,
>
> I have a data set containing 2,122,164 records and 38198952 fields.
> I cannot import this data due to a "memory problem".
> Is there a way to solve this problem?
>
> Thanks

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
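The "connection" approach mentioned above can be sketched as follows: open the file once, read a fixed number of lines at a time, and keep only a running summary rather than the whole table. This is an illustration under stated assumptions, not a recipe for the poster's file; the temporary CSV, the chunk size of 4 and the column names `a` and `b` are all made up.

```r
# Create a small demo file (stand-in for a file too large to load at once)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:10, b = (1:10)^2), tmp, row.names = FALSE)

con <- file(tmp, open = "r")

# Read the header line once and strip the quotes write.csv() adds
header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

total <- 0
repeat {
  lines <- readLines(con, n = 4)          # next chunk of raw lines
  if (length(lines) == 0) break           # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = header)
  total <- total + sum(chunk$b)           # process, then discard the chunk
}

close(con)
unlink(tmp)
print(total)
```

Because the connection stays open between calls, each `readLines()` continues where the previous one stopped, so only one chunk is in memory at any time.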
Re: [R] data import
> I have a data set containing 2,122,164 records and 38198952 fields.
> I cannot import this data due to a "memory problem".
> Is there a way to solve this problem?

1. Filter the data before you import it. Do you really need all 38 million fields? Can you ignore some of the 2 million records? Chapter 4 of the R Data Import/Export Manual may help.

2. Failing that, buy more memory for your PC.

Regards,
Richie.

Mathematical Sciences Unit
HSL
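One way to apply the "filter before you import" advice from within R itself is the `colClasses` argument of `read.csv()`: a column whose class is given as `"NULL"` is skipped while reading. The sketch below uses a made-up three-column file; the names `id`, `junk` and `value` are hypothetical.

```r
# Small demo file with one column that is not needed
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:5, junk = letters[1:5], value = c(2.5, 3, 4.5, 1, 0)),
  tmp, row.names = FALSE
)

# "NULL" tells read.csv() to drop that column while reading
wanted <- read.csv(tmp, colClasses = c("integer", "NULL", "numeric"))
print(names(wanted))

# Row filtering can then be done on the much smaller object
subset_rows <- wanted[wanted$value > 2, ]
print(subset_rows)

unlink(tmp)
```

Declaring the classes of the remaining columns up front also spares R from guessing types, which tends to help with large files.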
[R] data import
Dear All,

I have a data set containing 2,122,164 records and 38198952 fields. I cannot import this data due to a "memory problem". Is there a way to solve this problem?

Thanks
[R] data frame question
I have a data frame containing sequences and I am interested in changing a few sequences in a window and then swapping the original sequences back after I have completed my analysis. The temporary data frame that I am creating, seq.in.window, does not like the way I am making my assignment. The variable seq.in.window gets numbers assigned to it, instead of the data contained in sequence.data. Can someone point out the mistake I am making?

Thanks ../Murli

    x<-c("a","t","g","c")
    sequence.data<-structure(list(
      V1 = structure(c(4L, 1L, 3L, 2L, 2L, 3L, 3L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 4L, 1L, 4L, 3L, 1L), .Label = c("a", "c", "g", "t"), class = "factor"),
      V2 = structure(c(3L, 2L, 4L, 1L, 4L, 1L, 3L, 4L, 2L, 3L, 3L, 4L, 2L, 4L, 4L, 1L, 4L, 2L, 3L, 4L), .Label = c("a", "c", "g", "t"), class = "factor"),
      V3 = structure(c(3L, 3L, 3L, 3L, 3L, 1L, 4L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 4L, 1L, 4L, 1L, 4L), .Label = c("a", "c", "g", "t"), class = "factor"),
      V4 = structure(c(3L, 2L, 1L, 2L, 2L, 3L, 4L, 2L, 1L, 4L, 3L, 4L, 2L, 1L, 4L, 2L, 4L, 4L, 4L, 3L), .Label = c("a", "c", "g", "t"), class = "factor"),
      V5 = structure(c(3L, 3L, 3L, 2L, 1L, 1L, 2L, 4L, 2L, 3L, 4L, 2L, 1L, 1L, 4L, 4L, 4L, 2L, 1L, 4L), .Label = c("a", "c", "g", "t"), class = "factor"),
      V6 = structure(c(2L, 2L, 1L, 4L, 3L, 4L, 1L, 4L, 2L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 1L, 4L, 4L, 4L), .Label = c("a", "c", "g", "t"), class = "factor")),
      .Names = c("V1", "V2", "V3", "V4", "V5", "V6"), class = "data.frame",
      row.names = c("16", "4", "1", "9", "6", "2", "15", "19", "18", "12", "13", "91", "41", "21", "151", "14", "5", "8", "181", "121"))
    seq.in.window<-data.frame(matrix(0,nrow=20,ncol=5)) # Creating an empty data frame
    for(i in 1:20){
      seq.in.window[i,1:5]<- sequence.data[i,1:5] #Can I do this assignment?
      print(seq.in.window[i,1:5])
      rnd.seq =as.vector(sample(x,length(1:5), replace=TRUE))
      sequence.data[i,5] =t(rnd.seq)
      print(sequence.data[i,1:5])
      cat("\n")
    }
    for(i in 1:20){
      sequence.data[i,1:5]=seq.in.window[i,1:5]
    }
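A likely explanation for the numbers is that the columns of `sequence.data` are factors, which R stores as integer codes with labels attached, while `seq.in.window` was created as a numeric data frame of zeros. The sketch below shows the codes behind a factor and one possible way to do the backup, modify and restore cycle using character columns and a plain copy. It is an assumption-laden illustration on a tiny made-up data frame (`sequence_demo`), not a drop-in fix for the code above.

```r
set.seed(42)
bases <- c("a", "t", "g", "c")
lev   <- c("a", "c", "g", "t")

# Tiny stand-in with factor columns, like sequence.data
sequence_demo <- data.frame(
  V1 = factor(c("t", "a", "g"), levels = lev),
  V2 = factor(c("g", "c", "t"), levels = lev)
)
original <- sequence_demo

# What a factor holds underneath: integer codes into its levels
codes <- as.integer(sequence_demo$V1)
print(codes)

# Step 1: work with character columns, and back up by plain copying
sequence_demo[] <- lapply(sequence_demo, as.character)
backup <- sequence_demo[, 1:2]

# Step 2: overwrite the window row by row with random bases
for (i in seq_len(nrow(sequence_demo))) {
  sequence_demo[i, 1:2] <- sample(bases, 2, replace = TRUE)
}

# Step 3: swap the original sequences back in one assignment
sequence_demo[, 1:2] <- backup
print(sequence_demo)
```

Copying the relevant columns with `[` keeps their type, so no pre-allocated placeholder data frame is needed.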
Re: [R] data types in R
Amanda1988 <[EMAIL PROTECTED]> wrote:
>
> I was having a problem with a little simple function I wrote in R and I think
> the problem was that R is representing fractional numbers in binary floating
> point and not decimal notation, so sometimes I was having extra data points
> counted. Is there a way to cast a number stored in a variable as an integer?

To store a decimal fraction as the nearest integer, I would start with as.integer(round(x)) where x is the decimal fraction.

Almost no computer language stores fractional numbers in decimal. If your code expects such storage, you might consider a different way of coding the problem. If you were to show your R code, someone might suggest a better way of doing things.

In particular, for regular fractional sequences, it is better practice to generate them as integers and then divide by the common divisor than to use seq() with fractions. In other cases, it can be helpful to use the length.out= argument to seq(), rather than the to= argument.

HTH

--
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.
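The points in this reply can be seen in a few lines. The sketch shows truncation versus rounding when converting to integer, a tolerant comparison with `all.equal()`, and the two suggested ways of building a regular fractional sequence. The particular numbers are illustrative choices, not taken from the poster's code.

```r
# A product that is mathematically 8 but is stored slightly below it
x <- (0.1 + 0.7) * 10
print(x, digits = 17)

truncated <- as.integer(x)         # as.integer() drops the fractional part
rounded   <- as.integer(round(x))  # round first to get the nearest integer
print(c(truncated, rounded))

# Exact comparison of fractions can fail; compare with a tolerance instead
exact_equal <- (0.1 * 3 == 0.3)
near_equal  <- isTRUE(all.equal(0.1 * 3, 0.3))
print(c(exact_equal, near_equal))

# Regular fractional sequences: integers divided by a common divisor,
# or seq() with length.out= so the number of points is fixed in advance
grid_a <- (0:10) / 10
grid_b <- seq(0, 1, length.out = 11)
print(length(grid_a))
```

Fixing the number of points up front avoids the situation described in the question, where accumulated rounding error decides whether the last point is included.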
Re: [R] data types in R
See R FAQ 7.31 ("Why doesn't R think these numbers are equal?").

On Fri, Aug 15, 2008 at 9:16 AM, Amanda1988 <[EMAIL PROTECTED]> wrote:
>
> I was having a problem with a little simple function I wrote in R and I think
> the problem was that R is representing fractional numbers in binary floating
> point and not decimal notation, so sometimes I was having extra data points
> counted. Is there a way to cast a number stored in a variable as an integer?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
[R] data types in R
I was having a problem with a little simple function I wrote in R and I think the problem was that R is representing fractional numbers in binary floating point and not decimal notation, so sometimes I was having extra data points counted. Is there a way to cast a number stored in a variable as an integer?
Re: [R] viewing data in something similar to 'R Data Editor'
With the R prompt still active, data may be cleanly viewed (not edited) in a separate window by a simplification of function xless in Hmisc by Frank Harrell:

    look. <- function (x, ..., title = substring(deparse(substitute(x)), 1, 20)) {
      page(x, method = "print", title = title,options=(width=150))
      invisible()
    }

Names alone can be viewed separately as well:

    names. <- function (x, ..., title = c(substring(deparse(substitute(x)), 1, 20))) {
      title=paste(title," ...NAMES")
      page(names(x), method = "print", title = title,options=(width=150))
      invisible()
    }

Both functions work in Windows XP and Mac OS X Leopard under R 2.7.1 and earlier.

--
D L McArthur, UCLA Department of Neurosurgery

> On Aug 1, 2008, at 7:29 PM, Rachel Schwartz wrote:
>
>> Hi,
>>
>> I would like to view matrices I am working with in a clean, easy to read,
>> separate window.
>>
>> A friend showed me how to do something like I want with edit(). I can view
>> the matrix in the 'R Data Editor':
>>
>> For a sample matrix:
>>
>>> mat=matrix(1:15,ncol=3)
>>> mat
>>      [,1] [,2] [,3]
>> [1,]    1    6   11
>> [2,]    2    7   12
>> [3,]    3    8   13
>> [4,]    4    9   14
>> [5,]    5   10   15
>>
>>> look=function(x) invisible(edit(x))
>>> look(mat)
>>
>> That opens the 'R Data Editor' with mat loaded.
>>
>> But I am not able to do any other actions in R while this 'R Data Editor' is
>> open. I want to keep this open while I do other work.
>>
>> Is there a way to view my data in something like the 'R Data Editor' that
>> still allows me to do work at the same time?
>> I am looking for something other than str(), head(), and tail() which just
>> allow me a quick peek at the object. I do not want to edit the object in the
>> table, but be able to watch the object change while I run anything that
>> would manipulate it.
>>
>> Thank you for your help.
>>
>> Best,
>> Rachel Schwartz
>> Graduate Student Researcher
>> UCSD; Scripps Institution of Oceanography
Re: [R] viewing data in something similar to 'R Data Editor'
Erich Neuwirth wrote:

> On Windows, this could be done with rcom and Excel. rcom can use Excel as a
> server and put matrices into Excel ranges. The code you run in R could have a
> statement resending matrices so the current version is displayed.
>
> I could set up what you want more conveniently if RGui had a way of
> automatically running a function each time code is run from the command line.
> So here is a question to the masters: Can I have code automatically run each
> time some code is run from the command line?

This is something I have already asked. It seems this is not possible currently. Something like ?addTaskCallback, but registering an R function that is to be called *before* a top-level task is run, would be wonderful (???addTaskStart).

Best,

Philippe Grosjean

P.S.: this would also solve another problem we have for some GUIs: knowing whether R is busy processing commands issued at the command line or not (just set a busy flag to TRUE with a function registered with "addTaskStart", and set it to FALSE with a function registered with addTaskCallback). Of course, an R function that provides this information more directly would be much, much better.

On Aug 1, 2008, at 7:29 PM, Rachel Schwartz wrote:

> Hi,
>
> I would like to view matrices I am working with in a clean, easy to read,
> separate window.
>
> A friend showed me how to do something like I want with edit(). I can view
> the matrix in the 'R Data Editor':
>
> For a sample matrix:
>
> > mat=matrix(1:15,ncol=3)
> > mat
>      [,1] [,2] [,3]
> [1,]    1    6   11
> [2,]    2    7   12
> [3,]    3    8   13
> [4,]    4    9   14
> [5,]    5   10   15
>
> > look=function(x) invisible(edit(x))
> > look(mat)
>
> That opens the 'R Data Editor' with mat loaded.
>
> But I am not able to do any other actions in R while this 'R Data Editor' is
> open. I want to keep this open while I do other work.
>
> Is there a way to view my data in something like the 'R Data Editor' that
> still allows me to do work at the same time? I am looking for something other
> than str(), head(), and tail(), which just allow me a quick peek at the
> object. I do not want to edit the object in the table, but be able to watch
> the object change while I run anything that would manipulate it.
>
> Thank you for your help.
>
> Best,
> Rachel Schwartz
> Graduate Student Researcher
> UCSD; Scripps Institution of Oceanography

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
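For readers following up on the ?addTaskCallback pointer, here is a minimal sketch of the existing after-task hook, which is the closest base R comes to the behaviour discussed in this message. It registers a function that reprints a matrix once each top-level command has finished. The names show_mat and the global object mat are assumptions chosen for illustration; the "addTaskStart" hook wished for above is hypothetical and is not used here. Task callbacks are meant for interactive sessions and may not fire when a script is run non-interactively.

```r
# Sample matrix from the original question.
mat <- matrix(1:15, ncol = 3)

# Hypothetical callback: reprint `mat` after each top-level task.
# Task callbacks receive four arguments: the expression, its value,
# whether it succeeded, and whether the value was printed.
show_mat <- function(expr, value, ok, visible) {
  if (exists("mat", envir = globalenv())) {
    print(get("mat", envir = globalenv()))
  }
  TRUE  # returning TRUE keeps the callback registered
}

id <- addTaskCallback(show_mat, name = "show_mat")

# ... work at the prompt; `mat` is reprinted after every command ...

# Unregister the callback when done.
removed <- removeTaskCallback("show_mat")
```

This prints to the console rather than to a separate window, so it only approximates the "watch the object change" request; it does not replace a GUI object browser.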
Re: [R] viewing data in something similar to 'R Data Editor'
On Windows, this could be done with rcom and Excel. rcom can use Excel as a server and put matrices into Excel ranges. The code you run in R could have a statement resending matrices so the current version is displayed.

I could set up what you want more conveniently if RGui had a way of automatically running a function each time code is run from the command line. So here is a question to the masters: Can I have code automatically run each time some code is run from the command line?

On Aug 1, 2008, at 7:29 PM, Rachel Schwartz wrote:

> Hi,
>
> I would like to view matrices I am working with in a clean, easy to read,
> separate window.
>
> A friend showed me how to do something like I want with edit(). I can view
> the matrix in the 'R Data Editor':
>
> For a sample matrix:
>
> > mat=matrix(1:15,ncol=3)
> > mat
>      [,1] [,2] [,3]
> [1,]    1    6   11
> [2,]    2    7   12
> [3,]    3    8   13
> [4,]    4    9   14
> [5,]    5   10   15
>
> > look=function(x) invisible(edit(x))
> > look(mat)
>
> That opens the 'R Data Editor' with mat loaded.
>
> But I am not able to do any other actions in R while this 'R Data Editor' is
> open. I want to keep this open while I do other work.
>
> Is there a way to view my data in something like the 'R Data Editor' that
> still allows me to do work at the same time? I am looking for something other
> than str(), head(), and tail(), which just allow me a quick peek at the
> object. I do not want to edit the object in the table, but be able to watch
> the object change while I run anything that would manipulate it.
>
> Thank you for your help.
>
> Best,
> Rachel Schwartz
> Graduate Student Researcher
> UCSD; Scripps Institution of Oceanography
Re: [R] viewing data in something similar to 'R Data Editor'
Try view in the svViews package.

On Fri, Aug 1, 2008 at 1:29 PM, Rachel Schwartz <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I would like to view matrices I am working with in a clean, easy to read,
> separate window.
>
> A friend showed me how to do something like I want with edit(). I can view
> the matrix in the 'R Data Editor':
>
> For a sample matrix:
>
> > mat=matrix(1:15,ncol=3)
> > mat
>      [,1] [,2] [,3]
> [1,]    1    6   11
> [2,]    2    7   12
> [3,]    3    8   13
> [4,]    4    9   14
> [5,]    5   10   15
>
> > look=function(x) invisible(edit(x))
> > look(mat)
>
> That opens the 'R Data Editor' with mat loaded.
>
> But I am not able to do any other actions in R while this 'R Data Editor' is
> open. I want to keep this open while I do other work.
>
> Is there a way to view my data in something like the 'R Data Editor' that
> still allows me to do work at the same time? I am looking for something other
> than str(), head(), and tail(), which just allow me a quick peek at the
> object. I do not want to edit the object in the table, but be able to watch
> the object change while I run anything that would manipulate it.
>
> Thank you for your help.
>
> Best,
> Rachel Schwartz
> Graduate Student Researcher
> UCSD; Scripps Institution of Oceanography
Re: [R] viewing data in something similar to 'R Data Editor'
Rachel,

You may want to try JGR, http://jgr.markushelbig.org/JGR.html which has, among many nice IDE features, an object browser that will do what you want.

HTH,
Jim Porzak
Responsys, Inc.
San Francisco, CA
http://www.linkedin.com/in/jimporzak
useR Group SF: http://ia.meetup.com/67/

On Fri, Aug 1, 2008 at 10:29 AM, Rachel Schwartz <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I would like to view matrices I am working with in a clean, easy to read,
> separate window.
>
> A friend showed me how to do something like I want with edit(). I can view
> the matrix in the 'R Data Editor':
>
> For a sample matrix:
>
> > mat=matrix(1:15,ncol=3)
> > mat
>      [,1] [,2] [,3]
> [1,]    1    6   11
> [2,]    2    7   12
> [3,]    3    8   13
> [4,]    4    9   14
> [5,]    5   10   15
>
> > look=function(x) invisible(edit(x))
> > look(mat)
>
> That opens the 'R Data Editor' with mat loaded.
>
> But I am not able to do any other actions in R while this 'R Data Editor' is
> open. I want to keep this open while I do other work.
>
> Is there a way to view my data in something like the 'R Data Editor' that
> still allows me to do work at the same time? I am looking for something other
> than str(), head(), and tail(), which just allow me a quick peek at the
> object. I do not want to edit the object in the table, but be able to watch
> the object change while I run anything that would manipulate it.
>
> Thank you for your help.
>
> Best,
> Rachel Schwartz
> Graduate Student Researcher
> UCSD; Scripps Institution of Oceanography
Re: [R] viewing data in something similar to 'R Data Editor'
Rachel Schwartz wrote:

> Thanks Erik, almost worked!
>
> I am a Mac user and for some reason View worked perfectly for my PC-using
> friend, but doesn't for me.
>
> When I tried:
>
> > mat=matrix(1:10,ncol=2)
> > mat
>      [,1] [,2]
> [1,]    1    6
> [2,]    2    7
> [3,]    3    8
> [4,]    4    9
> [5,]    5   10
> > View(mat)
>
> I get no error message, but nothing happens (besides the spinning ball of
> death) and I have to force quit R. I tried a couple of different variations
> but still had no success using View. Suggestions?

Not from me, no Mac here. Maybe someone else? Or else there is a Mac-specific list, R-SIG-Mac; google for it.

> On Fri, Aug 1, 2008 at 10:52 AM, Erik Iverson <[EMAIL PROTECTED]> wrote:
>
>> See ?View but I don't think it 'auto updates' per your last sentence. Maybe
>> there's a better option?
>>
>> Rachel Schwartz wrote:
>>
>>> Hi,
>>>
>>> I would like to view matrices I am working with in a clean, easy to read,
>>> separate window.
>>>
>>> A friend showed me how to do something like I want with edit(). I can view
>>> the matrix in the 'R Data Editor':
>>>
>>> For a sample matrix:
>>>
>>> > mat=matrix(1:15,ncol=3)
>>> > mat
>>>      [,1] [,2] [,3]
>>> [1,]    1    6   11
>>> [2,]    2    7   12
>>> [3,]    3    8   13
>>> [4,]    4    9   14
>>> [5,]    5   10   15
>>>
>>> > look=function(x) invisible(edit(x))
>>> > look(mat)
>>>
>>> That opens the 'R Data Editor' with mat loaded.
>>>
>>> But I am not able to do any other actions in R while this 'R Data Editor'
>>> is open. I want to keep this open while I do other work.
>>>
>>> Is there a way to view my data in something like the 'R Data Editor' that
>>> still allows me to do work at the same time? I am looking for something
>>> other than str(), head(), and tail(), which just allow me a quick peek at
>>> the object. I do not want to edit the object in the table, but be able to
>>> watch the object change while I run anything that would manipulate it.
>>>
>>> Thank you for your help.
>>>
>>> Best,
>>> Rachel Schwartz
>>> Graduate Student Researcher
>>> UCSD; Scripps Institution of Oceanography
Re: [R] viewing data in something similar to 'R Data Editor'
Thanks Erik, almost worked!

I am a Mac user and for some reason View worked perfectly for my PC-using friend, but doesn't for me.

When I tried:

> mat=matrix(1:10,ncol=2)
> mat
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> View(mat)

I get no error message, but nothing happens (besides the spinning ball of death) and I have to force quit R. I tried a couple of different variations but still had no success using View. Suggestions?

On Fri, Aug 1, 2008 at 10:52 AM, Erik Iverson <[EMAIL PROTECTED]> wrote:

> See ?View but I don't think it 'auto updates' per your last sentence. Maybe
> there's a better option?
>
> Rachel Schwartz wrote:
>
>> Hi,
>>
>> I would like to view matrices I am working with in a clean, easy to read,
>> separate window.
>>
>> A friend showed me how to do something like I want with edit(). I can view
>> the matrix in the 'R Data Editor':
>>
>> For a sample matrix:
>>
>> > mat=matrix(1:15,ncol=3)
>> > mat
>>      [,1] [,2] [,3]
>> [1,]    1    6   11
>> [2,]    2    7   12
>> [3,]    3    8   13
>> [4,]    4    9   14
>> [5,]    5   10   15
>>
>> > look=function(x) invisible(edit(x))
>> > look(mat)
>>
>> That opens the 'R Data Editor' with mat loaded.
>>
>> But I am not able to do any other actions in R while this 'R Data Editor'
>> is open. I want to keep this open while I do other work.
>>
>> Is there a way to view my data in something like the 'R Data Editor' that
>> still allows me to do work at the same time? I am looking for something
>> other than str(), head(), and tail(), which just allow me a quick peek at
>> the object. I do not want to edit the object in the table, but be able to
>> watch the object change while I run anything that would manipulate it.
>>
>> Thank you for your help.
>>
>> Best,
>> Rachel Schwartz
>> Graduate Student Researcher
>> UCSD; Scripps Institution of Oceanography
Re: [R] viewing data in something similar to 'R Data Editor'
See ?View but I don't think it 'auto updates' per your last sentence. Maybe there's a better option?

Rachel Schwartz wrote:

> Hi,
>
> I would like to view matrices I am working with in a clean, easy to read,
> separate window.
>
> A friend showed me how to do something like I want with edit(). I can view
> the matrix in the 'R Data Editor':
>
> For a sample matrix:
>
> > mat=matrix(1:15,ncol=3)
> > mat
>      [,1] [,2] [,3]
> [1,]    1    6   11
> [2,]    2    7   12
> [3,]    3    8   13
> [4,]    4    9   14
> [5,]    5   10   15
>
> > look=function(x) invisible(edit(x))
> > look(mat)
>
> That opens the 'R Data Editor' with mat loaded.
>
> But I am not able to do any other actions in R while this 'R Data Editor' is
> open. I want to keep this open while I do other work.
>
> Is there a way to view my data in something like the 'R Data Editor' that
> still allows me to do work at the same time? I am looking for something other
> than str(), head(), and tail(), which just allow me a quick peek at the
> object. I do not want to edit the object in the table, but be able to watch
> the object change while I run anything that would manipulate it.
>
> Thank you for your help.
>
> Best,
> Rachel Schwartz
> Graduate Student Researcher
> UCSD; Scripps Institution of Oceanography
[R] viewing data in something similar to 'R Data Editor'
Hi,

I would like to view matrices I am working with in a clean, easy to read, separate window.

A friend showed me how to do something like I want with edit(). I can view the matrix in the 'R Data Editor':

For a sample matrix:

> mat=matrix(1:15,ncol=3)
> mat
     [,1] [,2] [,3]
[1,]    1    6   11
[2,]    2    7   12
[3,]    3    8   13
[4,]    4    9   14
[5,]    5   10   15

> look=function(x) invisible(edit(x))
> look(mat)

That opens the 'R Data Editor' with mat loaded.

But I am not able to do any other actions in R while this 'R Data Editor' is open. I want to keep this open while I do other work.

Is there a way to view my data in something like the 'R Data Editor' that still allows me to do work at the same time? I am looking for something other than str(), head(), and tail(), which just allow me a quick peek at the object. I do not want to edit the object in the table, but be able to watch the object change while I run anything that would manipulate it.

Thank you for your help.

Best,
Rachel Schwartz
Graduate Student Researcher
UCSD; Scripps Institution of Oceanography
Re: [R] Data length mismatch.
I think the merge() function should be adequate for this task. Here is an example.

A <- data.frame(day=1:5, x=runif(5))
B <- data.frame(day=3:7, x=runif(5))
A
  day         x
1   1 0.9764534
2   2 0.9693998
3   3 0.1324933
4   4 0.8311153
5   5 0.3264465

B <- data.frame(day=3:8, x=runif(6))
B
  day         x
1   3 0.5096328
2   4 0.6043132
3   5 0.7947639
4   6 0.7619096
5   7 0.1571041
6   8 0.3473159

D <- merge(A,B, by='day',all=TRUE)
D$diff <- D$x.x - D$x.y
D
  day       x.x       x.y       diff
1   1 0.9764534        NA         NA
2   2 0.9693998        NA         NA
3   3 0.1324933 0.5096328 -0.3771395
4   4 0.8311153 0.6043132  0.2268022
5   5 0.3264465 0.7947639 -0.4683174
6   6        NA 0.7619096         NA
7   7        NA 0.1571041         NA
8   8        NA 0.3473159         NA

-Don

At 2:26 PM -0700 7/26/08, <[EMAIL PROTECTED]> wrote:

> I have two vectors (lists) that each represent a year of data. Each "row" is
> represented by the day of year and the quantity that was sold for that day.
> I would like to form a new vector that is the difference between the two
> years of data. A sample of A (and similarly B) looks like:
>
> > A[1:5,]
>   DayOfYear    x
> 1         1 1429
> 2         2 3952
> 3         3 3049
> 4         4 2844
> 5         5 2219
>
> D <- A - B
>
> This works just fine if A and B are both the same length. What is the best
> way to handle the situation where A and B are of different lengths? If the
> day of year exists in both vectors (lists) then I just want the
> corresponding "row" in D to be the difference between A and B values. If the
> "row" doesn't exist in either A or B then the difference should be treated
> as if the missing "row" was zero. Is this feasible?
>
> Thank you.
>
> Kevin

--
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
[EMAIL PROTECTED]
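The merge() example in this message leaves NA wherever a day is missing from one data frame, whereas the original question asked for missing rows to count as zero. A minimal base R sketch of that extra step follows. A is the sample data posted elsewhere in this thread; B, including its day 6 row, and the suffixes .A and .B are assumptions made up for illustration.

```r
# Sample data from the thread; B's day-6 row is hypothetical and only
# there to show a day that is missing from A.
A <- data.frame(DayOfYear = 1:5, x = c(1429L, 3952L, 3049L, 2844L, 2219L))
B <- rbind(A[c(2, 4), ], data.frame(DayOfYear = 6L, x = 100L))

# Full outer join on the day, keeping days that appear in either frame.
D <- merge(A, B, by = "DayOfYear", all = TRUE, suffixes = c(".A", ".B"))

# Treat a missing row as a quantity of zero, as the question requested.
D$x.A[is.na(D$x.A)] <- 0
D$x.B[is.na(D$x.B)] <- 0

D$diff <- D$x.A - D$x.B
D
```

Whether zero is the right stand-in for a missing day depends on the data; if a missing row means "not recorded" rather than "nothing sold", keeping the NA may be the safer choice.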
Re: [R] Data length mismatch.
Sorry, the last one should be:

ix <- match(B$DayOfYear, A$DayOfYear)
A[ix, "x"] <- A[ix, "x"] - B$x

Again we are assuming B's days are a subset of A's.

On Sat, Jul 26, 2008 at 6:08 PM, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:

> Here is a third solution.
>
> A[B$DayOfYear, "x"] <- A[B$DayOfYear, "x"] - B$x
>
> It assumes B's days are a subset of A's but if that's not the case then
> you would need to intersect them first: ?intersect
>
> On Sat, Jul 26, 2008 at 5:26 PM, <[EMAIL PROTECTED]> wrote:
>
>> I have two vectors (lists) that each represent a year of data. Each "row"
>> is represented by the day of year and the quantity that was sold for that
>> day. I would like to form a new vector that is the difference between the
>> two years of data. A sample of A (and similarly B) looks like:
>>
>> > A[1:5,]
>>   DayOfYear    x
>> 1         1 1429
>> 2         2 3952
>> 3         3 3049
>> 4         4 2844
>> 5         5 2219
>>
>> D <- A - B
>>
>> This works just fine if A and B are both the same length. What is the best
>> way to handle the situation where A and B are of different lengths? If the
>> day of year exists in both vectors (lists) then I just want the
>> corresponding "row" in D to be the difference between A and B values. If
>> the "row" doesn't exist in either A or B then the difference should be
>> treated as if the missing "row" was zero. Is this feasible?
>>
>> Thank you.
>>
>> Kevin
Re: [R] Data length mismatch.
Here is a third solution.

A[B$DayOfYear, "x"] <- A[B$DayOfYear, "x"] - B$x

It assumes B's days are a subset of A's but if that's not the case then you would need to intersect them first: ?intersect

On Sat, Jul 26, 2008 at 5:26 PM, <[EMAIL PROTECTED]> wrote:

> I have two vectors (lists) that each represent a year of data. Each "row" is
> represented by the day of year and the quantity that was sold for that day.
> I would like to form a new vector that is the difference between the two
> years of data. A sample of A (and similarly B) looks like:
>
> > A[1:5,]
>   DayOfYear    x
> 1         1 1429
> 2         2 3952
> 3         3 3049
> 4         4 2844
> 5         5 2219
>
> D <- A - B
>
> This works just fine if A and B are both the same length. What is the best
> way to handle the situation where A and B are of different lengths? If the
> day of year exists in both vectors (lists) then I just want the
> corresponding "row" in D to be the difference between A and B values. If the
> "row" doesn't exist in either A or B then the difference should be treated
> as if the missing "row" was zero. Is this feasible?
>
> Thank you.
>
> Kevin
Re: [R] Data length mismatch.
Here is a second solution. This one uses sqldf instead of zoo:

library(sqldf)
sqldf("select A.x - ifnull(B.x, 0) from A left join B using(DayOfYear)")

See http://sqldf.googlecode.com

On Sat, Jul 26, 2008 at 5:26 PM, <[EMAIL PROTECTED]> wrote:

> I have two vectors (lists) that each represent a year of data. Each "row" is
> represented by the day of year and the quantity that was sold for that day.
> I would like to form a new vector that is the difference between the two
> years of data. A sample of A (and similarly B) looks like:
>
> > A[1:5,]
>   DayOfYear    x
> 1         1 1429
> 2         2 3952
> 3         3 3049
> 4         4 2844
> 5         5 2219
>
> D <- A - B
>
> This works just fine if A and B are both the same length. What is the best
> way to handle the situation where A and B are of different lengths? If the
> day of year exists in both vectors (lists) then I just want the
> corresponding "row" in D to be the difference between A and B values. If the
> "row" doesn't exist in either A or B then the difference should be treated
> as if the missing "row" was zero. Is this feasible?
>
> Thank you.
>
> Kevin
Re: [R] Data length mismatch.
For the last statement we may prefer this so it stays a zoo object:

> m <- merge(Az, Bz, fill = 0)
> m[,1] - m[,2]
   1    2    3    4    5
1429    0 3049    0 2219

On Sat, Jul 26, 2008 at 5:53 PM, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:

> Look at merge.zoo
>
> > library(zoo)
> > dput(A)
> structure(list(DayOfYear = 1:5, x = c(1429L, 3952L, 3049L, 2844L,
> 2219L)), .Names = c("DayOfYear", "x"), class = "data.frame", row.names = c("1",
> "2", "3", "4", "5"))
> > B <- A[c(2,4),]
> > Az <- zoo(A$x, A$DayOfYear)
> > Bz <- zoo(B$x, B$DayOfYear)
> > merge(Az, Bz, fill = 0)
>     Az   Bz
> 1 1429    0
> 2 3952 3952
> 3 3049    0
> 4 2844 2844
> 5 2219    0
> > merge(Az, Bz, fill = 0) %*% c(1, -1)
>      [,1]
> [1,] 1429
> [2,]    0
> [3,] 3049
> [4,]    0
> [5,] 2219
>
> On Sat, Jul 26, 2008 at 5:26 PM, <[EMAIL PROTECTED]> wrote:
>
>> I have two vectors (lists) that each represent a year of data. Each "row"
>> is represented by the day of year and the quantity that was sold for that
>> day. I would like to form a new vector that is the difference between the
>> two years of data. A sample of A (and similarly B) looks like:
>>
>> > A[1:5,]
>>   DayOfYear    x
>> 1         1 1429
>> 2         2 3952
>> 3         3 3049
>> 4         4 2844
>> 5         5 2219
>>
>> D <- A - B
>>
>> This works just fine if A and B are both the same length. What is the best
>> way to handle the situation where A and B are of different lengths? If the
>> day of year exists in both vectors (lists) then I just want the
>> corresponding "row" in D to be the difference between A and B values. If
>> the "row" doesn't exist in either A or B then the difference should be
>> treated as if the missing "row" was zero. Is this feasible?
>>
>> Thank you.
>>
>> Kevin
Re: [R] Data length mismatch.
Look at merge.zoo

> library(zoo)
> dput(A)
structure(list(DayOfYear = 1:5, x = c(1429L, 3952L, 3049L, 2844L,
2219L)), .Names = c("DayOfYear", "x"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
> B <- A[c(2,4),]
> Az <- zoo(A$x, A$DayOfYear)
> Bz <- zoo(B$x, B$DayOfYear)
> merge(Az, Bz, fill = 0)
    Az   Bz
1 1429    0
2 3952 3952
3 3049    0
4 2844 2844
5 2219    0
> merge(Az, Bz, fill = 0) %*% c(1, -1)
     [,1]
[1,] 1429
[2,]    0
[3,] 3049
[4,]    0
[5,] 2219

On Sat, Jul 26, 2008 at 5:26 PM, <[EMAIL PROTECTED]> wrote:

> I have two vectors (lists) that each represent a year of data. Each "row" is
> represented by the day of year and the quantity that was sold for that day.
> I would like to form a new vector that is the difference between the two
> years of data. A sample of A (and similarly B) looks like:
>
> > A[1:5,]
>   DayOfYear    x
> 1         1 1429
> 2         2 3952
> 3         3 3049
> 4         4 2844
> 5         5 2219
>
> D <- A - B
>
> This works just fine if A and B are both the same length. What is the best
> way to handle the situation where A and B are of different lengths? If the
> day of year exists in both vectors (lists) then I just want the
> corresponding "row" in D to be the difference between A and B values. If the
> "row" doesn't exist in either A or B then the difference should be treated
> as if the missing "row" was zero. Is this feasible?
>
> Thank you.
>
> Kevin
[R] Data length mismatch.
I have two vectors (lists) that each represent a year of data. Each "row" is represented by the day of year and the quantity that was sold for that day. I would like to form a new vector that is the difference between the two years of data. A sample of A (and similarly B) looks like:

> A[1:5,]
  DayOfYear    x
1         1 1429
2         2 3952
3         3 3049
4         4 2844
5         5 2219

D <- A - B

This works just fine if A and B are both the same length. What is the best way to handle the situation where A and B are of different lengths? If the day of year exists in both vectors (lists) then I just want the corresponding "row" in D to be the difference between A and B values. If the "row" doesn't exist in either A or B then the difference should be treated as if the missing "row" was zero. Is this feasible?

Thank you.

Kevin
Re: [R] data transformation
on 07/22/2008 11:24 AM Christian Hof wrote:

> Dear all,
>
> how can I, with R, transform a presence-only table (with the names of the
> species (1st column), the lat information of the sites (2nd column) and the
> lon information of the sites (3rd column)) into a presence-absence (0/1)
> matrix of species occurrences across sites, as given in the example below?
> Thanks a lot for your help!
>
> Christian
>
> My initial table:
>
> species lat lon
> sp1     10  10
> sp1     10  30
> sp1     20  10
> sp1     20  20
> sp1     20  30
> sp2     10  30
> sp2     20  30
> sp2     30  30
>
> My desired matrix:
>
> lat lon sp1 sp2
> 10  10  1   0
> 10  20  0   0
> 10  30  1   1
> 20  10  1   0
> 20  20  1   0
> 20  30  1   1
> 30  10  0   0
> 30  20  0   0
> 30  30  0   1

One approach would be to use ftable(). Presuming that your source data is in a data frame called 'DF':

> ftable(species ~ lat + lon, data = DF)
        species sp1 sp2
lat lon
10  10            1   0
    20            0   0
    30            1   1
20  10            1   0
    20            1   0
    30            1   1
30  10            0   0
    20            0   0
    30            0   1

See ?ftable and/or ?ftable.formula

HTH,

Marc Schwartz
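ftable() returns a flat contingency table rather than a plain data frame. If a data frame laid out exactly like the "desired matrix" in the question is wanted, one possible base R sketch is shown below. This is an alternative illustration, not the method from the reply above; DF is rebuilt from the table in the question, and the name sites is an assumption.

```r
# Presence-only data as given in the question.
DF <- data.frame(
  species = c("sp1", "sp1", "sp1", "sp1", "sp1", "sp2", "sp2", "sp2"),
  lat     = c(10, 10, 20, 20, 20, 10, 20, 30),
  lon     = c(10, 30, 10, 20, 30, 30, 30, 30)
)

# All lat/lon combinations; lon varies fastest so rows come out sorted
# by lat and then by lon.
sites <- expand.grid(lon = sort(unique(DF$lon)), lat = sort(unique(DF$lat)))
sites <- sites[, c("lat", "lon")]

# One 0/1 column per species: 1 if the species was recorded at the site.
for (sp in as.character(unique(DF$species))) {
  present <- paste(DF$lat, DF$lon)[DF$species == sp]
  sites[[sp]] <- as.integer(paste(sites$lat, sites$lon) %in% present)
}
sites
```

Note that this treats every combination of an observed lat and an observed lon as a site, which is the same assumption the ftable() output makes.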
[R] data transformation
Dear all,

how can I, with R, transform a presence-only table (with the names of the species (1st column), the lat information of the sites (2nd column) and the lon information of the sites (3rd column)) into a presence-absence (0/1) matrix of species occurrences across sites, as given in the example below? Thanks a lot for your help!

Christian

My initial table:

species lat lon
sp1     10  10
sp1     10  30
sp1     20  10
sp1     20  20
sp1     20  30
sp2     10  30
sp2     20  30
sp2     30  30

My desired matrix:

lat lon sp1 sp2
10  10  1   0
10  20  0   0
10  30  1   1
20  10  1   0
20  20  1   0
20  30  1   1
30  10  0   0
30  20  0   0
30  30  0   1

--
Christian Hof, PhD student
Center for Macroecology & Evolution
University of Copenhagen
www.macroecology.ku.dk
&
Biodiversity & Global Change Lab
Museo Nacional de Ciencias Naturales, Madrid
www.biochange-lab.eu
mobile ES .. +34 697 508 519
mobile DE .. +49 176 205 189 27
mail .. [EMAIL PROTECTED]
mail2 .. [EMAIL PROTECTED]
blog .. www.vogelwart.de
Re: [R] Data Manipulations and SQL
The sqldf R package can do that using SQL syntax. See: http://sqldf.googlecode.com

The R merge command can do a join but not using SQL syntax. See ?merge

On Mon, Jul 14, 2008 at 5:29 PM, Willa Wei <[EMAIL PROTECTED]> wrote:

> Greetings,
>
> I am new to R and have some background knowledge about SQL. I'd like to
> know whether there is a way to manipulate the R datasets (or data
> frames) using SQL statements. For example, I have two data frames and
> both of them have a column called "id", then I want to join these two
> data frames into one. In SQL, we can just simply use the join command.
> What should we do in R? Is there any package that allows us to run SQL
> statements with R data?
>
> Thank you in advance for your help,
> Willa
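To make the merge() pointer concrete, here is a small sketch of how the usual SQL join types map onto merge() arguments. The data frames customers and orders and their contents are made up for illustration; only the shared "id" column comes from the question.

```r
# Two hypothetical data frames sharing an "id" column.
customers <- data.frame(id = c(1, 2, 3), name = c("a", "b", "c"))
orders    <- data.frame(id = c(2, 3, 4), amount = c(10, 20, 30))

# INNER JOIN: only ids present in both data frames.
inner <- merge(customers, orders, by = "id")

# LEFT JOIN: every row of customers, NA where there is no matching order.
left <- merge(customers, orders, by = "id", all.x = TRUE)

# FULL OUTER JOIN: every id from either data frame.
full <- merge(customers, orders, by = "id", all = TRUE)
```

A right join works the same way with all.y = TRUE.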
[R] Data Manipulations and SQL
Greetings,

I am new to R and have some background knowledge about SQL. I'd like to know whether there is a way to manipulate the R datasets (or data frames) using SQL statements. For example, I have two data frames and both of them have a column called "id", then I want to join these two data frames into one. In SQL, we can just simply use the join command. What should we do in R? Is there any package that allows us to run SQL statements with R data?

Thank you in advance for your help,
Willa

This message contains confidential information and is in...{{dropped:8}}
Re: [R] data summerization etc...
See the sqldf home page: http://sqldf.googlecode.com

e.g.

library(sqldf)
set.seed(1)
pti <- rnorm(7, 10)
fid <- rnorm(7, 100)
finc <- rnorm(7, 1000)

# set is a reserved word in SQL so use sset
sset <- data.frame(fid, pti, finc)
system.time(out <- sqldf("select fid, sum(pti) from sset group by fid"))

# 3 seconds

On Fri, Jul 11, 2008 at 6:44 PM, sj <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I am trying to do some fairly straightforward data summarization, i.e., the
> kind you would do with a pivot table in Excel or by using SQL queries. I
> have a moderately sized data set of ~70,000 records and I am trying to
> compute some group averages and sum values within groups. The code example
> below shows how I am trying to go about doing this:
>
> pti <- rnorm(7,10)
> fid <- rnorm(7,100)
> finc <- rnorm(7,1000)
>
> ### compute the sums of pti within fid groups
> sum_pinc <- aggregate(cbind(fid,pti),list(fid),FUN=sum)
>
> ### compute mean finc within fid groups
> tot_finc <- aggregate(cbind(fid,finc),list(fid),FUN=mean)
>
> When I try to do it this way I get an error message telling me that enough
> memory cannot be allocated (I am using R 2.7.1 on Windows XP with 2 GB of
> memory). I figure that there must be a more efficient way to go about doing
> this. Please suggest.
>
> I would typically do this kind of task in a database and use SQL to push the
> data around. I know RODBC allows you to write SQL to query external DBs. Is
> there any mechanism that allows you to write SQL queries against datasets
> internal to R? E.g., in the case above I could do something like
>
> set <- cbind(fid,pti,finc)
>
> select fid, sum(pti)
> from set
> group by fid
>
> That would be handy!
>
> Thanks,
>
> Spencer
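For comparison, the same GROUP BY can be sketched in base R without sqldf. This is only an illustration under stated assumptions: the data are made up, `fid` is assumed to be an integer id that repeats across rows (unlike the continuous rnorm values in the original post), and the object names are hypothetical.

```r
# Hypothetical data: 12 records falling into 4 genuine groups.
set.seed(1)
sset <- data.frame(
  fid  = rep(1:4, each = 3),   # assumed grouping key, repeated across rows
  pti  = rnorm(12, 10),
  finc = rnorm(12, 1000)
)

# Roughly: select fid, sum(pti) from sset group by fid
sum_pti <- aggregate(pti ~ fid, data = sset, FUN = sum)

# Roughly: select fid, avg(finc) from sset group by fid
mean_finc <- aggregate(finc ~ fid, data = sset, FUN = mean)

# tapply() gives the same sums as a vector named by fid
sum_pti_vec <- tapply(sset$pti, sset$fid, sum)
```

Both results have one row (or element) per distinct fid, which is what the SQL query would return.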
Re: [R] data summerization etc...
On Sat, Jul 12, 2008 at 7:44 AM, sj <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I am trying to do some fairly straightforward data summarization, i.e., the
> kind you would do with a pivot table in Excel or by using SQL queries. I
> have a moderately sized data set of ~70,000 records and I am trying to
> compute some group averages and sum values within groups. The code example
> below shows how I am trying to go about doing this
>

You might want to have a look at the reshape package - http://had.co.nz/reshape - its design was much inspired by pivot tables and SQL crosstab queries.

Hadley

--
http://had.co.nz/
Re: [R] data summarization etc...
I am sorry. Upon inspection, you only tried to create 70,000 categories. However, the calculations for creating the 140,000 subsetted values pti and finc exhausted your memory or the memory allocated to/in R.

Best,
Daniel

-
cuncta stricte discussurus
-

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Malter
Sent: Friday, July 11, 2008 7:53 PM
To: 'sj'; 'r-help'
Subject: Re: [R] data summarization etc...

The problem is that you do not really have categories. You draw 3 times 7 random normal variables and then try to subset one by the other. Since none of the values will coincide exactly with another, your code would create something like 7^3 categories. No wonder that you are running out of memory. So what you are doing is nonsensical unless you really have some groups/categories that cluster your data and which are filled with a substantial number of observations (see example below).

x1=rnorm(3,0,1)
x2=rnorm(3,10,5)
group1=rep(c(1:3),each=1)
group2=rep(c(1:3),1)
aggregate(cbind(x1,x2),list(group1,group2),FUN=mean)

Best,
Daniel

-
cuncta stricte discussurus
-

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of sj
Sent: Friday, July 11, 2008 6:47 PM
To: r-help
Subject: [R] data summarization etc...

Hello,

I am trying to do some fairly straightforward data summarization, i.e., the kind you would do with a pivot table in Excel or by using SQL queries. I have a moderately sized data set of ~70,000 records and I am trying to compute some group averages and sum values within groups. The code example below shows how I am trying to go about doing this:

pti <- rnorm(7,10)
fid <- rnorm(7,100)
finc <- rnorm(7,1000)

### compute the sums of pti within fid groups
sum_pinc <- aggregate(cbind(fid,pti),list(fid),FUN=sum)

### compute mean finc within fid groups
tot_finc <- aggregate(cbind(fid,finc),list(fid),FUN=mean)

When I try to do it this way I get an error message telling me that enough memory cannot be allocated (I am using R 2.7.1 on Windows XP with 2 GB of memory). I figure that there must be a more efficient way to go about doing this. Please suggest.

I would typically do this kind of task in a database and use SQL to push the data around. I know RODBC allows you to write SQL to query external DBs. Is there any mechanism that allows you to write SQL queries against datasets internal to R? E.g., in the case above I could do something like

set <- cbind(fid,pti,finc)

select fid, sum(pti)
from set
group by fid

That would be handy!

Thanks,

Spencer
Re: [R] data summarization etc...
The problem is that you do not really have categories. You draw 3 times 7 random normal variables and then try to subset one by the other. Since none of the values will coincide exactly with another, your code would create something like 7^3 categories. No wonder that you are running out of memory. So what you are doing is nonsensical unless you really have some groups/categories that cluster your data and which are filled with a substantial number of observations (see example below).

x1=rnorm(3,0,1)
x2=rnorm(3,10,5)
group1=rep(c(1:3),each=1)
group2=rep(c(1:3),1)
aggregate(cbind(x1,x2),list(group1,group2),FUN=mean)

Best,
Daniel

-
cuncta stricte discussurus
-

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of sj
Sent: Friday, July 11, 2008 6:47 PM
To: r-help
Subject: [R] data summarization etc...

Hello,

I am trying to do some fairly straightforward data summarization, i.e., the kind you would do with a pivot table in Excel or by using SQL queries. I have a moderately sized data set of ~70,000 records and I am trying to compute some group averages and sum values within groups. The code example below shows how I am trying to go about doing this:

pti <- rnorm(7,10)
fid <- rnorm(7,100)
finc <- rnorm(7,1000)

### compute the sums of pti within fid groups
sum_pinc <- aggregate(cbind(fid,pti),list(fid),FUN=sum)

### compute mean finc within fid groups
tot_finc <- aggregate(cbind(fid,finc),list(fid),FUN=mean)

When I try to do it this way I get an error message telling me that enough memory cannot be allocated (I am using R 2.7.1 on Windows XP with 2 GB of memory). I figure that there must be a more efficient way to go about doing this. Please suggest.

I would typically do this kind of task in a database and use SQL to push the data around. I know RODBC allows you to write SQL to query external DBs. Is there any mechanism that allows you to write SQL queries against datasets internal to R? E.g., in the case above I could do something like

set <- cbind(fid,pti,finc)

select fid, sum(pti)
from set
group by fid

That would be handy!

Thanks,

Spencer
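The point about categories can be checked directly by counting distinct key values. The sketch below uses made-up data and hypothetical names (`fid_cont`, `fid_cat`); the sample size of 1000 and the 50 ids are arbitrary assumptions chosen to keep it small.

```r
set.seed(1)
n <- 1000

# Continuous key: practically every draw is unique, so grouping by it
# yields about one group per row
fid_cont <- rnorm(n, 100)
n_groups_cont <- length(unique(fid_cont))

# Categorical key (hypothetical): at most 50 ids, each shared by many rows
fid_cat <- sample(1:50, n, replace = TRUE)
n_groups_cat <- length(unique(fid_cat))

# Group sums are only meaningful for the categorical key
pti <- rnorm(n, 10)
group_sums <- tapply(pti, fid_cat, sum)   # one sum per id actually present
```

Comparing `n_groups_cont` with `n_groups_cat` shows why aggregating on an rnorm() column produces as many groups as there are records.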
Re: [R] data summerization etc...
Hello,

Have you tried using the GUI Rattle from www.rattle.togaware.com? It works pretty well for summarization.

Regards,

Ajay

www.decisionstats.com

On Sat, Jul 12, 2008 at 4:14 AM, sj <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> I am trying to do some fairly straightforward data summarization, i.e., the
> kind you would do with a pivot table in Excel or by using SQL queries. I
> have a moderately sized data set of ~70,000 records and I am trying to
> compute some group averages and sum values within groups. The code example
> below shows how I am trying to go about doing this:
>
> pti <- rnorm(7,10)
> fid <- rnorm(7,100)
> finc <- rnorm(7,1000)
>
> ### compute the sums of pti within fid groups
> sum_pinc <- aggregate(cbind(fid,pti),list(fid),FUN=sum)
>
> ### compute mean finc within fid groups
> tot_finc <- aggregate(cbind(fid,finc),list(fid),FUN=mean)
>
> When I try to do it this way I get an error message telling me that enough
> memory cannot be allocated (I am using R 2.7.1 on Windows XP with 2 GB of
> memory). I figure that there must be a more efficient way to go about doing
> this. Please suggest.
>
> I would typically do this kind of task in a database and use SQL to push the
> data around. I know RODBC allows you to write SQL to query external DBs. Is
> there any mechanism that allows you to write SQL queries against datasets
> internal to R? E.g., in the case above I could do something like
>
> set <- cbind(fid,pti,finc)
>
> select fid, sum(pti)
> from set
> group by fid
>
> That would be handy!
>
> Thanks,
>
> Spencer
[R] data summarization etc...
Hello,

I am trying to do some fairly straightforward data summarization, i.e., the kind you would do with a pivot table in Excel or by using SQL queries. I have a moderately sized data set of ~70,000 records and I am trying to compute some group averages and sum values within groups. The code example below shows how I am trying to go about doing this:

pti <- rnorm(7,10)
fid <- rnorm(7,100)
finc <- rnorm(7,1000)

### compute the sums of pti within fid groups
sum_pinc <- aggregate(cbind(fid,pti),list(fid),FUN=sum)

### compute mean finc within fid groups
tot_finc <- aggregate(cbind(fid,finc),list(fid),FUN=mean)

When I try to do it this way I get an error message telling me that enough memory cannot be allocated (I am using R 2.7.1 on Windows XP with 2 GB of memory). I figure that there must be a more efficient way to go about doing this. Please suggest.

I would typically do this kind of task in a database and use SQL to push the data around. I know RODBC allows you to write SQL to query external DBs. Is there any mechanism that allows you to write SQL queries against datasets internal to R? E.g., in the case above I could do something like

set <- cbind(fid,pti,finc)

select fid, sum(pti)
from set
group by fid

That would be handy!

Thanks,

Spencer
Re: [R] Data size
I don't think anyone knows. I do not think that there are any absolute numbers. It depends on your machine's memory and the type of data that you are working with.

If you give us some indication of the amount and type of data that you have, someone may be able to comment.

--- On Thu, 7/3/08, arghya ganguli <[EMAIL PROTECTED]> wrote:

> From: arghya ganguli <[EMAIL PROTECTED]>
> Subject: [R] Data size
> To: r-help@r-project.org
> Received: Thursday, July 3, 2008, 7:41 AM
> Can somebody please let me know what is the maximum number
> of rows and
> columns that R can handle in a datafile?
>
> Thanks & Regards,
>
> Arghya Ganguli
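As a rough guide to "it depends on memory", the footprint of purely numeric data can be estimated at 8 bytes per cell and compared with what R reports. This is a back-of-the-envelope sketch only: `estimate_mb` is a hypothetical helper, the 100,000 x 10 shape is an arbitrary assumption, and factors, character columns and temporary copies made during computation are ignored.

```r
# Rough footprint of a table of doubles: 8 bytes per cell
estimate_mb <- function(n_rows, n_cols, bytes_per_cell = 8) {
  n_rows * n_cols * bytes_per_cell / 1024^2
}

est <- estimate_mb(100000, 10)

# Compare with the size R reports for an actual object of that shape
m <- matrix(0, nrow = 100000, ncol = 10)
actual <- as.numeric(object.size(m)) / 1024^2

# Row and column counts are stored as integers, so each is bounded by this
max_dim <- .Machine$integer.max
```

In practice the usable size is usually well below such limits, because many operations need room for one or more copies of the data.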
[R] Data size
Can somebody please let me know what is the maximum number of rows and columns that R can handle in a datafile?

Thanks & Regards,

Arghya Ganguli
Re: [R] Data matrix of all possible response patterns
mat <- outer( 0:9, 0:(1024-1), function(x,y) y %/% (2^x) %% 2 )

On Thu, 26 Jun 2008, Daniel Folkinshteyn wrote:

this is probably a kludge, and there may be a "neater" way to do this, but... here's one:

a = 0:1
for (i in 1:9){
  a = merge(unname(a), 0:1)
}
a = t(a)

after the for loop, 'a' will contain a 1024 row by 10 col dataframe. putting it through a transpose gives you the 10 rows by 1024 cols matrix.

on 06/26/2008 02:18 PM SARAH A DEPAOLI said the following:

I am looking for a way to generate a data matrix that contains all possible response patterns for 10 binary items. This should produce a matrix with 10 rows (representing 10 items) and 1024 columns (representing 2^10 possible response patterns). Does anyone know of code that would produce such a matrix?

Thanks!
Sarah Depaoli

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
Re: [R] Data matrix of all possible response patterns
Try this also:

t(expand.grid(rep(list(0:1), 10)))

On Thu, Jun 26, 2008 at 3:18 PM, SARAH A DEPAOLI <[EMAIL PROTECTED]> wrote:
> I am looking for a way to generate a data matrix that contains all possible
> response patterns for 10 binary items. This should produce a matrix with 10
> rows (representing 10 items) and 1024 columns (representing 2^10 possible
> response patterns). Does anyone know of code that would produce such a
> matrix?
>
> Thanks!
> Sarah Depaoli
>

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
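To see what these one-liners produce, it may help to run the outer() and expand.grid() approaches from this thread on a smaller case. The sketch below assumes 3 items instead of 10 so the result can be read by eye; `n_items`, `pat_outer` and `pat_grid` are illustrative names, not from the thread.

```r
n_items <- 3   # small on purpose; the thread uses 10

# Bit arithmetic with outer(): row i holds bit (i - 1) of the column index
pat_outer <- outer(0:(n_items - 1), 0:(2^n_items - 1),
                   function(x, y) y %/% (2^x) %% 2)

# expand.grid() over n_items copies of 0:1, transposed to items-by-patterns
pat_grid <- t(expand.grid(rep(list(0:1), n_items)))

dim(pat_outer)   # items in rows, patterns in columns

# Both approaches list the patterns in the same order
same <- all(pat_outer == unname(pat_grid))
```

Setting `n_items <- 10` gives the 10 by 1024 matrix asked for in the original question.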