Re: [R] Performing operations only on selected data
Thank you, this works very well. My only remaining question is about how ifelse is working. I understand the basic syntax: df$condition2 gets assigned the value *runif(nrow(df1[df1$condition1<=1,]),0,1)* or the value *df$condition1*, depending on whether or not df$condition1 meets the criterion "<=1".

As I understand it, "runif(nrow(df1[df1$condition1<=1,]),0,1)" is a vector of random values with length equal to the number of rows meeting "df$condition1<=1", and df$condition1 is just my column of condition1 values. So the command seems to be going down row by row and assigning condition2 values from one of two vectors in an "interleaved" way.

So my question is, how does R keep track of which item in each of the vectors to assign to condition2? For example, if the first 4 entries of condition1 are 1, 3, 4, 1, how does R know to use the *first* entry of vector runif(nrow(df1[df1$condition1<=1,]),0,1), then the *second* and *third* values of vector df$condition1, then the *second* value of vector runif(nrow(df1[df1$condition1<=1,]),0,1)?

--
View this message in context: http://r.789695.n4.nabble.com/Performing-operations-only-on-selected-data-tp4650646p4650803.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
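A minimal sketch of the bookkeeping asked about above. ifelse(test, yes, no) does not walk through the shorter vector in order: it recycles yes and no to the length of test and then takes the element at the same position as each TRUE or FALSE. The fixed values in yes are stand-ins for the runif() draws (an assumption made here so the positions are visible).

```r
condition1 <- c(1, 3, 1, 4, 1)
test <- condition1 <= 1            # TRUE FALSE TRUE FALSE TRUE

# Stand-in for runif(sum(test), 0, 1): three fixed values
yes <- c(0.1, 0.2, 0.3)

# ifelse() recycles yes to length 5 (0.1 0.2 0.3 0.1 0.2) and
# picks positions 1, 3 and 5 of that recycled vector
res_ifelse <- ifelse(test, yes, condition1)
print(res_ifelse)                  # 0.1 3.0 0.3 4.0 0.2

# Logical indexing consumes the replacement values in order instead
res_index <- condition1
res_index[test] <- yes
print(res_index)                   # 0.1 3.0 0.2 4.0 0.3
```

With random draws the two results are statistically equivalent, but only the indexing form uses each drawn value exactly once and in order.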
[R] Performing operations only on selected data
I spent some time on this simple question and also searched the forum. I eventually hacked my way to an ugly solution for my particular problem, but I would like to improve my coding.

I have data of the form:

    df <- expand.grid(group=c('copper', 'zinc', 'aluminum', 'nickel'), condition1=c(1:4))

I would like to add a new data column "condition2", with values equal to the value of condition1 plus a random number from 0-1 (uniform distribution) if the value of condition1 is <= 1, or just condition1 if the value of condition1 is > 1.

More generally, my interest is in manipulating the values of condition1 if they meet one or more criteria, or keeping the values the same otherwise.

Thanks for any thoughts!

--
View this message in context: http://r.789695.n4.nabble.com/Performing-operations-only-on-selected-data-tp4650646.html
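One possible approach to the question above, sketched with logical indexing (not necessarily the solution given in the replies). The names sel and the seed are illustrative assumptions.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

df <- expand.grid(group = c("copper", "zinc", "aluminum", "nickel"),
                  condition1 = 1:4)

# Rows that meet the criterion
sel <- df$condition1 <= 1

# Start from a copy, then modify only the selected rows
df$condition2 <- df$condition1
df$condition2[sel] <- df$condition1[sel] + runif(sum(sel), 0, 1)

print(head(df))
```

Several criteria can be combined in sel with & and |, and the unselected rows keep their original values.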
[R] Calculating number of elapsed days from starting date
Hi,

I have data for events in rows, with columns for person and date. Each person may have more than one event:

    tC <- textConnection("
    Person date
    bob    1/1/00
    bob    1/2/00
    bob    1/3/00
    dave   1/7/00
    dave   1/8/00
    dave   1/10/00
    kevin  1/2/00
    kevin  1/3/00
    kevin  1/4/00
    ")
    data <- read.table(header=TRUE, tC)
    close(tC)
    rm(tC)

I would like to add a new column to my dataframe containing the calculated number of elapsed days from the starting date for each person. So the new dataframe would read:

    Person date     Days
    bob    1/1/00   0
    bob    1/2/00   1
    bob    1/3/00   2
    dave   1/7/00   0
    dave   1/8/00   1
    dave   1/10/00  3
    kevin  1/2/00   0
    kevin  1/3/00   1
    kevin  1/4/00   2

I am not sure how to do this. I tried looking through the forum but didn't find anything that seemed to apply. Suggestions appreciated.

--
View this message in context: http://r.789695.n4.nabble.com/Calculating-number-of-elapsed-days-from-starting-date-tp4644333.html
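A minimal sketch of one way to compute the Days column, assuming the dates are month/day/two-digit-year (which is what the expected output for dave implies). It uses ave() to find each person's first date; the variable d is an illustrative helper.

```r
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Person date
bob    1/1/00
bob    1/2/00
bob    1/3/00
dave   1/7/00
dave   1/8/00
dave   1/10/00
kevin  1/2/00
kevin  1/3/00
kevin  1/4/00
")

# Assumption: dates are m/d/yy
data$date <- as.Date(data$date, format = "%m/%d/%y")

# Days since each person's earliest date
d <- as.numeric(data$date)
data$Days <- d - ave(d, data$Person, FUN = min)

print(data)
```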
Re: [R] Loop for multiple plots in figure
Well, at this point I have what I need (a rough plot for data exploration), but the simplicity of the first approach is quite elegant and it has become a learning project. I have succeeded in formatting the overall plot, but I have not been able to solve the problem of titles or any kind of label/legend for the subplots. It seems that the title is called for each data point, and then printed one below the other in the plot. Is there any way at all to get a specific legend/title/text on each subplot?

Marcel

--
View this message in context: http://r.789695.n4.nabble.com/Loop-for-multiple-plots-in-figure-tp4634390p4634649.html
Re: [R] Loop for multiple plots in figure
This solution works really nicely and I learned much by working through it. However, I am having trouble with subplot formatting: setting main=d$Subject results in the correct title over each plot, but repeated multiple times. Also, I can't seem to format the axis labels and numbers to reduce the space between them and the plot. Any more thoughts appreciated.

Revised code:

    tC <- textConnection("
    Subject Xvar Yvar param1 param2
    bob     9    100  1      100
    bob     0    110  1      200
    steve   2    250  1      50
    bob     -5   175  0      35
    dave    22   260  0      343
    bob     3    180  0      74
    steve   1    290  1      365
    kevin   5    380  1      546
    bob     8    185  0      76
    dave    2    233  0      343
    steve   -10  230  0      556
    dave    -10  233  1      400
    steve   -7   250  1      388
    dave    3    568  0      555
    kevin   10   380  0      57
    kevin   4    390  0      50
    bob     6    115  1      600
    ")
    data <- read.table(header=TRUE, tC)
    close(tC)
    rm(tC)

    plot_one <- function(d){
      with(d, plot(Xvar, Yvar, type="n", tck=0.02, main=d$Subject,
                   xlim=c(-14,14), ylim=c(0,600)))               # set limits
      with(d[d$param1 == 0,], points(Xvar, Yvar, col = 1))       # first line
      with(d[d$param1 == 1,], points(Xvar, Yvar, col = 2))       # second line
    }

    par(mfrow=c(2,2))
    plyr::d_ply(data, "Subject", plot_one)

--
View this message in context: http://r.789695.n4.nabble.com/Loop-for-multiple-plots-in-figure-tp4634390p4634482.html
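A hedged sketch of one way to address both problems described above. main=d$Subject passes one title per row, so the title is drawn once for every data point; passing a single value (the first element) gives one title. The spacing between axis text and the plot is controlled by the par() settings mgp and mar; the particular values below are assumptions to be tuned by eye. The sketch uses base split() instead of plyr so it is self-contained, and only a subset of the posted rows.

```r
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Subject Xvar Yvar param1 param2
bob     9    100  1      100
bob     -5   175  0      35
steve   2    250  1      50
steve   -10  230  0      556
dave    2    233  0      343
dave    -10  233  1      400
kevin   5    380  1      546
kevin   4    390  0      50
")

plot_one <- function(d) {
  ttl <- as.character(d$Subject[1])   # a single title, not one per row
  plot(d$Xvar, d$Yvar, type = "n", tck = 0.02, main = ttl,
       xlab = "Xvar", ylab = "Yvar", xlim = c(-14, 14), ylim = c(0, 600))
  points(d$Xvar[d$param1 == 0], d$Yvar[d$param1 == 0], col = 1)
  points(d$Xvar[d$param1 == 1], d$Yvar[d$param1 == 1], col = 2)
  invisible(ttl)
}

# mgp = c(axis title line, tick label line, axis line); default is c(3, 1, 0)
# mar = margins in lines (bottom, left, top, right)
op <- par(mfrow = c(2, 2), mar = c(3, 3, 2, 1), mgp = c(1.8, 0.5, 0))
for (d in split(data, data$Subject)) plot_one(d)
par(op)
```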
[R] Loop for multiple plots in figure
Hello,

I have longitudinal data of the form below from N subjects. I am trying to create a figure with N small subplots on a single page, in which each plot is from only one subject, and in each plot there is a separate curve for each value of param1. So in this case, there would be four plots on the page (one each for Bob, Steve, Kevin and Dave), and each plot would have two separate curves (one for param1 = 1 and one for param1 = 0). The main title of the plot should be the subject name. I also need to sort the order of the plots on the page by param2.

I can do this with a small number of subjects using manual commands. For a larger number I know that a 'for loop' is called for, but I can't figure out how to get each of the subjects to plot separately, and could not figure it out from the existing posts. For now I want to do this in the basic environment, though I know that lattice could also work (I might try that later).

Any help appreciated.

    tC <- textConnection("
    Subject Xvar Yvar param1 param2
    bob     9    100  1      100
    bob     0    250  1      200
    steve   2    454  1      50
    bob     -5   271  0      35
    bob     3    10   0      74
    steve   1    500  1      365
    kevin   5    490  1      546
    bob     8    855  0      76
    dave    2    233  0      343
    steve   -10  388  0      556
    steve   -7   284  1      388
    dave    3    568  1      555
    kevin   4    247  0      57
    bob     6    300  1      600
    ")
    data <- read.table(header=TRUE, tC)
    close(tC)
    rm(tC)

    par(mfrow=c(2,2))

--
View this message in context: http://r.789695.n4.nabble.com/Loop-for-multiple-plots-in-figure-tp4634390.html
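A rough base-graphics sketch of the for loop described above. Since param2 varies within a subject, the panel order here is by each subject's mean param2; that ordering rule is an assumption, not something stated in the post. The names ord, d and dp are illustrative.

```r
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Subject Xvar Yvar param1 param2
bob     9    100  1      100
bob     0    250  1      200
steve   2    454  1      50
bob     -5   271  0      35
bob     3    10   0      74
steve   1    500  1      365
kevin   5    490  1      546
bob     8    855  0      76
dave    2    233  0      343
steve   -10  388  0      556
steve   -7   284  1      388
dave    3    568  1      555
kevin   4    247  0      57
bob     6    300  1      600
")

# Assumed ordering rule: subjects sorted by their mean param2
ord <- names(sort(tapply(data$param2, data$Subject, mean)))

op <- par(mfrow = c(2, 2))
for (s in ord) {
  d <- data[data$Subject == s, ]
  # Empty frame with common axes, titled with the subject name
  plot(d$Xvar, d$Yvar, type = "n", main = s, xlab = "Xvar", ylab = "Yvar",
       xlim = range(data$Xvar), ylim = range(data$Yvar))
  for (p in c(0, 1)) {
    dp <- d[d$param1 == p, ]
    dp <- dp[order(dp$Xvar), ]          # sort so the curve runs left to right
    lines(dp$Xvar, dp$Yvar, type = "b", col = p + 1)
  }
  legend("topleft", legend = c("param1 = 0", "param1 = 1"),
         col = 1:2, lty = 1, pch = 1, bty = "n")
}
par(op)
```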
[R] linear regression in a ragged array
Hello,

I have a large dataset of the form

    subj  var1  var2
    001   100   200
    001   120   226
    001   130   238
    001   140   245
    001   150   300
    002   110   205
    002   125   209
    003   101   233
    003   115   254

I would like to perform linear regression of var2 on var1 for each subject separately. It seems like I should be able to use the tapply function as you do for simple operations (like finding a mean of var1 for each subject), but I am not sure of the correct syntax for this. Is there a way to do this?

Many thanks,
Marcel

--
View this message in context: http://r.789695.n4.nabble.com/linear-regression-in-a-ragged-array-tp3393033p3393033.html
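One way this could be done, sketched with split() and lapply() rather than tapply() (tapply works on a single vector, whereas lm needs both columns). The names dat, fits and coefs are illustrative; subj is read as character so the leading zeros survive.

```r
dat <- read.table(header = TRUE,
                  colClasses = c("character", "numeric", "numeric"),
                  text = "
subj var1 var2
001  100  200
001  120  226
001  130  238
001  140  245
001  150  300
002  110  205
002  125  209
003  101  233
003  115  254
")

# One lm fit per subject
fits <- lapply(split(dat, dat$subj), function(d) lm(var2 ~ var1, data = d))

# Collect intercepts and slopes into a matrix, one row per subject
coefs <- t(sapply(fits, coef))
print(coefs)
```

Subjects with only two observations are fitted exactly, so their residual-based statistics are not meaningful.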
[R] trouble with histograms
Hi,

I have tab-delimited data with an unequal number of entries per column, of the sort:

    A B C 1 2 2 3 4 1 5 2 2 6 2 5 2 3 6 2

I would like to make a histogram of the frequencies of each represented number in a "stacked" histogram, where you can see the contribution of each group (A, B or C) to the total height of the bar, and each bar is labeled with the represented number. So, there would be a bar labeled "1" of height 2, half one color for group A, and half another color for group B.

So far, I can get my data into a dataframe:

    > data <- read.table("myfile", header=T, sep="\t")

I think I first have to use "hist" to get the frequencies of each, and I have figured out how to use breaks to make bins:

    > bins=seq(0.5,6.5,by=1)
    > hist(data$A, breaks=bins)

Lots of trouble from then on, though, and I just can't get this into a usable plot. Any help appreciated.

Marcel

--
View this message in context: http://r.789695.n4.nabble.com/trouble-with-histograms-tp3014838p3014838.html
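A hedged sketch of a stacked bar chart of this kind, built with table() and barplot() instead of hist(). The column assignment of the posted numbers cannot be recovered from the flattened sample, so the values below are hypothetical, chosen only so that "1" occurs once in A and once in B as in the description.

```r
# Hypothetical columns of unequal length (not the poster's actual data)
vals <- list(A = c(1, 3, 5, 6, 2, 2),
             B = c(1, 4, 2, 2, 3),
             C = c(2, 2, 5, 6))

# Long format: one row per observation, with its group in 'ind'
long <- stack(vals)

# Counts of each number (1 to 6) within each group: a 3 x 6 table
counts <- table(long$ind, factor(long$values, levels = 1:6))
print(counts)

# Each bar is one number; segments show the contribution of A, B and C
barplot(counts, col = c("grey20", "grey60", "grey90"),
        legend.text = TRUE, xlab = "Value", ylab = "Frequency")
```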
Re: [R] R code output issues
Thanks for the input.

Adding "print" took care of the first problem. The output looks like what I would expect, so I think the code is doing what I would like it to for the first 44 observations.

    > print(results.df)
             D        R      D.1      R.1    V1V2dif    V1V4dif
    1 68.92500 75.0     284.5250 296.       6.075     11.475
    2 68.81081 67.0     287.7568 283.      -1.8108108 -4.7567568
    3 65.43902 62.0     282.5366 279.      -3.4390244 -3.5365854
    4 66.6     67.25000 286.7000 288.2500   0.650      1.550
    5 68.94872 71.0     297.8462 305.       2.0512821  7.1538462
    Etc..

When I use str(results.df) it does seem to indicate a short file of 44 observations.

    'data.frame':   44 obs. of  6 variables:
     $ D      : num  68.9 68.8 65.4 66.6 68.9 ...
     $ R      : num  75 67 62 67.2 71 ...
     $ D.1    : num  285 288 283 287 298 ...
     $ R.1    : num  296 283 279 288 305 ...
     $ V1V2dif: num  6.08 -1.81 -3.44 0.65 2.05 ...
     $ V1V4dif: num  11.48 -4.76 -3.54 1.55 7.15 ...

So I am still left with that question.

--
View this message in context: http://r.789695.n4.nabble.com/R-code-output-issues-tp2526415p2526469.html
[R] R code output issues
Hi all,

I have a short R code file that I am using to perform calculations on a dataset. I am having a few issues with output:

1. Although my input data file is 2149 lines long, when I type "results.df" from the command line, I get the appropriate calculation results for only the first 46 rows. I get the same result if I "sink" the output to a file and type "results.df" at the command line; this creates a file with the first 46 entries. I do get the entire input data file back if I type "data", and I can't see anything in my input file around line 46 that would account for this.

2. If I run the code from a file using the command source("TransmissionCalc2") with the "results.df" command embedded in the file, there is no output to the terminal at all (or to the output file, if I use sink). Sink just creates an empty file.

So, I am not sure why my results dataframe seems to only include a small fraction of the data, or why the write commands are ignored when embedded in the code and called by source("etc").

CODE

    rm(list = ls(all = TRUE))
    alldata <- read.table("/Users/marcel/Desktop/V1V2TransmAnalysis/3_transmissiondata", header=T)
    #sink("/Users/marcel/Desktop/V1V2TransmAnalysis/4_output")
    data <- data.frame(alldata)
    V1V2means <- with(data, tapply(V1V2, list(Pair, DR), mean))
    V1V4means <- with(data, tapply(V1V4, list(Pair, DR), mean))
    results.df <- data.frame(V1V2means, V1V4means,
                             V1V2dif = V1V2means[, "R"] - V1V2means[, "D"],
                             V1V4dif = V1V4means[, "R"] - V1V4means[, "D"]
                             )
    data

SAMPLE OF INPUT DATA FILE

    Pair  DR  V1V2  V1V4
    1     D   63    277
    1     D   63    277
    1     D   63    277
    .

Thoughts greatly appreciated.

Marcel

--
View this message in context: http://r.789695.n4.nabble.com/R-code-output-issues-tp2526415p2526415.html
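A small sketch that may explain both observations, using hypothetical toy data in place of the real input file. tapply() over list(Pair, DR) returns one row per distinct Pair, so results.df has as many rows as there are pairs rather than as many as there are input lines. And inside a file run with source(), a bare object name is not auto-printed, so print() has to be called explicitly.

```r
# Hypothetical toy data with 7 input rows but only 2 pairs
data <- data.frame(Pair = c(1, 1, 1, 2, 2, 2, 2),
                   DR   = c("D", "D", "R", "D", "R", "R", "D"),
                   V1V2 = c(63, 65, 70, 60, 62, 64, 58))

# Mean of V1V2 for each Pair (rows) and each DR level (columns)
V1V2means <- with(data, tapply(V1V2, list(Pair, DR), mean))

results.df <- data.frame(V1V2means,
                         V1V2dif = V1V2means[, "R"] - V1V2means[, "D"])

# One row per pair, not one row per input line
cat("input rows:", nrow(data), " result rows:", nrow(results.df), "\n")

# Explicit print(): needed when this code is run through source()
print(results.df)
```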
[R] while loop until end of file
Hi guys,

I am stumped by a simple problem. I would like to take a file of the form

    Pair  group  param1
    1     D      10
    1     D      10
    1     R      10
    1     D      10
    2     D      10
    2     D      10
    2     D      10
    2     R      10
    2     R      10
    etc..

and for each pair, calculate the average of param1 for group D entries, subtract it from the average of param1 for the group R entries, and then write the results (i.e., AveParam1D AveParam1R dif) in a tab-delimited file.

Below is the start of my code. The difficulty I am having is in creating a while loop that stops once there are no more lines to read from the input file. I am also not sure of the best way to write in the results, though I think I should use rbind.

    data <- data.frame(alldata)
    out <- NULL
    i <- 1
    while (i <= max(data$Pair)) {    # stop after the last pair
      ss  <- subset(data, Pair == i)
      ssD <- subset(ss, DR == "D")
      ssR <- subset(ss, DR == "R")
      p1  <- mean(ssD$Length)
      p2  <- mean(ssR$Length)
      dif <- p2 - p1                 # R average minus D average
      out <- rbind(out, data.frame(P1 = p1, P2 = p2, dif = dif))
      i <- i + 1
    }
    write.table(out, file="out", quote=F, row.names=F, col.names=T, sep="\t")

I have spent an absurd amount of time trying to sort this out with the manual and forum searches. Any suggestions appreciated.

Marcel

--
View this message in context: http://r.789695.n4.nabble.com/while-loop-until-end-of-file-tp2399544p2399544.html
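A loop-free sketch of the same calculation, using tapply() on the column names from the sample layout (Pair, group, param1). The numeric values are hypothetical, since the posted sample shows only 10s, and the output goes to a temporary file rather than a real path.

```r
# Hypothetical values in the posted layout
dat <- data.frame(Pair   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
                  group  = c("D", "D", "R", "D", "D", "D", "D", "R", "R"),
                  param1 = c(10, 12, 15, 11, 9, 10, 11, 14, 16))

# Mean of param1 for every Pair (rows) and group (columns)
means <- with(dat, tapply(param1, list(Pair, group), mean))

out <- data.frame(Pair       = rownames(means),
                  AveParam1D = means[, "D"],
                  AveParam1R = means[, "R"],
                  dif        = means[, "R"] - means[, "D"])

# Tab-delimited output; tempfile() stands in for the real file name
outfile <- tempfile(fileext = ".txt")
write.table(out, file = outfile, quote = FALSE, row.names = FALSE, sep = "\t")
print(out)
```

Because tapply() handles every pair at once, no end-of-file test or growing rbind() is needed.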
[R] Formatting numerical output
Hello,

I am new to R and am having difficulty formatting numerical output from a regression analysis. My code iteratively performs linear regression on a dataset while excluding certain data ranges.

My code:

    rm(list = ls(all = TRUE))
    sink("outfile")
    dat <- read.table("testdat", sep="\t", header=TRUE)
    int = 0.2
    for (x in c(0:20)) {
      # excludes range of time data between int * x and (int*x) + int
      subdat <- subset(dat, time <= int * x | time > (int*x) + int)
      lm.subdat <- lm(length~time, subdat)          # regression
      rs.subdat <- summary(lm.subdat)$r.squared     # getting R-squared information
      txt1 <- ("Excluded range: Time")              # creating components of output message
      txt2 <- ("R^2 =")                             # creating components of output message
      lowend <- (int*x)
      highend <- (int*x + int)
      output <- c(txt1, lowend, highend, txt2, rs.subdat)
      print.noquote(output, sep="\t")
    }
    sink()

Currently my output looks like:

    [1] Excluded range: Time 0                    0.2
    [4] R^2 =                0.111526872884505
    [1] Excluded range: Time 0.2                  0.4
    [4] R^2 =                0.0706332920267015
    [1] Excluded range: Time 0.4                  0.6
    [4] R^2 =                0.0691466100802879

I would like the output format to look like:

    Excluded range: Time 1.0 - 1.2    R^2 = 0.45
    Excluded range: Time 1.2 - 1.4    R^2 = 0.5
    etc.

I would like to
1. get time and R^2 data on the same line
2. control (reduce) the number of digits reported for R^2
3. reduce the large number of empty spaces between "R^2" and the value.

I searched a lot but could not find much on this. Any help on these specifics or general comments on formatting numerical output greatly appreciated.

Thanks,
Marcel
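A minimal sketch of one way to get the three requested changes, using sprintf() to build each line and cat() to write it. The data are simulated here (an assumption, since "testdat" is not available), and the name report is illustrative.

```r
set.seed(42)  # assumption: simulated stand-in for the "testdat" file
dat <- data.frame(time = seq(0, 4.2, by = 0.05))
dat$length <- 2 * dat$time + rnorm(nrow(dat))

int <- 0.2
report <- character(0)
for (x in 0:20) {
  lo <- int * x
  hi <- lo + int
  # Exclude the time range (lo, hi] and refit
  subdat <- subset(dat, time <= lo | time > hi)
  r2 <- summary(lm(length ~ time, data = subdat))$r.squared
  # %.1f and %.2f fix the number of decimals; \t puts one tab before R^2
  report <- c(report,
              sprintf("Excluded range: Time %.1f - %.1f\tR^2 = %.2f", lo, hi, r2))
}

# One line per excluded range; wrap in sink() or pass file= to cat() to save it
cat(report, sep = "\n")
```

Unlike print(), cat() adds no index prefixes or padding, so each record stays on one line.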