Re: [R] merging data list in to single data frame
Thank you Hadley. With your solution, now it feels very easy!

From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On Behalf Of Hadley Wickham
Sent: Monday, April 04, 2011 6:11 PM
To: Umesh Rosyara
Cc: Dennis Murphy; r-help@r-project.org; rosyar...@gmail.com
Subject: Re: [R] merging data list in to single data frame

> filelist = list.files(pattern = "K*cd.txt") # the file names are K1cd.txt
> to K200cd.txt

It's very easy:

names(filelist) <- basename(filelist)
data_list <- ldply(filelist, read.table, header=T, comment=";", fill=T)

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] merging data list in to single data frame
Thank you, Dennis, for the solution. It is a step ahead. However, I need to read all 200 files as data frames one by one. Can we automate this process? I used the following steps to read all the files at once, but data_list ended up as a list:

filelist = list.files(pattern = "K*cd.txt") # the file names are K1cd.txt to K200cd.txt
data_list <- lapply(filelist, read.table, header=T, comment=";", fill=T)
names(filelist) <- 1:length(filelist)
library("plyr")
ldply(data_list, rbind)

I tried to apply your approach to this list, but I did not succeed in getting the variable .id (otherwise it does bind the component data frames!). Probably this is applicable to component data frames, not to a list with many data frames. Do you have any suggestion on using functions that can read the files (as I did above) and save each as a new data frame (for example DF1.DF2), not as a list of 200 data frames? If we can do that, then we will be able to use this approach.

Thank you so much,
Umesh R

From: Dennis Murphy [mailto:djmu...@gmail.com]
Sent: Monday, April 04, 2011 3:25 PM
To: Umesh Rosyara
Cc: r-help@r-project.org; rosyar...@gmail.com
Subject: Re: [R] merging data list in to single data frame

Hi:

Here's an alternative using ldply() from the plyr package. The idea is to read the data frames into a list, name them accordingly and then call ldply().

# Read in the test data frames (you want to use list.files() instead
# to input the data per Uwe's guidelines)
df1 <- read.table(textConnection("
var1 var2 var3 var4
1 6 0.3 8
3 4 0.4 9
2 3 0.4 6
1 0.4 0.9 3"), header = TRUE)
df2 <- read.table(textConnection("
var1 var2 var3 var4
1 16 0.6 7
3 14 0.4 6
2 13 0.4 5
1 0.6 0.9 2"), header = TRUE)
closeAllConnections()

# generate the list
dl <- list(df1, df2)

# Name the list components by number and then call ldply():
names(dl) <- 1:2 # more generally, names(dl) <- 1:length(dl)
library("plyr")
ldply(dl, rbind)
  .id var1 var2 var3 var4
1   1    1  6.0  0.3    8
2   1    3  4.0  0.4    9
3   1    2  3.0  0.4    6
4   1    1  0.4  0.9    3
5   2    1 16.0  0.6    7
6   2    3 14.0  0.4    6
7   2    2 13.0  0.4    5
8   2    1  0.6  0.9    2

You can always change .id to fileno afterwards.

HTH,
Dennis

On Mon, Apr 4, 2011 at 7:41 AM, Umesh Rosyara wrote:

Dear R community members

I did find a good way to merge my 200 text data files in to a single data file with one column added will show indicator for that file.

filelist = list.files(pattern = "K*cd.txt") # the file names are K1cd.txt to K200cd.txt
data_list <- lapply(filelist, read.table, header=T, comment=";", fill=T)

This will create list, but this is not what I want.

I want a single dataframe (all separate dataframes have same variable headings) with additional row for example

; just for example, two small datasets are created by my component datasets are huge, need automation

;read from file K1cd.txt
var1 var2 var3 var4
1    6    0.3  8
3    4    0.4  9
2    3    0.4  6
1    0.4  0.9  3

;read from file K2cd.txt
var1 var2 var3 var4
1    16   0.6  7
3    14   0.4  6
2    13   0.4  5
1    0.6  0.9  2

the output dataframe should look like

Fileno var1 var2 var3 var4
1      1    6    0.3  8
1      3    4    0.4  9
1      2    3    0.4  6
1      1    0.4  0.9  3
2      1    16   0.6  7
2      3    14   0.4  6
2      2    13   0.4  5
2      1    0.6  0.9  2

Please note that new file no column is added

Thank you for the help.
Umesh R
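The approach discussed in this thread (read each file, tag its rows with the file they came from, then stack everything into one data frame) can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the two small files are written to a temporary directory so the example runs on its own, and the helper name read_one is made up for this sketch. The regular expression follows Uwe's remark that list.files() expects a regular expression rather than a shell glob.

```r
# Write two small example files so the sketch is self-contained.
tmp <- tempfile("kfiles")
dir.create(tmp)
writeLines(c("; read from file K1cd.txt",
             "var1 var2 var3 var4",
             "1 6 0.3 8",
             "3 4 0.4 9"), file.path(tmp, "K1cd.txt"))
writeLines(c("; read from file K2cd.txt",
             "var1 var2 var3 var4",
             "1 16 0.6 7",
             "3 14 0.4 6"), file.path(tmp, "K2cd.txt"))

# list.files() takes a regular expression, not a glob such as "K*cd.txt".
filelist <- list.files(tmp, pattern = "^K[[:digit:]]+cd\\.txt$",
                       full.names = TRUE)
names(filelist) <- basename(filelist)   # names are carried through lapply()

# Hypothetical helper: read one file, skipping the ';' comment lines.
read_one <- function(path) {
  read.table(path, header = TRUE, comment.char = ";", fill = TRUE)
}
data_list <- lapply(filelist, read_one)

# Add the file name as a column, then stack all pieces into one data frame.
combined <- do.call(rbind,
                    Map(function(d, nm) cbind(Filename = nm, d),
                        data_list, names(data_list)))
rownames(combined) <- NULL
print(combined)
```

With plyr installed, the last step corresponds to what ldply() does when the input vector or list carries names; the base R version is used here only to avoid an extra dependency.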
[R] Questions remaining: define any character as na.string RE: merging data list in to single data frame
Dear Uwe and R community members

Thank you, Uwe, for the help. I still have one question remaining, which I have been trying to answer for a long time. When I export my data, some characters get mixed into it. I want to define any character as an na.string. Is it possible to do so?

Thanks;
Umesh

-----Original Message-----
From: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de]
Sent: Monday, April 04, 2011 12:22 PM
To: Umesh Rosyara
Cc: r-help@r-project.org; rosyar...@gmail.com
Subject: Re: [R] merging data list in to single data frame

On 04.04.2011 16:41, Umesh Rosyara wrote:
> Dear R community members
>
> I did find a good way to merge my 200 text data files in to a single data
> file with one column added will show indicator for that file.
>
> filelist = list.files(pattern = "K*cd.txt")

I doubt you meant "K*cd.txt" but "^K[[:digit:]]*cd\\.txt$".

> # the file names are K1cd.txt to K200cd.txt
>
> data_list<-lapply(filelist, read.table, header=T, comment=";", fill=T)

Replace by:

data_list <- lapply(filelist, function(x)
    cbind(Filename = x, read.table(x, header=T, comment=";", fill=TRUE)))

And then:

result <- do.call("rbind", data_list)

Uwe Ligges

> This will create list, but this is not what I want.
>
> I want a single dataframe (all separate dataframes have same variable
> headings) with additional row for example
>
> ; just for example, two small datasets are created by my component datasets
> are huge, need automation
>
> ;read from file K1cd.txt
> var1 var2 var3 var4
> 1    6    0.3  8
> 3    4    0.4  9
> 2    3    0.4  6
> 1    0.4  0.9  3
>
> ;read from file K2cd.txt
> var1 var2 var3 var4
> 1    16   0.6  7
> 3    14   0.4  6
> 2    13   0.4  5
> 1    0.6  0.9  2
>
> the output dataframe should look like
>
> Fileno var1 var2 var3 var4
> 1      1    6    0.3  8
> 1      3    4    0.4  9
> 1      2    3    0.4  6
> 1      1    0.4  0.9  3
> 2      1    16   0.6  7
> 2      3    14   0.4  6
> 2      2    13   0.4  5
> 2      1    0.6  0.9  2
>
> Please note that new file no column is added
>
> Thank you for the help.
>
> Umesh R
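On the remaining question about treating "any character" as missing: the na.strings argument of read.table() takes a fixed vector of strings, not a pattern, so stray text cannot be listed there in advance. One possible workaround, sketched here with made-up data, is to read the file as it is and then convert every column with as.numeric(), which turns anything that is not a number into NA. The helper name to_numeric and the sample values are assumptions for this example only.

```r
# Made-up input with stray text mixed into numeric columns.
raw <- c("var1 var2 var3",
         "1 6 0.3",
         "x 4 0.4",
         "2 n/a 0.4",
         "3 0.4 abc")
df <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: anything that cannot be parsed as a number becomes NA.
# as.character() guards against factor columns; the coercion warning is
# expected here, so it is suppressed.
to_numeric <- function(col) suppressWarnings(as.numeric(as.character(col)))

df[] <- lapply(df, to_numeric)   # df[] keeps the data frame structure
print(df)
```

This treats every column as numeric, so it would need adjusting if some columns are meant to hold text.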
[R] merging data list in to single data frame
Dear R community members

I did find a good way to merge my 200 text data files into a single data file, with one added column that shows an indicator for the source file.

filelist = list.files(pattern = "K*cd.txt") # the file names are K1cd.txt to K200cd.txt
data_list <- lapply(filelist, read.table, header=T, comment=";", fill=T)

This will create a list, but this is not what I want.

I want a single data frame (all the separate data frames have the same variable headings) with an additional column, for example:

; just for example, two small datasets are created, but my component datasets are huge, need automation

;read from file K1cd.txt
var1 var2 var3 var4
1    6    0.3  8
3    4    0.4  9
2    3    0.4  6
1    0.4  0.9  3

;read from file K2cd.txt
var1 var2 var3 var4
1    16   0.6  7
3    14   0.4  6
2    13   0.4  5
1    0.6  0.9  2

The output data frame should look like:

Fileno var1 var2 var3 var4
1      1    6    0.3  8
1      3    4    0.4  9
1      2    3    0.4  6
1      1    0.4  0.9  3
2      1    16   0.6  7
2      3    14   0.4  6
2      2    13   0.4  5
2      1    0.6  0.9  2

Please note that a new file-number column is added.

Thank you for the help.

Umesh R
Re: [R] help need on working in subset within a dataframe
Thank you, Ista. It helps.

Best Regards
Umesh R

From: istaz...@gmail.com [mailto:istaz...@gmail.com] On Behalf Of Ista Zahn
Sent: Tuesday, March 22, 2011 8:58 AM
To: Umesh Rosyara
Cc: R mailing list
Subject: Re: [R] help need on working in subset within a dataframe

Hi Umesh,

I use the plyr package for this sort of thing:

library(plyr)
daply(dataframe, .(ped), myfun)

Best,
Ista

On Tue, Mar 22, 2011 at 3:48 AM, Umesh Rosyara wrote:
> Dear R-experts
>
> Excuse me for an easy question, but I need help, sorry for that.
>
> From days I have been working with a large dataset, where operations are
> needed within a component of dataset. Here is my question:
>
> I have big dataset where x1:.x1000 or so. What I need to do is to work
> on 4 consecutive variables to calculate a statistics and output. So far so
> good. There are more vector operations inside function to do this. My
> question this time is I want to do this separately for each level of factor
> (in following example it is Ped, thus if there are 20 ped, I want a output
> with 20 statistics, so that I can work further on them).
>
> #data generation
> ped <- c(1,1,1,1,1, 1,1,1,1,1, 2,2,2,2,2, 2,2,2,2,2) # I have 20 ped
> fd <- c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3, 4,4,4,4,4) # I have ~100 fd
> iid <- c(1:20) # number can go up to 2000
> mid <- c(0,0,1,1,1, 0,0,6,6,6, 0,0,11,11,11, 0,0,16,16,16)
> fid <- c(0,0,2,2,2, 0,0,7,7,7, 0,0,12,12,12, 0,0,17,17,17)
> y <- c(3,4,5,6,7, 3,4,8,9,8, 2,3,3,6,7, 9,12,10,8,12)
> x1 <- c(1,1,1,0,0, 1,0,1,1,0, 0,1,1,0,1, 1,1,1,0,0)
> x2 <- c(1,1,1,0,0, 1,0,1,1,0, 0,1,1,1,0, 1,1,0,1,0)
> x3 <- c(1,0,0,1,1, 1,1,1,1,1, 1,1,1,1,0, 1,1,0,1,0)
> x4 <- c(1,1,1,1,0, 0,1,1,0,0, 0,1,0,0,0, 0,0,1,1,1)
> # I have more X variables potentially >1000 but I need to work four at a
> # time
> dataframe <- data.frame(ped, fd, iid, mid, fid, y, x1, x2, x3, x4)
>
> myfun <- function(dataframe) {
>   namemat <- matrix(c(1:4), nrow = 1)
>   smyfun <- function(x) {
>     x <- as.vector(x)
>     K1 <- dataframe$x1 * 0.23
>     K2 <- dataframe$x2 * 0.98
>     # just example there is long vector calculations in real dataset
>     kt1 <- K1 * K2
>     kt2 <- K1 / K2
>     Qni <- (K1*(kt1-0.25) + K2*(kt2-0.25))
>     y <- dataframe$y
>     yg <- mean(y, na.rm = TRUE) # mean of trait Y
>     dvm <- (y - yg) # deviation of phenotypic value from mean
>     sumdvm <- abs(sum(dvm, na.rm = TRUE))
>     yQni <- y * Qni
>     sumyQni <- abs(sum(yQni, na.rm = TRUE))
>     npt = (sumdvm / sumyQni)
>     return(npt)
>   }
>   npt1 <- apply(namemat, 1, smyfun)
>   return(npt1)
> }
>
> myfun (dataframe)
>
> My question is how can I automate the process so that the above function can
> calculate different values for n levels (>20 in my real data) of factor ped.
>
> Thanks in advance for the help. R-community is always helpful.
>
> Umesh R

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
[R] help need on working in subset within a dataframe
Dear R-experts

Excuse me for an easy question, but I need help, sorry for that.

For days I have been working with a large dataset in which operations are needed within a component of the dataset. Here is my question:

I have a big dataset with variables x1 to x1000 or so. What I need to do is work on 4 consecutive variables at a time to calculate a statistic and output it. So far so good. There are more vector operations inside the function to do this. My question this time is that I want to do this separately for each level of a factor (in the following example it is ped; thus if there are 20 ped, I want an output with 20 statistics, so that I can work further on them).

#data generation
ped <- c(1,1,1,1,1, 1,1,1,1,1, 2,2,2,2,2, 2,2,2,2,2) # I have 20 ped
fd <- c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3, 4,4,4,4,4) # I have ~100 fd
iid <- c(1:20) # number can go up to 2000
mid <- c(0,0,1,1,1, 0,0,6,6,6, 0,0,11,11,11, 0,0,16,16,16)
fid <- c(0,0,2,2,2, 0,0,7,7,7, 0,0,12,12,12, 0,0,17,17,17)
y <- c(3,4,5,6,7, 3,4,8,9,8, 2,3,3,6,7, 9,12,10,8,12)
x1 <- c(1,1,1,0,0, 1,0,1,1,0, 0,1,1,0,1, 1,1,1,0,0)
x2 <- c(1,1,1,0,0, 1,0,1,1,0, 0,1,1,1,0, 1,1,0,1,0)
x3 <- c(1,0,0,1,1, 1,1,1,1,1, 1,1,1,1,0, 1,1,0,1,0)
x4 <- c(1,1,1,1,0, 0,1,1,0,0, 0,1,0,0,0, 0,0,1,1,1)
# I have more X variables potentially >1000 but I need to work four at a time
dataframe <- data.frame(ped, fd, iid, mid, fid, y, x1, x2, x3, x4)

myfun <- function(dataframe) {
  namemat <- matrix(c(1:4), nrow = 1)
  smyfun <- function(x) {
    x <- as.vector(x)
    K1 <- dataframe$x1 * 0.23
    K2 <- dataframe$x2 * 0.98
    # just an example; there are long vector calculations in the real dataset
    kt1 <- K1 * K2
    kt2 <- K1 / K2
    Qni <- (K1*(kt1-0.25) + K2*(kt2-0.25))
    y <- dataframe$y
    yg <- mean(y, na.rm = TRUE) # mean of trait Y
    dvm <- (y - yg) # deviation of phenotypic value from mean
    sumdvm <- abs(sum(dvm, na.rm = TRUE))
    yQni <- y * Qni
    sumyQni <- abs(sum(yQni, na.rm = TRUE))
    npt = (sumdvm / sumyQni)
    return(npt)
  }
  npt1 <- apply(namemat, 1, smyfun)
  return(npt1)
}

myfun(dataframe)

My question is: how can I automate the process so that the above function calculates a separate value for each of the n levels (>20 in my real data) of the factor ped?

Thanks in advance for the help. The R community is always helpful.

Umesh R
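A base R way to run a function separately for each level of ped is to split the data frame by that factor and apply the function to each piece. The sketch below is only an illustration: stat_fun is a made-up stand-in for the real statistic, and only the ped, y and x1 columns from the post are used. Any function that takes a data frame and returns a single number, such as myfun() above, could be passed in the same way.

```r
ped <- rep(1:2, each = 10)
y   <- c(3,4,5,6,7, 3,4,8,9,8, 2,3,3,6,7, 9,12,10,8,12)
x1  <- c(1,1,1,0,0, 1,0,1,1,0, 0,1,1,0,1, 1,1,1,0,0)
dataframe <- data.frame(ped, y, x1)

# Hypothetical statistic: share of the total y carried by rows with x1 == 1.
stat_fun <- function(d) sum(d$y * d$x1) / sum(d$y)

# One sub-data-frame per level of ped, then one statistic per piece.
by_ped <- split(dataframe, dataframe$ped)
res <- sapply(by_ped, stat_fun)
print(res)   # named vector, one value per ped level
```

The result is a named vector with one entry per level, which is the same shape of output that daply(dataframe, .(ped), myfun) from plyr is meant to give.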
Re: [R] still a problem remainingRE: Data lebals xylattice plot: RE: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function
Thank you for helping me; this solved the problem.

Best Regards
Umesh R

From: foolish.andr...@gmail.com [mailto:foolish.andr...@gmail.com] On Behalf Of Felix Andrews
Sent: Friday, March 11, 2011 4:05 AM
To: Umesh Rosyara
Cc: R mailing list; deepayan.sar...@r-project.org
Subject: Re: [R] still a problem remainingRE: Data lebals xylattice plot: RE: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function

Yes, it is intersect rather than intersection, sorry. And in panel.text() the x and y were switched, so just reverse the first two arguments. That's what comes from posting from an iGizmo with no R to test my code.

2011/3/11 Umesh Rosyara :
> Thank you so much for the advice. The R could not find function
> "intersection". Do I need additional package to have this function active. I
> tried "intersect" instead has no effect.
>
> xyplot(p ~ xvar|chr, data=dataf,
>   panel=function(x, y, subscripts){
>     panel.xyplot(x, y)
>     ok= intersection(subscripts, which(dataf$p < 0.05))
>     with(dataf[ok,], panel.text(p, xvar, name))
>   }, as.table=T, subscripts=T)
>
> Best Regards
>
> Umesh R
>
> From: foolish.andr...@gmail.com [mailto:foolish.andr...@gmail.com] On Behalf
> Of Felix Andrews
> Sent: Thursday, March 10, 2011 7:01 AM
> To: Umesh Rosyara
> Cc: R mailing list; deepayan.sar...@r-project.org
> Subject: Re: [R] still a problem remainingRE: Data lebals xylattice plot:
> RE: displaying label meeting condition (i.e. significant, i..e p value less
> than 005) in plot function
>
> Notice that pvals is a subset of dataf so 'subscripts' can not be
> applied directly to pvals. Instead you should do the subsetting inside
> the panel function. e.g.
> ok <- intersection(subscripts, which(dataf$p < 0.05))
> with(dataf[ok,], panel.text(p, xval, name))
>
> By the way you should include the dots (...) in your panel function
> arguments and pass them on to panel.xyplot.
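Putting the corrections from this thread together (intersect() instead of intersection(), x before y in panel.text(), and the dots passed on to panel.xyplot()), a working panel function might look like the sketch below. It is an illustration rather than the exact code anyone posted: xvar is assumed to be simply 1 to 1000, and the label colour and size are arbitrary choices.

```r
library(lattice)

set.seed(134)
n <- 1000
dataf <- data.frame(
  name = paste0("M", seq_len(n)),
  xvar = seq_len(n),                    # assumption: one x position per marker
  chr  = factor(rep(1:5, each = 200)),
  p    = rnorm(n, 0.15, 0.05)
)

sig <- which(dataf$p < 0.05)            # row numbers of the points to label

plt <- xyplot(p ~ xvar | chr, data = dataf, as.table = TRUE,
  panel = function(x, y, subscripts, ...) {
    panel.xyplot(x, y, ...)
    panel.abline(h = 0.05, col = "red")
    # 'subscripts' holds the rows of dataf shown in this panel, so the
    # intersection gives the significant rows that belong to this panel only.
    ok <- intersect(subscripts, sig)
    if (length(ok) > 0) {
      panel.text(dataf$xvar[ok], dataf$p[ok], labels = dataf$name[ok],
                 col = "green4", cex = 0.7)
    }
  })

print(plt)   # draws the plot on the current graphics device
```

Indexing the full data frame with rows taken from subscripts is what keeps each label in its own panel; indexing the pre-filtered pvals with subscripts, as in the earlier attempts, mixes up row positions.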
Re: [R] still a problem remainingRE: Data lebals xylattice plot: RE: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function
Thank you so much for the advice. R could not find the function "intersection". Do I need an additional package to make this function available? I tried "intersect" instead, but it has no effect.

xyplot(p ~ xvar|chr, data=dataf,
  panel=function(x, y, subscripts){
    panel.xyplot(x, y)
    ok= intersection(subscripts, which(dataf$p < 0.05))
    with(dataf[ok,], panel.text(p, xvar, name))
  }, as.table=T, subscripts=T)

Best Regards
Umesh R

From: foolish.andr...@gmail.com [mailto:foolish.andr...@gmail.com] On Behalf Of Felix Andrews
Sent: Thursday, March 10, 2011 7:01 AM
To: Umesh Rosyara
Cc: R mailing list; deepayan.sar...@r-project.org
Subject: Re: [R] still a problem remainingRE: Data lebals xylattice plot: RE: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function

Notice that pvals is a subset of dataf so 'subscripts' can not be applied directly to pvals. Instead you should do the subsetting inside the panel function. e.g.

ok <- intersection(subscripts, which(dataf$p < 0.05))
with(dataf[ok,], panel.text(p, xval, name))

By the way you should include the dots (...) in your panel function arguments and pass them on to panel.xyplot.

On Thursday, 10 March 2011, Umesh Rosyara wrote:
> Lattice-experts:
> Thank you for those who have responded earlier. I have not got a perfect
> solution yet but tried several ways, unless anybody really lattice killer
> steps up, I will leave it and see alternatives. Sorry to send it again.
Re: [R] still a problem remainingRE: Data lebals xylattice plot: RE: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function
Lattice-experts:

Thank you to those who have responded earlier. I have not got a perfect solution yet, although I have tried several ways. Unless a real lattice expert steps up, I will leave it and look at alternatives. Sorry to send it again.

#Data
name <- c(paste ("M", 1:1000, sep = ""))
xvar <- seq(1, 1, 10)
chr <- c(rep(1,200),rep(2,200), rep(3,200), rep(4,200), rep(5,200))
set.seed(134)
p <- rnorm(1000, 0.15,0.05)
dataf <- data.frame(name,xvar, chr, p)
dataf$chr <- as.factor(dataf$chr)

#subset data
pvals <- dataf[dataf$p < 0.05,]

# unsuccessful commands
xyplot(p ~ xvar|chr, data=dataf,
  panel=function(x, y, subscripts){
    panel.xyplot(x, y)
    panel.xyplot(pvals$xvar[subscripts],pvals$p[subscripts], pch=6)
    panel.abline(h=0.01, col="red")
    panel.text(pvals$xvar[subscripts], pvals$p[subscripts],
               pvals$name[subscripts], col="green2")
  }, as.table=T, subscripts=T)

Best Regards
Umesh R

From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Tuesday, March 08, 2011 12:00 AM
To: Umesh Rosyara
Cc: Jorge Ivan Velez; Dennis Murphy; sarah.gos...@gmail.com; R mailing list
Subject: Re: still a problem remainingRE: [R] Data lebals xylattice plot: RE: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function

As I believe I already told you in my original reply, you have to make use of the subscripts argument in the panel function to subscript the P values etc. vector to be plotted in each panel. Something like: (untested)

panel = function(x, y,subscripts,...) {
    panel.xyplot(x, y,...)
    panel.abline(h=0.01, col="red")
    panel.text(xv1[subscripts], p1[subscripts], n1[subscripts], col="green2")
}

Also, in future, please send plain text email, as requested in the guide. Your message was in an annoying blue font in my gmail reader.

Cheers,
Bert

On Mon, Mar 7, 2011 at 5:26 PM, Umesh Rosyara wrote:
> Hi Lattice Users
>
> I have been working to fix this problem, still I am not able to solve fully.
[R] still a problem remainingRE: Data lebals xylattice plot: RE: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function
Hi Lattice Users

I have been working to fix this problem, but I am still not able to solve it fully. I could label the names that have a p-value less than 0.01, but the labels still appear in all component plots, even in those that do not have such a p-value! How can I implement this successfully for grouped data like mine? Your help is highly appreciated.

#my data
name <- c(paste ("M", 1:1000, sep = ""))
xvar <- seq(1, 1, 10)
chr <- c(rep(1,200),rep(2,200), rep(3,200), rep(4,200), rep(5,200))
set.seed(134)
p <- rnorm(1000, 0.15,0.05)
dataf <- data.frame(name,xvar, chr, p)
dataf$chr <- as.factor(dataf$chr)

# lattice plot: As far as I can go now ! little progress but final push required !
require(lattice)
pvals <- dataf[dataf$p < 0.01,]
p1 <- pvals$p
n1 <- pvals$name
xv1 <- pvals$xvar
xyplot(p ~ xvar|chr, data=dataf,
  panel = function(x, y) {
    panel.xyplot(x, y)
    panel.abline(h=0.01, col="red")
    panel.text(xv1, p1, n1, col="green2")
  })

Thank you in advance.

Best Regards
Umesh R

From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Sunday, March 06, 2011 10:50 AM
To: Umesh Rosyara
Cc: Jorge Ivan Velez; Dennis Murphy; sarah.gos...@gmail.com; R mailing list
Subject: Re: [R] Data lebals xylattice plot: RE: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function

This is easy to do by specifying xyplot's panel function. Assuming only one panel -- otherwise you need to pass the subscripts arguments to choose the values belonging to the panel -- something like:

xyplot(y~x, pvals = pvals,..., ## pvals is your vector of small p values with e.g. NA's elsewhere
  panel = function(x,y, pvals,...) {
    panel.xyplot(...)
    panel.text((x,y, pvals,...)
  } )

This is obviously just a sketch and will not work as written. So please read the Help page on xyplot carefully and perhaps also Deepayan's book on trellis graphics -- there are also undoubtedly online resources: search on "trellis graphics tutorial" or some such. This is not hard, but there are some details that you will need to master, especially regarding argument passing.

Another alternative is to use the layer() function in the latticeExtra package instead. Consult the documentation there for details.

Cheers,
Bert

On Sun, Mar 6, 2011 at 5:17 AM, Umesh Rosyara wrote:
> Dear Jorge, Dennis, Sarah and R-experts.
>
> Thank for helping me. As you mentioned it is difficult apply in lattice in
> this situation.
>
> Unless, there is a possibility, I would try to use lattice. The major reason
> toward this is- my ultimate solution might be better of in lattice as I have
> a classificatory variable to make similar graph for each category in the
> lattice graph. Lattice creates nice stacked xyplots.
>
> p ~ xvar | chr # require plots by the factor variable "chr"
>
> # with a classificatory variable
> name <- c(paste ("M", 1:1000, sep = ""))
> xvar <- seq(1, 1, 10)
> chr <- c(rep(1,200),rep(2,200), rep(3,200), rep(4,200), rep(5,200))
> set.seed(134)
> p <- rnorm(1000, 0.15,0.05)
> dataf <- data.frame(name,xvar, chr, p)
> dataf$chr <- as.factor(dataf$chr)
>
> # lattice plot: As far as I can go now !
> require(lattice)
> xyplot(pval ~ xvar1|chr, dataf)
>
> Best Regards
>
> Umesh R
>
> From: Jorge Ivan Velez [mailto:jorgeivanve...@gmail.com]
> Sent: Sunday, March 06, 2011 12:22 AM
> To: Umesh Rosyara
> Cc: R mailing list
> Subject: Re: [R] displaying label meeting condition (i.e. significant, i..e
> p value less than 005) in plot function
>
> Hi Umesh,
>
> You can try something along the lines of:
>
> d <- dataf[dataf$p < 0.05, ] # p < 0.05
> with(d, plot(xvar, p, col = 'white'))
> with(d, text(xvar, p, name, cex = .7))
>
> HTH,
> Jorge
>
> On Sat, Mar 5, 2011 at 12:29 PM, Umesh Rosyara <> wrote:
>
> Dear R users,
>
> Here is my problem:
>
> # example data
> name <- c(paste ("M", 1:1000, sep = ""))
> xvar <- seq(1, 1, 10)
> set.seed(134)
> p <- rnorm(1000, 0.15,0.05)
> dataf <- data.frame(name,xvar, p)
> plot (dataf$xvar,p)
> abline(h=0.05)
>
> # I can know which observation number is less than 0.05
> which (dataf$p < 0.05)
> [1] 12 20 80 269 272 338 366 368 397 403 432 453 494 543 592 691 723 789 811
> [20] 854 891 931 955
>
> I want to display (label) corresponding names on the plot above:
> means that 12th observation M12, 20th observation M20 and so on. Please note
> that I have names not in numerical sequence (rather different names), just
> provided
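For the single-plot case that started this thread, Jorge's idea can be extended so that all points stay visible and only the significant ones are labelled. The following is a minimal sketch in base graphics, assuming xvar is simply 1 to 1000; the grey point colour and the label placement (pos = 3, just above the point) are arbitrary choices for the illustration.

```r
set.seed(134)
dataf <- data.frame(
  name = paste0("M", 1:1000),
  xvar = 1:1000,                 # assumption: one x position per observation
  p    = rnorm(1000, 0.15, 0.05)
)

# Rows that meet the condition; their names become the labels.
sig <- dataf[dataf$p < 0.05, ]

plot(dataf$xvar, dataf$p, xlab = "xvar", ylab = "p", col = "grey60")
abline(h = 0.05, col = "red")
text(sig$xvar, sig$p, labels = sig$name, cex = 0.7, pos = 3)
```

Because the labels are taken from the name column of the filtered rows, this works equally well when the names are not a numerical sequence.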
[R] Lattice experts: RE: Data lebals xylattice plot: RE: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function
Hi Bert and Lattice experts Thank you for suggestion and I am still reading your suggestions (Deepayan's book) and help guide. So far no silverlining in horizon. Here is my outline, that keep changing: require(lattice) pvals <- which (dataf$p < 0.05) xyplot(p ~ xvar|chr, data=dataf, pvals = pvals, col="green",pch=3, fill.color="green", cex=1, panel = function(x,y, pvals) { panel.xyplot(x, y, pch=3, fill=fill) panel.text((x,y, pvals) } ) # for new lattice plot experts, this was my data: name <- c(paste ("M", 1:1000, sep = "")) xvar <- seq(1, 1, 10) chr <- c(rep(1,200),rep(2,200), rep(3,200), rep(4,200), rep(5,200)) set.seed(134) p <- rnorm(1000, 0.15,0.05) dataf <- data.frame(name,xvar, chr, p) dataf$chr <- as.factor(dataf$chr) May I need some rest. Thank you for your suggestions. Thanks; Best Regards Umesh R _ From: Bert Gunter [mailto:gunter.ber...@gene.com] Sent: Sunday, March 06, 2011 10:50 AM To: Umesh Rosyara Cc: Jorge Ivan Velez; Dennis Murphy; sarah.gos...@gmail.com; R mailing list Subject: Re: [R] Data lebals xylattice plot: RE: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function This is easy to do by specifying xyplot's panel function. Assuming only one panel -- otherwise you need to pass the subscripts arguments to choose the values belonging to the panel -- somethings like: xyplot(y~x, pvals = pvals,..., ## pvals is your vector of small p values with e.g. NA's elsewhere panel = function(x,y, pvals,...) { panel.xyplot(...) panel.text((x,y, pvals,...) } ) This is obviously just a sketch and will not work as written. So please read the Help page on xyplot carefully and perhaps also Deepayan's book on trellis graphics -- there are also undoubtedly online resources: search on "trellis graphics tutorial" or some such. This is not hard, but there are some details that you will need to master,especially regarding argument passing. 
Another alternative is to use the layer() function in the latticeExtra package instead. Consult the documentation there for details. Cheers, Bert On Sun, Mar 6, 2011 at 5:17 AM, Umesh Rosyara wrote: > Dear Jorge, Dennis, Sarah and R-experts. > > Thank for helping me. As you mentioned it is difficult apply in lattice in > this situation. > > Unless, there is a possibility, I would try to use lattice. The major reason > toward this is- my ultimate solution might be better of in lattice as I have > a classificatory variable to make similar graph for each caterogory in the > lattice graph. Lattice cleates nice stacked xyplots. > > p ~ xvar | chr # require plots by the factor variable "chr" > > # with a classificatory variable > name <- c(paste ("M", 1:1000, sep = "")) > xvar <- seq(1, 1, 10) > chr <- c(rep(1,200),rep(2,200), rep(3,200), rep(4,200), rep(5,200)) > set.seed(134) > p <- rnorm(1000, 0.15,0.05) > dataf <- data.frame(name,xvar, chr, p) > dataf$chr <- as.factor(dataf$chr) > > # lattice plot: As far as I can go now ! > require(lattice) > xyplot(pval ~ xvar1|chr, dataf) > > > Best Regards > > Umesh R > > > > > _ > > From: Jorge Ivan Velez [mailto:jorgeivanve...@gmail.com] > Sent: Sunday, March 06, 2011 12:22 AM > To: Umesh Rosyara > Cc: R mailing list > Subject: Re: [R] displaying label meeting condition (i.e. 
significant, i..e > p value less than 005) in plot function > > > Hi Umesh, > > > You can try something along the lines of: > > > d <- dataf[dataf$p < 0.05, ] # p < 0.05 > with(d, plot(xvar, p, col = 'white')) > with(d, text(xvar, p, name, cex = .7)) > > HTH, > Jorge > > > > On Sat, Mar 5, 2011 at 12:29 PM, Umesh Rosyara <> wrote: > > > Dear R users, > > Here is my problem: > > # example data > name <- c(paste ("M", 1:1000, sep = "")) > xvar <- seq(1, 1, 10) > set.seed(134) > p <- rnorm(1000, 0.15,0.05) > dataf <- data.frame(name,xvar, p) > plot (dataf$xvar,p) > abline(h=0.05) > > # I can know which observation number is less than 0.05 > which (dataf$p < 0.05) > [1] 12 20 80 269 272 338 366 368 397 403 432 453 494 543 592 691 723 789 > 811 > [20] 854 891 931 955 > > I want to display (label) corresponding names on the plot above: > means that 12th observation M12, 20th observation M20 and so on. Please note > that I have names not in numerical sequience (rather different names), just > provided for this example to create dataset easily. > > Thanks in advance > > Umesh R > > > [[alternative HTML version deleted]] > > _
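Bert's sketch leaves the argument passing open. One way it might be filled in is shown below. This is a minimal sketch, not a definitive answer: it assumes the `dataf` layout from the original post, rebuilds `xvar` as an evenly spaced sequence (an assumption), and uses 0.05 as an illustrative cutoff. xyplot() hands a `subscripts` vector to any panel function that asks for it, and that vector identifies the rows of the data frame that belong to the current panel.

```r
library(lattice)

# Example data modelled on the post; the xvar spacing is an assumption
set.seed(134)
n <- 1000
dataf <- data.frame(
  name = paste0("M", seq_len(n)),
  xvar = seq(1, by = 10, length.out = n),
  chr  = factor(rep(1:5, each = n / 5)),
  p    = rnorm(n, mean = 0.15, sd = 0.05)
)

threshold <- 0.05  # illustrative cutoff

plt <- xyplot(p ~ xvar | chr, data = dataf,
  panel = function(x, y, subscripts, ...) {
    panel.xyplot(x, y, ...)
    panel.abline(h = threshold, lty = 2)
    # x and y hold only this panel's points; subscripts maps them
    # back to rows of dataf so the matching names can be looked up
    sig <- y < threshold
    if (any(sig)) {
      panel.text(x[sig], y[sig],
                 labels = as.character(dataf$name[subscripts][sig]),
                 pos = 3, cex = 0.7)
    }
  })
print(plt)
```

Looking the names up through `dataf$name[subscripts]` keeps the labels aligned with the points even though each panel sees only a subset of the rows.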
[R] Data lebals xylattice plot: RE: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function
Dear Jorge, Dennis, Sarah and R experts,

Thank you for helping me. As you mentioned, this is difficult to do in lattice. Still, if there is any possibility, I would like to use lattice. The main reason is that my final solution is likely to be better off in lattice, because I have a classification variable and want a similar graph for each category. Lattice creates nice stacked xyplots.

p ~ xvar | chr   # require plots by the factor variable "chr"

# with a classificatory variable
name <- c(paste("M", 1:1000, sep = ""))
xvar <- seq(1, 10000, 10)
chr <- c(rep(1,200), rep(2,200), rep(3,200), rep(4,200), rep(5,200))
set.seed(134)
p <- rnorm(1000, 0.15, 0.05)
dataf <- data.frame(name, xvar, chr, p)
dataf$chr <- as.factor(dataf$chr)

# lattice plot: as far as I can go now!
require(lattice)
xyplot(p ~ xvar | chr, dataf)

Best Regards
Umesh R

_
From: Jorge Ivan Velez [mailto:jorgeivanve...@gmail.com]
Sent: Sunday, March 06, 2011 12:22 AM
To: Umesh Rosyara
Cc: R mailing list
Subject: Re: [R] displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function

Hi Umesh,

You can try something along the lines of:

d <- dataf[dataf$p < 0.05, ]   # p < 0.05
with(d, plot(xvar, p, col = 'white'))
with(d, text(xvar, p, name, cex = .7))

HTH,
Jorge

On Sat, Mar 5, 2011 at 12:29 PM, Umesh Rosyara <> wrote:

Dear R users,

Here is my problem:

# example data
name <- c(paste("M", 1:1000, sep = ""))
xvar <- seq(1, 10000, 10)
set.seed(134)
p <- rnorm(1000, 0.15, 0.05)
dataf <- data.frame(name, xvar, p)
plot(dataf$xvar, p)
abline(h = 0.05)

# I can find which observation numbers are less than 0.05
which(dataf$p < 0.05)
 [1]  12  20  80 269 272 338 366 368 397 403 432 453 494 543 592 691 723 789 811
[20] 854 891 931 955

I want to display (label) the corresponding names on the plot above: that is, M12 for the 12th observation, M20 for the 20th observation, and so on.
Please note that I have names not in numerical sequence (rather different names), just provided for this example to create dataset easily. Thanks in advance Umesh R
[R] displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function
Dear R users,

Here is my problem:

# example data
name <- c(paste("M", 1:1000, sep = ""))
xvar <- seq(1, 10000, 10)
set.seed(134)
p <- rnorm(1000, 0.15, 0.05)
dataf <- data.frame(name, xvar, p)
plot(dataf$xvar, p)
abline(h = 0.05)

# I can find which observation numbers are less than 0.05
which(dataf$p < 0.05)
 [1]  12  20  80 269 272 338 366 368 397 403 432 453 494 543 592 691 723 789 811
[20] 854 891 931 955

I want to display (label) the corresponding names on the plot above: that is, M12 for the 12th observation, M20 for the 20th observation, and so on. Please note that my real names are not in numerical sequence (they are arbitrary names); I used this pattern only to create the example dataset easily.

Thanks in advance
Umesh R
[R] please help ! label selected data points in huge number of data points potentially as high as 50, 000 !
Dear All,

I am reposting because my problem is a real issue that I have been working on. I know this might be simple for those who know how to do it. Anyway, I need help! Let me make my point clear.

I have a huge number of data points plotted using either the base plot function or xyplot in lattice (I prefer lattice).

  name xvar           p
1   M1    1 0.107983837
2   M2   11 0.209125624
3   M3   21 0.163959428
4   M4   31 0.132469859
5   M5   41 0.086095130
6   M6   51 0.180822010
7   M7   61 0.246619925
8   M8   71 0.147363687
9   M9   81 0.162663127
... 5000 observations

I need to plot xvar (x variable) against p (y variable) using either plot() or xyplot(), and I want to print the name labels on the graph only for those rows that have a p value < 0.01 (meaning that they are significant). With my limited R knowledge I can use text(x, y, labels) to add the text manually, but I have a huge number of data points (I provide just 5000 here, but potentially it can go up to 50,000). So I want to display the names corresponding to those observations (rows) whose p value is below the threshold (0.01 in this example).

Here is my example dataset and my status:

name <- c(paste("M", 1:5000, sep = ""))
xvar <- seq(1, 50000, 10)
set.seed(134)
p <- rnorm(5000, 0.15, 0.05)
dataf <- data.frame(name, xvar, p)

# using lattice (my first preference)
require(lattice)
xyplot(p ~ xvar, dataf)

# I want to display names for the following observations, which meet the requirement p < 0.01
which(dataf$p < 0.01)
 [1]  811  854 1636 1704 2148 2161 2244 3205 3268 4177 4564 4614 4639 4706

Thus the significant observations are:

      name  xvar             p
811   M811  8101  0.0050637068
854   M854  8531 -0.0433901783
1636 M1636 16351 -0.0279014039
1704 M1704 17031  0.0029878335
2148 M2148 21471  0.0048898232
2161 M2161 21601 -0.0354130557
2244 M2244 22431  0.0003255200
3205 M3205 32041  0.0079758430
3268 M3268 32671  0.0012797145
4177 M4177 41761  0.0015487439
4564 M4564 45631  0.0024867152
4614 M4614 46131  0.0078381964
4639 M4639 46381 -0.0063151605
4706 M4706 47051  0.0032200517

I want the data point (8101, 0.0050637068) labelled M811 in the plot, and similarly for all of the significant points above. I do not want to label the rest of the 5000 points, which do not have a p value < 0.01. I know I can add labels manually with text() in base graphics:

plot(dataf$xvar, p)
text(8101, 0.0050637068, "M811")
text(8531, -0.0433901783, "M854")

I need more automation to deal with as many as 50,000 observations. In practice I do not know in advance how many there will be. Your help is highly appreciated.

Thank you;
Best Regards
Umesh R

_
From: Umesh Rosyara [mailto:rosyar...@gmail.com]
Sent: Saturday, March 05, 2011 12:30 PM
To: 'r-help@r-project.org'
Subject: displaying label meeting condition (i.e. significant, i..e p value less than 005) in plot function

Dear R users,

Here is my problem:

# example data
name <- c(paste("M", 1:1000, sep = ""))
xvar <- seq(1, 10000, 10)
set.seed(134)
p <- rnorm(1000, 0.15, 0.05)
dataf <- data.frame(name, xvar, p)
plot(dataf$xvar, p)
abline(h = 0.05)

# I can find which observation numbers are less than 0.05
which(dataf$p < 0.05)
 [1]  12  20  80 269 272 338 366 368 397 403 432 453 494 543 592 691 723 789 811
[20] 854 891 931 955

I want to display (label) the corresponding names on the plot above: that is, M12 for the 12th observation, M20 for the 20th observation, and so on. Please note that my real names are not in numerical sequence (they are arbitrary names); I used this pattern only to create the example dataset easily.

Thanks in advance
Umesh R
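For base graphics, the manual text() calls in this message can be replaced by a single vectorized call on the subset of rows below the threshold. The following is a minimal sketch under stated assumptions: `label_significant` is a hypothetical helper name (not an existing function), the data frame is assumed to have columns `name`, `xvar` and `p` as in the post, and the example data are regenerated with an evenly spaced `xvar`.

```r
# Hypothetical helper: plot all points, label only those with p < threshold.
# Assumes df has columns name, xvar and p.
label_significant <- function(df, threshold = 0.01) {
  sig <- df[df$p < threshold, , drop = FALSE]
  plot(df$xvar, df$p, xlab = "xvar", ylab = "p", pch = 20, col = "grey60")
  abline(h = threshold, lty = 2)
  if (nrow(sig) > 0) {
    # text() is vectorized, so one call labels every selected point
    text(sig$xvar, sig$p, labels = as.character(sig$name), pos = 3, cex = 0.7)
  }
  invisible(sig)  # return the labelled rows for inspection
}

set.seed(134)
n <- 5000
dataf <- data.frame(
  name = paste0("M", seq_len(n)),
  xvar = seq(1, by = 10, length.out = n),
  p    = rnorm(n, mean = 0.15, sd = 0.05)
)

sig <- label_significant(dataf, threshold = 0.01)
sig
```

Because the subset is computed from the data rather than typed in, the same call should work unchanged whether there are 5000 or 50,000 rows; only the number of labels drawn changes.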
[R] thank you
Hi Dennis,

I was able to solve my problem. Thank you for your encouragement and time.

n <- 7
newvars <- c(paste('m', rep(1:n, each = 4), rep(c('a', 'b')), rep(c('p1', 'p2'), each = 2), sep = ''))
newvars
 [1] "m1ap1" "m1bp1" "m1ap2" "m1bp2" "m2ap1" "m2bp1" "m2ap2" "m2bp2" "m3ap1"
[10] "m3bp1" "m3ap2" "m3bp2" "m4ap1" "m4bp1" "m4ap2" "m4bp2" "m5ap1" "m5bp1"
[19] "m5ap2" "m5bp2" "m6ap1" "m6bp1" "m6ap2" "m6bp2" "m7ap1" "m7bp1" "m7ap2"
[28] "m7bp2"

Umesh R
Re: [R] stuk at another point: simple question
Dear All I now realized that it is not simple to deal with realworld problems! This what I tried without any success: a <- seq(1, nvar, by = 2) b <- seq(2, nvar, by = 2) #df2 <- transform(df2, ima1p1 = df2$x1[df2$Parent1], # Parent 1's allele 1 #ima2p1 = df2$x2[df2$Parent1], # Parent 1's allele 2 #ima1p2 = df2$x1[df,# Parent 2's allele 1 #ima2p2 = df2$x2[df2$Parent2]) # Parent2's allele 2 out <- lapply(1:nmark, function(ind){ n <- nvar/2 transform(df2, ima1p1 = df2[, a[ind]][df$Parent1], # Parent 1's allele 1 ima2p1 = df2[, b[ind]][df2$Parent1], # Parent 1's allele 2 ima1p2 = df2[, a[ind]][df2$Parent2], # Parent 2's allele 1 ima2p2 = df2[, a[ind]][df2$Parent2])} # Parent2's allele 2 I could go further down because I had already an error ! I am particularly confused how can apply the index in df2$Parent1 or df2$ parent2. Please help. Thank you; Umesh R _ From: Umesh Rosyara [mailto:rosyar...@gmail.com] Sent: Monday, February 28, 2011 8:01 AM To: 'Dennis Murphy' Cc: 'r-help@r-project.org' Subject: stuk at another point: simple question Dear R-community members. I am really appreciate R-help group. Dennis has been extrremely helpful to solve some of my questions. I am following Dennis recommendation in the following email, yet I am stuck at another point (hope this will took me to end of this project. Ind <- c(1:5) Parent1 <- c(NA,NA,1,1,3) Parent2 <- c(NA,NA,2,2,4) y <- c(6,5,8,10,7) M1a <- c(1,2,1,1,1) M1b <- c(1,2,2,2,1) M2a <- c(3,3,1,1,3) M2b <- c(1,1,3,3,3) M3a <- c(4,4,4,4,4) M3b <- c(4,4,1,1,4) M4a <- c(1,4,4,1,4) M4b <- c(4,4,4,4,4) dataf <- data.frame (Ind, Parent1, Parent2, y, M1a, M1b,M2a,M2b, M3a,M3b,M4a, M4b) # I have more than >1000 variables pair # pair1 (M1a,M1b) pair2 (M2a, M2b), pair3 (M3a, M3b)... 
df2 <- transform(dataf,m1ap1 = dataf$M1a[dataf$Parent1], m1bp1 = dataf$M1b[dataf$Parent1], m1ap2 = dataf$M1a[dataf$Parent2], m1bp2 = dataf$M1b[dataf$Parent2]) # downstream calculations hP1 <- ifelse(df2$m1ap1==df2$m1bp1,0,1) hP2 <- ifelse(df2$m1bp2==df2$m1bp2,0,1) t1 <- ifelse(df2$M1a==df2$m1ap1,1,0) t2 <- ifelse(df2$M1b==df2$m1ap2,1,0) C <- (hP1*(t1-0.25)+ hP2 *(t2-0.25)) yv <- df2$y Cy <- C*yv avgCy <- mean(Cy, na.rm=T) avgCy # I want to store this value to new dataframe with first model i.e. How can I loop the process to output the second pair( here M2a, M2b), third pair (here M3a, M3b) to all pairs (I have more than 1000) Mode1 avgCy 1 1.75 # from pair M1a and M1b 2 # from pair M2a and M2b 3 # from pair M3a and M3b 4 # from pair M4a and M4b to the end of the file Thank you in advance Umesh R _ From: Dennis Murphy [mailto:djmu...@gmail.com] Sent: Friday, February 18, 2011 12:28 AM To: Umesh Rosyara Cc: r-help@r-project.org Subject: Re: [R] recoding a data in different way: please help Hi: This is as far as I could get: df <- read.table(textConnection(" Individual Parent1 Parent2 mark1 mark2 10 0 12 11 20 0 11 22 30 0 13 22 40 0 13 11 51 2 11 12 61 2 12 12 73 4 11 12 83 4 13 12 91 4 11 12 10 1 4 11 12"), header = TRUE) df2 <- transform(df, Parent1 = replace(Parent1, Parent1 == 0, NA), Parent2 = replace(Parent2, Parent2 == 0, NA)) df2 <- transform(df2, imark1p1 = df2$mark1[df2$Parent1], # Parent 1's mark1 imark1p2 = df2$mark1[df2$Parent2], # Parent 2's mark1 imark2p1 = df2$mark2[df2$Parent1], # Parent 1's mark2 imark2p2 = df2$mark2[df2$Parent2]) # Parent 2's mark2 I created df2 so as not to overwrite the original in case of a mistake. At this point, you have several sets of vectors that you can compare; e.g., mark1 with imark1p1 and imark1p2. Like Josh, I couldn't make heads or tails out of what these logical tests were meant to output, but perhaps this gives you a broader template with which to work. 
At this point, you can probably remove the rows corresponding to the parents. I believe ifelse() is your friend here - it can perform logical tests in a vectorized fashion. As long as the tests are consistent from one individual to the next, it's likely to be an efficient route. HTH, Dennis On Thu, Feb 17, 2011 at 6:21 PM, Umesh
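For the question in this thread of how to repeat the single-pair calculation over every marker pair, one possible shape is to wrap the calculation in a function that takes the two column names and then apply it over all pairs. This is a minimal sketch, not the poster's or Dennis's method: `avg_cy_for_pair` is a hypothetical helper, the pairs are assumed to be named M<k>a / M<k>b, and the formulas are copied as posted, including the `hP2` line that compares a column with itself (possibly a typo in the original, kept here so that the posted value 1.75 for the first pair is reproduced).

```r
dataf <- data.frame(
  Ind = 1:5,
  Parent1 = c(NA, NA, 1, 1, 3), Parent2 = c(NA, NA, 2, 2, 4),
  y = c(6, 5, 8, 10, 7),
  M1a = c(1, 2, 1, 1, 1), M1b = c(1, 2, 2, 2, 1),
  M2a = c(3, 3, 1, 1, 3), M2b = c(1, 1, 3, 3, 3),
  M3a = c(4, 4, 4, 4, 4), M3b = c(4, 4, 1, 1, 4),
  M4a = c(1, 4, 4, 1, 4), M4b = c(4, 4, 4, 4, 4)
)

# Hypothetical helper: the posted calculation for one pair of columns
avg_cy_for_pair <- function(df, a_col, b_col) {
  a <- df[[a_col]]
  b <- df[[b_col]]
  ap1 <- a[df$Parent1]; bp1 <- b[df$Parent1]  # parent 1's two alleles
  ap2 <- a[df$Parent2]; bp2 <- b[df$Parent2]  # parent 2's two alleles
  hP1 <- ifelse(ap1 == bp1, 0, 1)
  hP2 <- ifelse(bp2 == bp2, 0, 1)  # as posted; ap2 == bp2 may have been intended
  t1 <- ifelse(a == ap1, 1, 0)
  t2 <- ifelse(b == ap2, 1, 0)
  C <- hP1 * (t1 - 0.25) + hP2 * (t2 - 0.25)
  mean(C * df$y, na.rm = TRUE)
}

# Find the pairs by name (assumed naming scheme M<k>a / M<k>b)
a_cols <- grep("^M[0-9]+a$", names(dataf), value = TRUE)
b_cols <- sub("a$", "b", a_cols)
stopifnot(all(b_cols %in% names(dataf)))

result <- data.frame(
  Model = seq_along(a_cols),
  pair  = paste(a_cols, b_cols, sep = "/"),
  avgCy = unname(mapply(function(a, b) avg_cy_for_pair(dataf, a, b),
                        a_cols, b_cols))
)
result
```

Indexing by column name with `[[` avoids building a new set of m1ap1-style columns for each pair, so the same function should scale to more than 1000 pairs without widening the data frame.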
[R] stuk at another point: simple question
Dear R-community members,

I really appreciate the R-help group. Dennis has been extremely helpful in solving some of my questions. I am following Dennis's recommendation in the email below, yet I am stuck at another point (I hope this will take me to the end of this project).

Ind <- c(1:5)
Parent1 <- c(NA,NA,1,1,3)
Parent2 <- c(NA,NA,2,2,4)
y <- c(6,5,8,10,7)
M1a <- c(1,2,1,1,1)
M1b <- c(1,2,2,2,1)
M2a <- c(3,3,1,1,3)
M2b <- c(1,1,3,3,3)
M3a <- c(4,4,4,4,4)
M3b <- c(4,4,1,1,4)
M4a <- c(1,4,4,1,4)
M4b <- c(4,4,4,4,4)
dataf <- data.frame(Ind, Parent1, Parent2, y, M1a, M1b, M2a, M2b, M3a, M3b, M4a, M4b)

# I have more than 1000 variable pairs
# pair1 (M1a, M1b), pair2 (M2a, M2b), pair3 (M3a, M3b) ...

df2 <- transform(dataf,
                 m1ap1 = dataf$M1a[dataf$Parent1],
                 m1bp1 = dataf$M1b[dataf$Parent1],
                 m1ap2 = dataf$M1a[dataf$Parent2],
                 m1bp2 = dataf$M1b[dataf$Parent2])

# downstream calculations
hP1 <- ifelse(df2$m1ap1==df2$m1bp1,0,1)
hP2 <- ifelse(df2$m1bp2==df2$m1bp2,0,1)
t1 <- ifelse(df2$M1a==df2$m1ap1,1,0)
t2 <- ifelse(df2$M1b==df2$m1ap2,1,0)
C <- (hP1*(t1-0.25) + hP2*(t2-0.25))
yv <- df2$y
Cy <- C*yv
avgCy <- mean(Cy, na.rm=T)
avgCy

# I want to store this value in a new data frame along with the model number, i.e.
How can I loop the process to output the second pair( here M2a, M2b), third pair (here M3a, M3b) to all pairs (I have more than 1000) Mode1 avgCy 1 1.75 # from pair M1a and M1b 2 # from pair M2a and M2b 3 # from pair M3a and M3b 4 # from pair M4a and M4b to the end of the file Thank you in advance Umesh R _ From: Dennis Murphy [mailto:djmu...@gmail.com] Sent: Friday, February 18, 2011 12:28 AM To: Umesh Rosyara Cc: r-help@r-project.org Subject: Re: [R] recoding a data in different way: please help Hi: This is as far as I could get: df <- read.table(textConnection(" Individual Parent1 Parent2 mark1 mark2 10 0 12 11 20 0 11 22 30 0 13 22 40 0 13 11 51 2 11 12 61 2 12 12 73 4 11 12 83 4 13 12 91 4 11 12 10 1 4 11 12"), header = TRUE) df2 <- transform(df, Parent1 = replace(Parent1, Parent1 == 0, NA), Parent2 = replace(Parent2, Parent2 == 0, NA)) df2 <- transform(df2, imark1p1 = df2$mark1[df2$Parent1], # Parent 1's mark1 imark1p2 = df2$mark1[df2$Parent2], # Parent 2's mark1 imark2p1 = df2$mark2[df2$Parent1], # Parent 1's mark2 imark2p2 = df2$mark2[df2$Parent2]) # Parent 2's mark2 I created df2 so as not to overwrite the original in case of a mistake. At this point, you have several sets of vectors that you can compare; e.g., mark1 with imark1p1 and imark1p2. Like Josh, I couldn't make heads or tails out of what these logical tests were meant to output, but perhaps this gives you a broader template with which to work. At this point, you can probably remove the rows corresponding to the parents. I believe ifelse() is your friend here - it can perform logical tests in a vectorized fashion. As long as the tests are consistent from one individual to the next, it's likely to be an efficient route. HTH, Dennis On Thu, Feb 17, 2011 at 6:21 PM, Umesh Rosyara wrote: Dear R users The following question looks simple but I have spend alot of time to solve it. I would highly appeciate your help. 
I have following dataset from family dataset : Here we have individuals and their two parents and their marker scores (marker1, marker2,and so on). 0 means that their parent information not available. Individual Parent1 Parent2 mark1 mark2 10 0 12 11 20 0 11 22 30 0 13 22 40 0 13 11 51 2 11 12 61 2 12 12 73 4 11 12 83 4 13 12 91 4 11 12 10 1 4 11 12 I want to recode mark1 and other mark2.and so on column by looking indvidual parent (Parent1 and Parent2). For example Take case of Individual 5, who's Parent 1 is 1 (has mark1 score 12) and Parent 2 is 2 (has mark1 score 11). Individual 5 has mark1 score 11. Suppose I have following condition to recode Individual 5's mark1 score: For mark1 variable, If Parent1 score "11" and Parent2 score "22" and recode indvidual 5's score, "12"=1, else 0 If Parent1 score "12" and Parent2 score "
Re: [R] help please ..simple question regarding output the p-value inside a function and lm
Hi Jorge and R users,

Thank you so much for the responses. Your input helped me a lot and can potentially help me solve one more problem, but I got an error message. I am sorry to ask again, but it would be great if you could spot my problem at a quick look. I hope this will not cost a lot of your time, as it is based on your idea.

# Just data
X1 <- c(1,3,4,2,2)
X2 <- c(2,1,3,1,2)
X3 <- c(4,3,2,1,1)
X4 <- c(1,1,1,2,3)
X5 <- c(3,2,1,1,2)
X6 <- c(1,1,2,2,3)
odataframe <- data.frame(X1,X2,X3,X4,X5,X6)

My objective here is to sort the values within each pair of variables (X1 and X2, X3 and X4, X5 and X6, and so on) so that the second column of each pair always holds the larger of the two values (X2 >= X1, X4 >= X3, X6 >= X5, and so on).

Here is my attempt:

nmrk <- 3
nvar <- 2*nmrk
lapply(1:nvar, function(ind){
# indices for the variables we need
a <- seq(1, nvar, by = 2)
b <- seq(2, nvar, by = 2)
# sorting columns
tx[, a[ind]] = ifelse(odataframe[, a[ind]] < odataframe[,b[ind]], odataframe[, a[ind]], odataframe[, b[ind]])
tx[, b[ind]] = ifelse(odataframe[, b[ind]] > dataframe[,a[ind]], odataframe[,b[ind]], odataframe[,a[ind]])
df1 <- transform( odataframe, odataframe[, a[ind]]= tx[, a[ind]], odataframe[, b[ind]]= tx[, b[ind]]))
}

I got the following error:

Error: unexpected '=' in:
"tx[, b[ind]] = ifelse(odataframe[, b[ind]] > dataframe[,a[ind]], odataframe[,b[ind]], odataframe[,a[ind]])
df1 <- transform( odataframe, odataframe[, a[ind]]="

Thanks;
Umesh R
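The error in this message arises because transform() expects plain column names on the left of each `=`; an indexing expression such as `odataframe[, a[ind]]` cannot be used as an argument name. One way to sidestep transform() altogether is to assign the sorted columns directly with pmin() and pmax(). This is a minimal sketch assuming, as in the post, that the pairs occupy adjacent columns (1-2, 3-4, 5-6); the name `sorted` is illustrative.

```r
odataframe <- data.frame(
  X1 = c(1, 3, 4, 2, 2), X2 = c(2, 1, 3, 1, 2),
  X3 = c(4, 3, 2, 1, 1), X4 = c(1, 1, 1, 2, 3),
  X5 = c(3, 2, 1, 1, 2), X6 = c(1, 1, 2, 2, 3)
)

sorted <- odataframe
first  <- seq(1, ncol(odataframe), by = 2)  # first column of each pair
second <- first + 1                         # second column of each pair

for (k in seq_along(first)) {
  lo <- pmin(odataframe[[first[k]]], odataframe[[second[k]]])  # row-wise smaller value
  hi <- pmax(odataframe[[first[k]]], odataframe[[second[k]]])  # row-wise larger value
  sorted[[first[k]]]  <- lo
  sorted[[second[k]]] <- hi
}
sorted
```

pmin() and pmax() work element by element, so each row of a pair is ordered independently and tied values are left as they are.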
[R] help please ..simple question regarding output the p-value inside a function and lm
Dear R community members and R experts,

I am stuck at a point; I tried with my colleagues and we could not work it out. Sorry, I need your help. Here is my data (created just to show the example):

# generating a dataset just to show what my dataset looks like; here I have x variables
# x1 to x1000, plus ind and y
ind <- c(1:100)
y <- rnorm(100, 10, 2)
set.seed(201)
P <- vector()
dataf1 <- as.data.frame(matrix(rep(NA, 100000), nrow=100))
dataf <- data.frame(dataf1, ind, y)
names(dataf) <- (c(paste("x", 1:1000, sep=""), "ind", "y"))
for(i in 1:1000) {
dataf[,i] <- rnorm(100)
}

# my intention was to fit models in the following fashion:
# y ~ x1 + x2, y ~ x3 + x4, y ~ x5 + x6, ..., y ~ x999 + x1000 (to the end of the data frame)
# please note that I want to avoid fitting y ~ x2 + x3 or y ~ x4 + x5
# (that is, I am selecting two x variables at a time, to the end)
# question: how can I do this inside a user function, as I tried in the following?

# defining function for lm model
mylm <- function (mydata, nvar) {
y <- NULL
P1 <- vector (mode="numeric", length = nvar)
P2 <- vector (mode="numeric", length = nvar)
for(i in 1: nvar) {
print(P1[i] <- summary(lm(mydata$y ~ mydata[,i]) + mydata[,i+1]$coefficients[2,4]))
print(P2[i] <- summary(lm(mydata$y ~ mydata[,i]) + mydata[,i+1]$coefficients[2,5]))
print(plot(nvar, P1))
print(plot(nvar, P2))
}
}

# applying the function to mydata
mylm (dataf, 1000)

It does not work. The following is the error message:

Error in model.frame.default(formula = mydata$y ~ mydata[, i], drop.unused.levels = TRUE) :
  invalid type (NULL) for variable 'mydata$y'

Please help!

Thanks;
Umesh R
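One possible way to fit the models two columns at a time and collect the p-values is sketched below. It is only an illustration under stated assumptions: the helper name `pair_pvalues` and the output column names are hypothetical, the predictors are assumed to be named x1, x2, ..., and a small data set with 10 predictors stands in for the 1000 in the post. Two details of the posted function appear to get in the way: the second predictor and `$coefficients` sit outside the lm() call, and the coefficient table returned by summary() has only four columns (the p-value is the fourth), so `[2,5]` cannot be read.

```r
set.seed(201)
n_obs <- 100
n_x   <- 10  # small stand-in for the 1000 predictors in the post

dataf <- as.data.frame(matrix(rnorm(n_obs * n_x), nrow = n_obs))
names(dataf) <- paste0("x", seq_len(n_x))
dataf$y <- rnorm(n_obs, 10, 2)

# Hypothetical helper: fit y ~ x1 + x2, y ~ x3 + x4, ... and keep the p-values
pair_pvalues <- function(mydata, nvar) {
  first <- seq(1, nvar, by = 2)  # 1, 3, 5, ...: never pairs x2 with x3
  fits <- lapply(first, function(i) {
    f  <- reformulate(paste0("x", c(i, i + 1)), response = "y")
    co <- summary(lm(f, data = mydata))$coefficients
    # rows: (Intercept), x_i, x_{i+1}; column 4 holds Pr(>|t|)
    data.frame(pair     = paste0("x", i, "+x", i + 1),
               p_first  = co[2, 4],
               p_second = co[3, 4])
  })
  do.call(rbind, fits)
}

pvals <- pair_pvalues(dataf, n_x)
pvals
```

Returning a data frame instead of printing inside the loop leaves plotting as a separate step, for example `plot(seq_len(nrow(pvals)), pvals$p_first)`.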
Re: [R] simple recoding problem, but a trouble !
Thank you David,

I was able to create the data frame and restore the names with the following:

dfr1 <- data.frame(t(apply(dfr, 1, func)))
names(dfr1) <- c("marker1a", "marker1b", "marker2a", "marker2b", "marker3a", "marker3b")

I still wonder whether there is an easier way to restore the names; with thousands of variables, typing out a list like the one above would be tedious. Thank you for solving my problem. I appreciate it.

Umesh R

_
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Saturday, February 19, 2011 10:28 AM
To: Umesh Rosyara
Cc: 'Joshua Wiley'; r-help@r-project.org
Subject: Re: [R] simple recoding problem, but a trouble !

On Feb 19, 2011, at 8:40 AM, Umesh Rosyara wrote:

> Just a correction. My expected outdata frame was somehow distorted to a
> single, one column. So correct one is:
>
> marker1a markerb marker2a marker2b
> 1 1 1 1
> 1 3 1 3
> 3 3 3 3
> 3 3 3 3
> 1 3 1 3
> 1 3 1 3

func <- function(x) {sapply(strsplit(x, ""), match, table = c("A", NA, "C"))}
t(apply(dfr, 1, func))
     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    1    3    1    3
[3,]    3    3    3    3
[4,]    3    3    3    3
[5,]    1    3    1    3
[6,]    1    3    1    3

It's a matrix rather than a dataframe and doesn't have colnames, but that should be trivial to fix.

> Thanks;
>
> Umesh R
>
> _
>
> From: Umesh Rosyara [mailto:rosyar...@gmail.com]
> Sent: Friday, February 18, 2011 10:09 PM
> To: 'Joshua Wiley'
> Cc: 'r-help@r-project.org'
> Subject: RE: [R] recoding a data in different way: please help
>
> Hi Josh and R community members
>
> Thank you for quick response. I am impressed with the help.
>
> To solve my problems, I tried recode options and I had the following problem
> and which motivated me to leave it. Thank you for remind me the option
> again, might help to solve my problem in different way.
>
> marker1 <- c("AA", "AC", "CC", "CC", "AC", "AC")
>
> marker2 <- c("AA", "AC", "CC", "CC", "AC", "AC")
>
> dfr <- data.frame(cbind(marker1, marker2))
>
> Objective: replace A with 1, C with 3, and split AA into 1 1 (two columns
> numeric).
So the intended output for the above dataframe is: > > > > marker1a > markerb > marker2a > marker2b > > 1 > 1 > 1 > 1 > > 1 > 3 > 1 > 3 > > 3 > 3 > 3 > 3 > > 3 > 3 > 3 > 3 > > 1 > 3 > 1 > 3 > > 1 > 3 > 1 > 3 > > I tried the following: > > for(i in 1:length(dfr)) > { > dfr[[i]]=recode (dfr[[i]],"c('AA')= '1,1'; c('AC')= '1,3'; > c('CA')= > '1,3'; c('CC')= '3,3' ") > } > > write.table(dfr,"dfr.out", sep=" ,", col.names = T) > dfn=read.table("dfr.out",header=T, sep="," ) > > # just trying to cheat R, unfortunately the marker1 and marker columns > remained non-numeric, even when opened in excel !! > > > Unfortunately I got the following result ! > > marker1 marker2 > 1 1,1 1,1 > 2 1,2 1,2 > 3 2,2 2,2 > 4 2,2 2,2 > 5 1,2 1,2 > 6 1,2 1,2 > > > Sorry to bother all of you, but simple things are being complicated > these > days to me. > > Thank you so much > Umesh R > > > _ > > From: Joshua Wiley [mailto:jwiley.ps...@gmail.com] > Sent: Friday, February 18, 2011 12:15 AM > Cc: r-help@r-project.org > Subject: Re: [R] recoding a data in different way: please help > > > > Dear Umesh, > > I could not figure out exactly what your recoding scheme was, so I do > not have a specific solution for you. That said, the following > functions may help you get started. > > ?ifelse # vectorized and different from using if () statements > ?if # > ?Logic ## logical operators for your tests > ## if you install and load the "car" package by John Fox > ?recode # a function for recoding in package "car" > > I am sure it is possible to string together some massive series of if > statements and then use a for loop, but that is probably the messiest > and slowest possible way. I suspect there will be faster, neater > options, but I cannot say for certain wit
[R] simple recoding problem, but a trouble !
Just a correction. My expected outdata frame was somehow distorted to a single, one column. So correct one is: marker1a markerb marker2amarker2b 11 1 1 13 1 3 33 3 3 33 3 3 13 1 3 13 1 3 Thanks; Umesh R _ From: Umesh Rosyara [mailto:rosyar...@gmail.com] Sent: Friday, February 18, 2011 10:09 PM To: 'Joshua Wiley' Cc: 'r-help@r-project.org' Subject: RE: [R] recoding a data in different way: please help Hi Josh and R community members Thank you for quick response. I am impressed with the help. To solve my problems, I tried recode options and I had the following problem and which motivated me to leave it. Thank you for remind me the option again, might help to solve my problem in different way. marker1 <- c("AA", "AC", "CC", "CC", "AC", "AC") marker2 <- c("AA", "AC", "CC", "CC", "AC", "AC") dfr <- data.frame(cbind(marker1, marker2)) Objective: replace A with 1, C with 3, and split AA into 1 1 (two columns numeric). So the intended output for the above dataframe is: marker1a markerb marker2a marker2b 1 1 1 1 1 3 1 3 3 3 3 3 3 3 3 3 1 3 1 3 1 3 1 3 I tried the following: for(i in 1:length(dfr)) { dfr[[i]]=recode (dfr[[i]],"c('AA')= '1,1'; c('AC')= '1,3'; c('CA')= '1,3'; c('CC')= '3,3' ") } write.table(dfr,"dfr.out", sep=" ,", col.names = T) dfn=read.table("dfr.out",header=T, sep="," ) # just trying to cheat R, unfortunately the marker1 and marker columns remained non-numeric, even when opened in excel !! Unfortunately I got the following result ! marker1 marker2 1 1,1 1,1 2 1,2 1,2 3 2,2 2,2 4 2,2 2,2 5 1,2 1,2 6 1,2 1,2 Sorry to bother all of you, but simple things are being complicated these days to me. Thank you so much Umesh R _ From: Joshua Wiley [mailto:jwiley.ps...@gmail.com] Sent: Friday, February 18, 2011 12:15 AM Cc: r-help@r-project.org Subject: Re: [R] recoding a data in different way: please help Dear Umesh, I could not figure out exactly what your recoding scheme was, so I do not have a specific solution for you. 
That said, the following functions may help you get started. ?ifelse # vectorized and different from using if () statements ?if # ?Logic ## logical operators for your tests ## if you install and load the "car" package by John Fox ?recode # a function for recoding in package "car" I am sure it is possible to string together some massive series of if statements and then use a for loop, but that is probably the messiest and slowest possible way. I suspect there will be faster, neater options, but I cannot say for certain without having a better feel for how all the conditions work. Best regards, Josh On Thu, Feb 17, 2011 at 6:21 PM, Umesh Rosyara wrote: > Dear R users > > The following question looks simple but I have spend alot of time to solve > it. I would highly appeciate your help. > > I have following dataset from family dataset : > > Here we have individuals and their two parents and their marker scores > (marker1, marker2,and so on). 0 means that their parent information not > available. > > > Individual Parent1 Parent2 mark1 mark2 > 10 0 12 11 > 20 0 11 22 > 30 0 13 22 > 40 0 13 11 > 51 2 11 12 > 61 2 12 12 > 73 4 11 12 > 83 4 13 12 > 91 4 11 12 > 10 1 4 11 12 > > I want to recode mark1 and other mark2.and so on column by looking > indvidual parent (Parent1 and Parent2). > > For example > > Take case of Individual 5, who's Parent 1 is 1 (has mark1 score 12) and > Parent 2 is 2 (has mark1 score 11). Individual 5 has mark1 score 11. Suppose > I have following condition to recode Individual 5's mark1 score: > > For mark1 variable, If Parent1 score "11" and Parent2 score "22" and recode > indvidual 5's score, "12"=1, else 0 >If Parent1 score "12" and Parent2 score > "22" and recode individual 5's score, "22"=1, "12"= 0.5, else 0 >.more conditions > > Similarly the pointer should move from i
Re: [R] recoding a data in different way: please help
Hi Josh and R community members,

Thank you for the quick response. I am impressed with the help. To solve my problem I tried the recode option and ran into the following problem, which motivated me to leave it. Thank you for reminding me of the option again; it might help to solve my problem in a different way.

    marker1 <- c("AA", "AC", "CC", "CC", "AC", "AC")
    marker2 <- c("AA", "AC", "CC", "CC", "AC", "AC")
    dfr <- data.frame(cbind(marker1, marker2))

Objective: replace A with 1 and C with 3, and split AA into 1 1 (two numeric columns). So the intended output for the above data frame is:

marker1a  marker1b  marker2a  marker2b
1         1         1         1
1         3         1         3
3         3         3         3
3         3         3         3
1         3         1         3
1         3         1         3

I tried the following:

    for(i in 1:length(dfr)) {
      dfr[[i]] = recode(dfr[[i]], "c('AA')= '1,1'; c('AC')= '1,3'; c('CA')= '1,3'; c('CC')= '3,3' ")
    }
    write.table(dfr, "dfr.out", sep=" ,", col.names = T)
    dfn = read.table("dfr.out", header=T, sep=",")
    # just trying to cheat R; unfortunately the marker1 and marker2 columns
    # remained non-numeric, even when opened in Excel!

Unfortunately I got the following result:

  marker1 marker2
1     1,1     1,1
2     1,2     1,2
3     2,2     2,2
4     2,2     2,2
5     1,2     1,2
6     1,2     1,2

Sorry to bother all of you, but simple things are getting complicated for me these days.

Thank you so much
Umesh R

________________________________
From: Joshua Wiley [mailto:jwiley.ps...@gmail.com]
Sent: Friday, February 18, 2011 12:15 AM
Cc: r-help@r-project.org
Subject: Re: [R] recoding a data in different way: please help

Dear Umesh,

I could not figure out exactly what your recoding scheme was, so I do not have a specific solution for you. That said, the following functions may help you get started.

    ?ifelse  # vectorized and different from using if () statements
    ?"if"    # quoted, because if is a reserved word
    ?Logic   ## logical operators for your tests
    ## if you install and load the "car" package by John Fox
    ?recode  # a function for recoding in package "car"

I am sure it is possible to string together some massive series of if statements and then use a for loop, but that is probably the messiest and slowest possible way. I suspect there will be faster, neater options, but I cannot say for certain without having a better feel for how all the conditions work.

Best regards,
Josh

On Thu, Feb 17, 2011 at 6:21 PM, Umesh Rosyara wrote:
> Dear R users
>
> The following question looks simple, but I have spent a lot of time trying to solve
> it. I would highly appreciate your help.
>
> I have the following dataset from a family dataset:
>
> Here we have individuals, their two parents, and their marker scores
> (marker1, marker2, and so on). 0 means that the parent information is not
> available.
>
> Individual  Parent1  Parent2  mark1  mark2
> 1           0        0        12     11
> 2           0        0        11     22
> 3           0        0        13     22
> 4           0        0        13     11
> 5           1        2        11     12
> 6           1        2        12     12
> 7           3        4        11     12
> 8           3        4        13     12
> 9           1        4        11     12
> 10          1        4        11     12
>
> I want to recode mark1, mark2, and so on by looking at each
> individual's parents (Parent1 and Parent2).
>
> For example, take the case of Individual 5, whose Parent 1 is 1 (has mark1 score 12) and
> Parent 2 is 2 (has mark1 score 11). Individual 5 has mark1 score 11. Suppose
> I have the following conditions to recode Individual 5's mark1 score:
>
> For the mark1 variable:
> If Parent1 score is "11" and Parent2 score is "22", recode individual 5's score: "12" = 1, else 0
> If Parent1 score is "12" and Parent2 score is "22", recode individual 5's score: "22" = 1, "12" = 0.5, else 0
> ... more conditions
>
> Similarly, the pointer should move from individual 5 through n individuals to the
> end of the file.
>
> Thank you in advance
>
> Umesh R

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
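The objective stated in the message above (A becomes 1, C becomes 3, and each two-letter genotype is split into two numeric columns) can be sketched in base R without recode(). This is only one possible approach, not something proposed in the thread; the names allele_codes and split_marker are hypothetical, and the a/b column suffixes are assumed from the intended output.

```r
# Example data from the message above (kept as character, not factor)
marker1 <- c("AA", "AC", "CC", "CC", "AC", "AC")
marker2 <- c("AA", "AC", "CC", "CC", "AC", "AC")
dfr <- data.frame(marker1, marker2, stringsAsFactors = FALSE)

# Assumed allele coding: A -> 1, C -> 3 (extend this vector for other alleles)
allele_codes <- c(A = 1, C = 3)

# Hypothetical helper: split one genotype column into two numeric columns
split_marker <- function(x, name) {
  x <- as.character(x)
  out <- data.frame(
    a = unname(allele_codes[substr(x, 1, 1)]),  # first allele
    b = unname(allele_codes[substr(x, 2, 2)])   # second allele
  )
  names(out) <- paste0(name, c("a", "b"))
  out
}

# Apply the helper to every marker column and bind the pieces side by side
parts <- lapply(names(dfr), function(nm) split_marker(dfr[[nm]], nm))
dfn <- do.call(cbind, parts)
str(dfn)
```

Because the lookup goes through a named numeric vector, the resulting columns are numeric from the start, so no round trip through write.table() and read.table() is needed. Any allele letter missing from allele_codes would come out as NA.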
Re: [R] recoding a data in different way: please help
Hi Dennis,

Thank you so much; it helped me go a step ahead. Regarding the comparisons, here is what I want to do.

If imarkP1 = 22, imarkP2 = 11, and mark1 = 12, then mark1 should be coded as 1 (all three conditions must be satisfied to get a code of "1").
If imarkP1 = 22, imarkP2 = 11, and mark1 = 22, then mark1 should be coded as 2.
(If imarkP1 = 22, imarkP2 = 11, and mark1 = 11, then mark1 should be coded as 0; this goes to else.)
If imarkP1 = 33, imarkP2 = 14, and mark1 = 13, then mark1 should be coded as 0.
If imarkP1 = 33, imarkP2 = 14, and mark1 = 34, then mark1 should be coded as 1.

I have more such conditions. I tried the following for the first condition listed above, but could not get the result I want. I do not know what is wrong.

    Ifelse (imarkP1==22|imarkP2==11|mark1==12,1,0)

I could not go forward.

Thank you so much for the help.

Best regards;
Umesh R

________________________________
From: Dennis Murphy [mailto:djmu...@gmail.com]
Sent: Friday, February 18, 2011 12:28 AM
Cc: r-help@r-project.org
Subject: Re: [R] recoding a data in different way: please help

Hi:

This is as far as I could get:

    df <- read.table(textConnection("
    Individual Parent1 Parent2 mark1 mark2
    1  0 0 12 11
    2  0 0 11 22
    3  0 0 13 22
    4  0 0 13 11
    5  1 2 11 12
    6  1 2 12 12
    7  3 4 11 12
    8  3 4 13 12
    9  1 4 11 12
    10 1 4 11 12"), header = TRUE)

    df2 <- transform(df, Parent1 = replace(Parent1, Parent1 == 0, NA),
                         Parent2 = replace(Parent2, Parent2 == 0, NA))
    df2 <- transform(df2,
              imark1p1 = df2$mark1[df2$Parent1],  # Parent 1's mark1
              imark1p2 = df2$mark1[df2$Parent2],  # Parent 2's mark1
              imark2p1 = df2$mark2[df2$Parent1],  # Parent 1's mark2
              imark2p2 = df2$mark2[df2$Parent2])  # Parent 2's mark2

I created df2 so as not to overwrite the original in case of a mistake. At this point, you have several sets of vectors that you can compare; e.g., mark1 with imark1p1 and imark1p2. Like Josh, I couldn't make heads or tails out of what these logical tests were meant to output, but perhaps this gives you a broader template with which to work. At this point, you can probably remove the rows corresponding to the parents.

I believe ifelse() is your friend here - it can perform logical tests in a vectorized fashion. As long as the tests are consistent from one individual to the next, it's likely to be an efficient route.

HTH,
Dennis

On Thu, Feb 17, 2011 at 6:21 PM

Dear R users

The following question looks simple, but I have spent a lot of time trying to solve it. I would highly appreciate your help.

I have the following dataset from a family dataset:

Here we have individuals, their two parents, and their marker scores (marker1, marker2, and so on). 0 means that the parent information is not available.

Individual  Parent1  Parent2  mark1  mark2
1           0        0        12     11
2           0        0        11     22
3           0        0        13     22
4           0        0        13     11
5           1        2        11     12
6           1        2        12     12
7           3        4        11     12
8           3        4        13     12
9           1        4        11     12
10          1        4        11     12

I want to recode mark1, mark2, and so on by looking at each individual's parents (Parent1 and Parent2).

For example, take the case of Individual 5, whose Parent 1 is 1 (has mark1 score 12) and Parent 2 is 2 (has mark1 score 11). Individual 5 has mark1 score 11. Suppose I have the following conditions to recode Individual 5's mark1 score:

For the mark1 variable:
If Parent1 score is "11" and Parent2 score is "22", recode individual 5's score: "12" = 1, else 0
If Parent1 score is "12" and Parent2 score is "22", recode individual 5's score: "22" = 1, "12" = 0.5, else 0
... more conditions

Similarly, the pointer should move from individual 5 through n individuals to the end of the file.

Thank you in advance
Umesh R
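A likely reason the attempt quoted above did not work: R is case-sensitive, so the function is ifelse(), not Ifelse(), and | means "or", so the test is true as soon as any one of the three comparisons holds. Requiring all three calls for &. The following is a minimal sketch on made-up rows that mirror the five conditions listed in the message; the data frame d and the column name code are assumptions for illustration only.

```r
# Hypothetical toy data: one row per condition described in the message
d <- data.frame(
  imarkP1 = c(22, 22, 22, 33, 33),  # Parent 1's score
  imarkP2 = c(11, 11, 11, 14, 14),  # Parent 2's score
  mark1   = c(12, 22, 11, 13, 34)   # individual's own score
)

# & requires every comparison to be TRUE; | would accept any single one.
# Nested ifelse() calls are tried in order; anything unmatched falls to 0.
d$code <- with(d,
  ifelse(imarkP1 == 22 & imarkP2 == 11 & mark1 == 12, 1,
  ifelse(imarkP1 == 22 & imarkP2 == 11 & mark1 == 22, 2,
  ifelse(imarkP1 == 33 & imarkP2 == 14 & mark1 == 34, 1,
         0))))
d
```

With many more conditions, a lookup table of (imarkP1, imarkP2, mark1, code) rows joined with merge() may be easier to maintain than deeply nested ifelse() calls, but that is a design choice rather than something suggested in the thread.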
[R] recoding a data in different way: please help
Dear R users

The following question looks simple, but I have spent a lot of time trying to solve it. I would highly appreciate your help.

I have the following dataset from a family dataset:

Here we have individuals, their two parents, and their marker scores (marker1, marker2, and so on). 0 means that the parent information is not available.

Individual  Parent1  Parent2  mark1  mark2
1           0        0        12     11
2           0        0        11     22
3           0        0        13     22
4           0        0        13     11
5           1        2        11     12
6           1        2        12     12
7           3        4        11     12
8           3        4        13     12
9           1        4        11     12
10          1        4        11     12

I want to recode mark1, mark2, and so on by looking at each individual's parents (Parent1 and Parent2).

For example, take the case of Individual 5, whose Parent 1 is 1 (has mark1 score 12) and Parent 2 is 2 (has mark1 score 11). Individual 5 has mark1 score 11. Suppose I have the following conditions to recode Individual 5's mark1 score:

For the mark1 variable:
If Parent1 score is "11" and Parent2 score is "22", recode individual 5's score: "12" = 1, else 0
If Parent1 score is "12" and Parent2 score is "22", recode individual 5's score: "22" = 1, "12" = 0.5, else 0
... more conditions

Similarly, the pointer should move from individual 5 through n individuals to the end of the file.

Thank you in advance
Umesh R
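The first step described in the question, finding each individual's two parents and pulling in their mark1 scores, could be sketched with match(). This is an illustrative assumption about how to do the lookup, not the poster's method; the names ped, p1_mark1 and p2_mark1 are hypothetical. Unlike indexing rows directly by parent number, match() does not require the Individual IDs to equal row positions.

```r
# The pedigree table from the question
ped <- data.frame(
  Individual = 1:10,
  Parent1 = c(0, 0, 0, 0, 1, 1, 3, 3, 1, 1),
  Parent2 = c(0, 0, 0, 0, 2, 2, 4, 4, 4, 4),
  mark1   = c(12, 11, 13, 13, 11, 12, 11, 13, 11, 11),
  mark2   = c(11, 22, 22, 11, 12, 12, 12, 12, 12, 12)
)

# match() returns NA for parent ID 0 (unknown parent), so founders get NA
ped$p1_mark1 <- ped$mark1[match(ped$Parent1, ped$Individual)]
ped$p2_mark1 <- ped$mark1[match(ped$Parent2, ped$Individual)]

# Individual 5: Parent 1 is individual 1, Parent 2 is individual 2
ped[ped$Individual == 5, ]
```

Once the parents' scores sit in the same row as the individual's own score, the recoding rules can be written as vectorized comparisons over whole columns instead of moving a pointer from row to row.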