Re: [R] help with recursive function
When I run the code without stopifnot, the code takes 5 min to run and then it throws an error listed below without producing any results. Error: node stack overflow In addition: There were 50 or more warnings (use warnings() to see the first 50) Error during wrapup: node stack overflow Thanks. Nilesh From: William Dunlap [mailto:wdun...@tibco.com] Sent: Thursday, December 14, 2017 11:26 AM To: DIGHE, NILESH [AG/2362] Cc: Eric Berger ; r-help Subject: Re: [R] help with recursive function Your code contains the lines stopifnot(!(any(data1$norm_sd >= 1))) if(!(any(data1$norm_sd >= 1))) { df1 <- dat1 return(df1) } stop() "throws an error", causing the current function and all functions in the call stack to abort and return nothing. It does not mean to stop now and return a result. Does the function give the correct results if you just leave out the stopifnot line? Bill Dunlap TIBCO Software wdunlap tibco.com<http://tibco.com> On Thu, Dec 14, 2017 at 9:11 AM, DIGHE, NILESH [AG/2362] mailto:nilesh.di...@monsanto.com>> wrote: Eric: I will try and see if I can figure out the issue by debugging as you suggested. I don’t know why my code after stopifnot is not getting executed where I like the code to run the funlp2 function when the if statement is TRUE but when it is false, I like it to keep running until the stopifnot condition is met. When the stopifnot condition is met, I like to get the output from if statement saved. Anyway, I will keep trying. Again, Thanks for your help! Nilesh From: Eric Berger [mailto:ericjber...@gmail.com<mailto:ericjber...@gmail.com>] Sent: Thursday, December 14, 2017 10:29 AM To: DIGHE, NILESH [AG/2362] mailto:nilesh.di...@monsanto.com>> Cc: r-help mailto:r-help@r-project.org>> Subject: Re: [R] help with recursive function If you are trying to understand why the "stopifnot" condition is met you can replace it by something like: if ( any(dat2$norm_sd >= 1) ) browser() This will put you in a debugging session where you can examine your variables, e.g. > dat$norm_sd HTH, Eric On Thu, Dec 14, 2017 at 5:33 PM, Eric Berger mailto:ericjber...@gmail.com><mailto:ericjber...@gmail.com<mailto:ericjber...@gmail.com>>> wrote: The message is coming from your stopifnot() condition being met. On Thu, Dec 14, 2017 at 5:31 PM, DIGHE, NILESH [AG/2362] mailto:nilesh.di...@monsanto.com><mailto:nilesh.di...@monsanto.com<mailto:nilesh.di...@monsanto.com>>> wrote: Hi, I accidently left out few lines of code from the calclp function. Updated function is pasted below. I am still getting the same error “Error: !(any(data1$norm_sd >= 1)) is not TRUE“ I would appreciate any help. Nilesh dput(calclp) function (dataset) { dat1 <- funlp1(dataset) recursive_funlp <- function(dataset = dat1, func = funlp2) { dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>% mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.), 3))) dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, sd, na.rm = TRUE), 3) dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, function(x) { max(abs(x), na.rm = TRUE) }), 3) data1 <- dat2 %>% gather(key, value, -uniqueid, -norm_max, -norm_sd) %>% separate(key, c("field_rep", "treatment"), "\\.") %>% spread(treatment, value) %>% mutate(outlier = NA) stopifnot(!(any(data1$norm_sd >= 1))) if (!(any(data1$norm_sd >= 1))) { df1 <- dat1 return(df1) } else { df2 <- recursive_funlp() return(df2) } } df3 <- recursive_funlp(dataset = dat1, func = funlp2) df3 } From: DIGHE, NILESH [AG/2362] Sent: Thursday, December 14, 2017 9:01 AM To: 'Eric Berger' mailto:ericjber...@gmail.com><mailto:ericjber...@gmail.com<mailto:ericjber...@gmail.com>>> Cc: r-help mailto:r-help@r-project.org><mailto:r-help@r-project.org<mailto:r-help@r-project.org>>> Subject: RE: [R] help with recursive function Eric: Thanks for taking time to look into my problem. Despite of making the change you suggested, I am still getting the same error. I am wondering if the logic I am using in the stopifnot and if functions is a problem. I like the recursive function to stop whenever the norm_sd column has zero values that are above or equal to 1. Below is the calclp function after the changes you suggested. Thanks. Nilesh dput(calclp) function (dataset) {
Re: [R] help with recursive function
Eric: I will try and see if I can figure out the issue by debugging as you suggested. I don’t know why my code after stopifnot is not getting executed where I like the code to run the funlp2 function when the if statement is TRUE but when it is false, I like it to keep running until the stopifnot condition is met. When the stopifnot condition is met, I like to get the output from if statement saved. Anyway, I will keep trying. Again, Thanks for your help! Nilesh From: Eric Berger [mailto:ericjber...@gmail.com] Sent: Thursday, December 14, 2017 10:29 AM To: DIGHE, NILESH [AG/2362] Cc: r-help Subject: Re: [R] help with recursive function If you are trying to understand why the "stopifnot" condition is met you can replace it by something like: if ( any(dat2$norm_sd >= 1) ) browser() This will put you in a debugging session where you can examine your variables, e.g. > dat$norm_sd HTH, Eric On Thu, Dec 14, 2017 at 5:33 PM, Eric Berger mailto:ericjber...@gmail.com>> wrote: The message is coming from your stopifnot() condition being met. On Thu, Dec 14, 2017 at 5:31 PM, DIGHE, NILESH [AG/2362] mailto:nilesh.di...@monsanto.com>> wrote: Hi, I accidently left out few lines of code from the calclp function. Updated function is pasted below. I am still getting the same error “Error: !(any(data1$norm_sd >= 1)) is not TRUE“ I would appreciate any help. Nilesh dput(calclp) function (dataset) { dat1 <- funlp1(dataset) recursive_funlp <- function(dataset = dat1, func = funlp2) { dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>% mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.), 3))) dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, sd, na.rm = TRUE), 3) dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, function(x) { max(abs(x), na.rm = TRUE) }), 3) data1 <- dat2 %>% gather(key, value, -uniqueid, -norm_max, -norm_sd) %>% separate(key, c("field_rep", "treatment"), "\\.") %>% spread(treatment, value) %>% mutate(outlier = NA) stopifnot(!(any(data1$norm_sd >= 1))) if (!(any(data1$norm_sd >= 1))) { df1 <- dat1 return(df1) } else { df2 <- recursive_funlp() return(df2) } } df3 <- recursive_funlp(dataset = dat1, func = funlp2) df3 } From: DIGHE, NILESH [AG/2362] Sent: Thursday, December 14, 2017 9:01 AM To: 'Eric Berger' mailto:ericjber...@gmail.com>> Cc: r-help mailto:r-help@r-project.org>> Subject: RE: [R] help with recursive function Eric: Thanks for taking time to look into my problem. Despite of making the change you suggested, I am still getting the same error. I am wondering if the logic I am using in the stopifnot and if functions is a problem. I like the recursive function to stop whenever the norm_sd column has zero values that are above or equal to 1. Below is the calclp function after the changes you suggested. Thanks. Nilesh dput(calclp) function (dataset) { dat1 <- funlp1(dataset) recursive_funlp <- function(dataset = dat1, func = funlp2) { dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>% mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.), 3))) dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, sd, na.rm = TRUE), 3) dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, function(x) { max(abs(x), na.rm = TRUE) }), 3) stopifnot(!(any(dat2$norm_sd >= 1))) if (!(any(dat2$norm_sd >= 1))) { df1 <- dat1 return(df1) } else { df2 <- recursive_funlp() return(df2) } } df3 <- recursive_funlp(dataset = dat1, func = funlp2) df3 } From: Eric Berger [mailto:ericjber...@gmail.com] Sent: Thursday, December 14, 2017 8:17 AM To: DIGHE, NILESH [AG/2362] mailto:nilesh.di...@monsanto.com>> Cc: r-help mailto:r-help@r-project.org>> Subject: Re: [R] help with recursive function My own typo ... whoops ... !( any(dat2$norm_sd >= 1 )) On Thu, Dec 14, 2017 at 3:43 PM, Eric Berger mailto:ericjber...@gmail.com>> wrote: You seem to have a typo at this expression (and some others like it) Namely, you write any(!dat2$norm_sd)
Re: [R] help with recursive function
Hi, I accidently left out few lines of code from the calclp function. Updated function is pasted below. I am still getting the same error “Error: !(any(data1$norm_sd >= 1)) is not TRUE“ I would appreciate any help. Nilesh dput(calclp) function (dataset) { dat1 <- funlp1(dataset) recursive_funlp <- function(dataset = dat1, func = funlp2) { dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>% mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.), 3))) dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, sd, na.rm = TRUE), 3) dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, function(x) { max(abs(x), na.rm = TRUE) }), 3) data1 <- dat2 %>% gather(key, value, -uniqueid, -norm_max, -norm_sd) %>% separate(key, c("field_rep", "treatment"), "\\.") %>% spread(treatment, value) %>% mutate(outlier = NA) stopifnot(!(any(data1$norm_sd >= 1))) if (!(any(data1$norm_sd >= 1))) { df1 <- dat1 return(df1) } else { df2 <- recursive_funlp() return(df2) } } df3 <- recursive_funlp(dataset = dat1, func = funlp2) df3 } From: DIGHE, NILESH [AG/2362] Sent: Thursday, December 14, 2017 9:01 AM To: 'Eric Berger' Cc: r-help Subject: RE: [R] help with recursive function Eric: Thanks for taking time to look into my problem. Despite of making the change you suggested, I am still getting the same error. I am wondering if the logic I am using in the stopifnot and if functions is a problem. I like the recursive function to stop whenever the norm_sd column has zero values that are above or equal to 1. Below is the calclp function after the changes you suggested. Thanks. Nilesh dput(calclp) function (dataset) { dat1 <- funlp1(dataset) recursive_funlp <- function(dataset = dat1, func = funlp2) { dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>% mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.), 3))) dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, sd, na.rm = TRUE), 3) dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, function(x) { max(abs(x), na.rm = TRUE) }), 3) stopifnot(!(any(dat2$norm_sd >= 1))) if (!(any(dat2$norm_sd >= 1))) { df1 <- dat1 return(df1) } else { df2 <- recursive_funlp() return(df2) } } df3 <- recursive_funlp(dataset = dat1, func = funlp2) df3 } From: Eric Berger [mailto:ericjber...@gmail.com] Sent: Thursday, December 14, 2017 8:17 AM To: DIGHE, NILESH [AG/2362] mailto:nilesh.di...@monsanto.com>> Cc: r-help mailto:r-help@r-project.org>> Subject: Re: [R] help with recursive function My own typo ... whoops ... !( any(dat2$norm_sd >= 1 )) On Thu, Dec 14, 2017 at 3:43 PM, Eric Berger mailto:ericjber...@gmail.com>> wrote: You seem to have a typo at this expression (and some others like it) Namely, you write any(!dat2$norm_sd) >= 1 when you possibly meant to write !( any(dat2$norm_sd) >= 1 ) i.e. I think your ! seems to be in the wrong place. HTH, Eric On Thu, Dec 14, 2017 at 3:26 PM, DIGHE, NILESH [AG/2362] mailto:nilesh.di...@monsanto.com>> wrote: Hi, I need some help with running a recursive function. I like to run funlp2 recursively. When I try to run recursive function in another function named "calclp" I get this "Error: any(!dat2$norm_sd) >= 1 is not TRUE". I have never built a recursive function before so having trouble executing it in this case. I would appreciate any help or guidance to resolve this issue. Please see my data and the three functions that I am using below. Please note that calclp is the function I am running and the other two functions are within this calclp function. # code: Test<- calclp(dataset = dat) # calclp function calclp<- function (dataset) { dat1 <- funlp1(dataset) recursive_funlp <- function(dataset = dat1, func = funlp2) { dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>% mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>% mutate_at(.vars = gr
Re: [R] help with recursive function
Eric: Thanks for taking time to look into my problem. Despite of making the change you suggested, I am still getting the same error. I am wondering if the logic I am using in the stopifnot and if functions is a problem. I like the recursive function to stop whenever the norm_sd column has zero values that are above or equal to 1. Below is the calclp function after the changes you suggested. Thanks. Nilesh dput(calclp) function (dataset) { dat1 <- funlp1(dataset) recursive_funlp <- function(dataset = dat1, func = funlp2) { dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>% mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.), 3))) dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, sd, na.rm = TRUE), 3) dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, function(x) { max(abs(x), na.rm = TRUE) }), 3) stopifnot(!(any(dat2$norm_sd >= 1))) if (!(any(dat2$norm_sd >= 1))) { df1 <- dat1 return(df1) } else { df2 <- recursive_funlp() return(df2) } } df3 <- recursive_funlp(dataset = dat1, func = funlp2) df3 } From: Eric Berger [mailto:ericjber...@gmail.com] Sent: Thursday, December 14, 2017 8:17 AM To: DIGHE, NILESH [AG/2362] Cc: r-help Subject: Re: [R] help with recursive function My own typo ... whoops ... !( any(dat2$norm_sd >= 1 )) On Thu, Dec 14, 2017 at 3:43 PM, Eric Berger mailto:ericjber...@gmail.com>> wrote: You seem to have a typo at this expression (and some others like it) Namely, you write any(!dat2$norm_sd) >= 1 when you possibly meant to write !( any(dat2$norm_sd) >= 1 ) i.e. I think your ! seems to be in the wrong place. HTH, Eric On Thu, Dec 14, 2017 at 3:26 PM, DIGHE, NILESH [AG/2362] mailto:nilesh.di...@monsanto.com>> wrote: Hi, I need some help with running a recursive function. I like to run funlp2 recursively. When I try to run recursive function in another function named "calclp" I get this "Error: any(!dat2$norm_sd) >= 1 is not TRUE". I have never built a recursive function before so having trouble executing it in this case. I would appreciate any help or guidance to resolve this issue. Please see my data and the three functions that I am using below. Please note that calclp is the function I am running and the other two functions are within this calclp function. # code: Test<- calclp(dataset = dat) # calclp function calclp<- function (dataset) { dat1 <- funlp1(dataset) recursive_funlp <- function(dataset = dat1, func = funlp2) { dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>% mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.), 3))) dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, sd, na.rm = TRUE), 3) dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, function(x) { max(abs(x), na.rm = TRUE) }), 3) stopifnot(any(!dat2$norm_sd) >= 1) if (any(!dat2$norm_sd) >= 1) { df1 <- dat1 return(df1) } else { df2 <- recursive_funlp() return(df2) } } df3 <- recursive_funlp(dataset = dat1, func = funlp2) df3 } # funlp1 function funlp1<- function (dataset) { dat2 <- dataset %>% select(field, set, ent_num, rep_num, lp) %>% unite(uniqueid, set, ent_num, sep = ".") %>% unite(field_rep, field, rep_num) %>% mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.), 3))) dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, sd, na.rm = TRUE), 3) dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, function(x) { max(abs(x), na.rm = TRUE) }), 3) data1 <- dat2 %>% gather(key, value, -uniqueid, -norm_max, -norm_sd) %>% separate(key, c("field_rep", "treatment"), "\\.") %>% spread(treatment, value) %>% mutate(outlier = NA) df_clean <- with(data1, data1[norm_sd < 1, ]) datD <
[R] help with recursive function
Hi, I need some help with running a recursive function. I like to run funlp2 recursively. When I try to run recursive function in another function named "calclp" I get this "Error: any(!dat2$norm_sd) >= 1 is not TRUE". I have never built a recursive function before so having trouble executing it in this case. I would appreciate any help or guidance to resolve this issue. Please see my data and the three functions that I am using below. Please note that calclp is the function I am running and the other two functions are within this calclp function. # code: Test<- calclp(dataset = dat) # calclp function calclp<- function (dataset) { dat1 <- funlp1(dataset) recursive_funlp <- function(dataset = dat1, func = funlp2) { dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>% mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.), 3))) dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, sd, na.rm = TRUE), 3) dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, function(x) { max(abs(x), na.rm = TRUE) }), 3) stopifnot(any(!dat2$norm_sd) >= 1) if (any(!dat2$norm_sd) >= 1) { df1 <- dat1 return(df1) } else { df2 <- recursive_funlp() return(df2) } } df3 <- recursive_funlp(dataset = dat1, func = funlp2) df3 } # funlp1 function funlp1<- function (dataset) { dat2 <- dataset %>% select(field, set, ent_num, rep_num, lp) %>% unite(uniqueid, set, ent_num, sep = ".") %>% unite(field_rep, field, rep_num) %>% mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.), 3))) dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, sd, na.rm = TRUE), 3) dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))], 1, function(x) { max(abs(x), na.rm = TRUE) }), 3) data1 <- dat2 %>% gather(key, value, -uniqueid, -norm_max, -norm_sd) %>% separate(key, c("field_rep", "treatment"), "\\.") %>% spread(treatment, value) %>% mutate(outlier = NA) df_clean <- with(data1, data1[norm_sd < 1, ]) datD <- with(data1, data1[norm_sd >= 1, ]) s <- split(datD, datD$uniqueid) sdf <- lapply(s, function(x) { data.frame(x, x$outlier <- ifelse(is.na(x$lp_norm), NA, ifelse(abs(x$lp_norm) == x$norm_max, "yes", "no")), x$lp <- with(x, ifelse(outlier == "yes", NA, lp))) x }) sdf2 <- bind_rows(sdf) all_dat <- bind_rows(df_clean, sdf2) all_dat } # funlp2 function funlp2<-function (dataset) { data1 <- dataset df_clean <- with(data1, data1[norm_sd < 1, ]) datD <- with(data1, data1[norm_sd >= 1, ]) s <- split(datD, datD$uniqueid) sdf <- lapply(s, function(x) { data.frame(x, x$outlier <- ifelse(is.na(x$lp_norm), NA, ifelse(abs(x$lp_norm) == x$norm_max, "yes", "no")), x$lp <- with(x, ifelse(outlier == "yes", NA, lp))) x }) sdf2 <- bind_rows(sdf) all_dat <- bind_rows(df_clean, sdf2) all_dat } # dataset dput(dat) structure(list(field = c("LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "S
Re: [R] dynamically create columns using a function
Hi Elie, Thanks for your time and efforts. I plugged in the calculation I wanted to do in the code you provided and got I wanted perfectly. Below is the solution to my original problem. # dataset dem<- structure(list(id = c("L1", "L2", "L3", "M1", "M2", "M3"), TEST_SET_NAME = c("A", "A", "A", "B", "B", "B"), YLD_BE_REG1 = c(1467L, 1455L, 1382L, 1463L, 1466L, 1455L), YLD_BE_REG2 = c(1501L, 1441L, 1421L, 1482L, 1457L, 1490L), IS_GG = c("NO", "NO", "YES", "NO", "NO", "YES" )), .Names = c("id", "TEST_SET_NAME", "YLD_BE_REG1", "YLD_BE_REG2", "IS_GG"), class = "data.frame", row.names = c(NA, -6L)) # function calc_gg calc_gg<- function (dataset, col, col_name) { mutate_call = lazyeval::interp(~round(((a - mean(a[IS_GG == "YES"], na.rm = TRUE))/mean(a[IS_GG == "YES"], na.rm = TRUE)) * 100, 1), a = as.name(col)) dataset %>% group_by(TEST_SET_NAME) %>% mutate_(.dots = setNames(list(mutate_call), col_name)) %>% ungroup() } # function f f<- function (dat, blup_datacut) { col_name_gg <- paste("GG", blup_datacut, sep = "_") col_mean_gg <- paste("YLD_BE", blup_datacut, sep = "_") dat2 <- calc_gg(dataset = dat, col = col_mean_gg, col_name = col_name_gg) dat2 } # function demo_fn demo_fn<- function (dat, f, blup_datacut) { for (i in blup_datacut) { dat <- f(dat, i) } dat } # get expected results by applying functions demo_fn(dem, f, c("REG1", "REG2")) Best Regards, Nilesh From: Elie Canonici Merle [mailto:elie.canonicime...@gmail.com] Sent: Friday, July 21, 2017 3:44 AM To: DIGHE, NILESH [AG/2362] Cc: r-help@r-project.org Subject: Re: [R] dynamically create columns using a function Hi, I don't know about the lazyeval package or what you are trying to do but to answer the main question "How to create columns dynamically using a function?" I would do something like that: # dataset dem <- structure(list(id = c("L1", "L2", "L3", "M1", "M2", "M3"), TEST_SET_NAME = c("A", "A", "A", "B", "B", "B"), YLD_BE_REG1 = c(1467L, 1455L, 1382L, 1463L, 1466L, 1455L), YLD_BE_REG2 = c(1501L, 1441L, 1421L, 1482L, 1457L, 1490L), IS_GG = c("NO", "NO", "YES", "NO", "NO", "YES" )), .Names = c("id", "TEST_SET_NAME", "YLD_BE_REG1", "YLD_BE_REG2", "IS_GG"), class = "data.frame", row.names = c(NA, -6L)) demo_fn<- function (data, f, names) { for (i in names) { data <- f(data, i) } data } f <- function(data, name) { col_work <- paste("YLD_BE", name, sep = "_") col_name_result <- paste("GG", name, sep = "_") #do something interesting, here I am simply copying the column data[col_name_result] <- data[col_work] data } demo_fn(dem, f, c("REG1", "REG2")) If you are working with large datasets it might not be the best solution as my understanding is that this method involves a lot of copying. Hope it helps, Elie Canonici Merle 2017-07-20 17:55 GMT+02:00 DIGHE, NILESH [AG/2362] mailto:nilesh.di...@monsanto.com>>: Hi, I am writing a function to dynamically create column names and fill those columns with some basic calculations. My function "demo_fn" takes argument "blup_datacut" and I like to use the contents of those arguments to dynamically create new columns in my dataset. Please note that I have another function called "calc_gg" within the function "demo_fn". Both functions are pasted below. I have a for loop within my function and it appears to only create new column for the last value in the argument "blup_datacut" which makes me think that I am not storing the values coming out of for_loop correctly. I have "expected_results", dataset, & functions pasted below to reproduce my problem and expected results. Any help will be greatly appreciate. # dataset dem<- structure(list(id = c("L1", "L2", "L3", "M1", "M2", "M3"), TEST_SET_NAME = c("A", "A", "A", "B", "B", "B"), YLD_BE_REG1 = c(1467L, 1455L, 1382L, 1463L, 1466L, 1455L), YLD_BE_REG2 = c(1501L, 1441L, 1421L, 1482L, 1457L, 1490L), IS_GG = c("NO", "NO", "YES", "NO", "NO", "YES" )), .Names = c("id&quo
[R] dynamically create columns using a function
Hi, I am writing a function to dynamically create column names and fill those columns with some basic calculations. My function "demo_fn" takes argument "blup_datacut" and I like to use the contents of those arguments to dynamically create new columns in my dataset. Please note that I have another function called "calc_gg" within the function "demo_fn". Both functions are pasted below. I have a for loop within my function and it appears to only create new column for the last value in the argument "blup_datacut" which makes me think that I am not storing the values coming out of for_loop correctly. I have "expected_results", dataset, & functions pasted below to reproduce my problem and expected results. Any help will be greatly appreciate. # dataset dem<- structure(list(id = c("L1", "L2", "L3", "M1", "M2", "M3"), TEST_SET_NAME = c("A", "A", "A", "B", "B", "B"), YLD_BE_REG1 = c(1467L, 1455L, 1382L, 1463L, 1466L, 1455L), YLD_BE_REG2 = c(1501L, 1441L, 1421L, 1482L, 1457L, 1490L), IS_GG = c("NO", "NO", "YES", "NO", "NO", "YES" )), .Names = c("id", "TEST_SET_NAME", "YLD_BE_REG1", "YLD_BE_REG2", "IS_GG"), class = "data.frame", row.names = c(NA, -6L)) # function demo_fn demo_fn<- function (dat, blup_datacut = c("REG1", "REG2")) { for (i in seq_along(blup_datacut)) { col_name_gg <- paste("GG", blup_datacut[i], sep = "_") col_mean_gg <- paste("YLD_BE", blup_datacut[i], sep = "_") dat2 <- calc_gg(dataset = dat, col = col_mean_gg, col_name = col_name_gg) } dat2 } # function calc_gg Calc_gg<- function (dataset, col, col_name) { mutate_call = lazyeval::interp(~round(((a - mean(a[IS_GG == "YES"], na.rm = TRUE))/mean(a[IS_GG == "YES"], na.rm = TRUE)) * 100, 1), a = as.name(col)) dataset %>% group_by(TEST_SET_NAME) %>% mutate_(.dots = setNames(list(mutate_call), col_name)) %>% ungroup() } # run function results_demo<- demo_fn(dat = dem) # expected results structure(list(id = c("L1", "L2", "L3", "M1", "M2", "M3"), TEST_SET_NAME = c("A", "A", "A", "B", "B", "B"), YLD_BE_REG1 = c(1467L, 1455L, 1382L, 1463L, 1466L, 1455L), YLD_BE_REG2 = c(1501L, 1441L, 1421L, 1482L, 1457L, 1490L), IS_GG = c("NO", "NO", "YES", "NO", "NO", "YES" ), GG_REG1 = c(6.2, 5.3, 0, 0.5, 0.8, 0), GG_REG2 = c(5.6, 1.4, 0, -0.5, -2.2, 0)), .Names = c("id", "TEST_SET_NAME", "YLD_BE_REG1", "YLD_BE_REG2", "IS_GG", "GG_REG1", "GG_REG2"), row.names = c(NA, -6L), class = "data.frame") Thanks. Nilesh This email and any attachments were sent from a Monsanto email account and may contain confidential and/or privileged information. If you are not the intended recipient, please contact the sender and delete this email and any attachments immediately. Any unauthorized use, including disclosing, printing, storing, copying or distributing this email, is prohibited. All emails and attachments sent to or from Monsanto email accounts may be subject to monitoring, reading, and archiving by Monsanto, including its affiliates and subsidiaries, as permitted by applicable law. Thank you. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Why is merge sorting even when sort = F?
Using the "join" function from the plyr package preserves the data order library(plyr) join(grades2, info, by="grade", type="left", match="all") Nilesh -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Dimitri Liakhovitski Sent: Wednesday, March 08, 2017 12:45 PM To: Jeff Newmiller Cc: r-help Subject: Re: [R] Why is merge sorting even when sort = F? I understood your answer. The point is that sort = TRUE that doesn't sort is plain confusing. Instead, the option should have been something like efficient = TRUE or FALSE. At least then no one would stupidly expect sort = TRUE to sort and sort = FALSE to NOT sort. On Wed, Mar 8, 2017 at 12:51 PM, Jeff Newmiller wrote: > If you are still wondering, try re-reading my answer. FALSE is more > efficient, TRUE is sorted. Lack of sorting has nothing to do with preserving > order. > -- > Sent from my phone. Please excuse my brevity. > > On March 8, 2017 8:55:06 AM PST, Dimitri Liakhovitski > wrote: >>Thank you. I was just curious what sort=FALSE had no impact. >>Wondering what it is there for then... >> >>On Wed, Mar 8, 2017 at 11:43 AM, Jeff Newmiller >> wrote: >>> Merging is not necessarily an order-preserving operation, but >>> sorting >>can make the operation more efficient. The sort=TRUE argument forces >>the result to be sorted, but sort=FALSE is in not a promise that order >>will be preserved. (I think the imperfect sorting occurs when there >>are multiple keys but am not sure.) You can add columns to the input >>data that let you restore some semblance of the original ordering >>afterward, or you can roll your own possibly-less-efficient merge >>using match and >>indexing: >>> >>> info[ match( grades2$grade, info$grade ), ] >>> -- >>> Sent from my phone. Please excuse my brevity. >>> >>> On March 8, 2017 8:07:27 AM PST, Dimitri Liakhovitski >> wrote: Hello! I have a vector 'grades' and a data frame 'info': grades2 <- data.frame(grade = c(1,2,2,3,1)) info <- data.frame( grade = 3:1, desc = c("Excellent", "Good", "Poor"), fail = c(F, F, T) ) I want to get the info for all grades I have in info: This solution resorts everything in the order of column 'grade': merge(grades2, info, by = "grade", all.x = T, all.y = F) Could you please explain why this solution also resorts - despite >>sort = FALSE? merge(grades2, info, by = "grade", all.x = T, all.y = F, sort = >>FALSE) Thanks a lot! -- Dimitri Liakhovitski __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. This email and any attachments were sent from a Monsanto email account and may contain confidential and/or privileged information. If you are not the intended recipient, please contact the sender and delete this email and any attachments immediately. Any unauthorized use, including disclosing, printing, storing, copying or distributing this email, is prohibited. All emails and attachments sent to or from Monsanto email accounts may be subject to monitoring, reading, and archiving by Monsanto, including its affiliates and subsidiaries, as permitted by applicable law. Thank you. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Understanding and predict round-off errors sign on simple functions
Using "runmean" function from caTools package within your SMA function appears to solve the issue. Please see details below. library(caTools) > dput(m) structure(c(-0.626453810742332, 0.183643324222082, -0.835628612410047, 1.59528080213779, 0.329507771815361, -0.820468384118015, 0.487429052428485, 0.738324705129217, 0.575781351653492, -0.305388387156356, 3.51178116845085, 2.38984323641143, 1.3787594194582, -0.2146998871775, 3.12493091814311, 1.95506639098477, 1.98380973690105, 2.9438362106853, 2.82122119509809, 2.59390132121751, 5.91897737160822, 5.78213630073107, 5.07456498336519, 3.01064830413663, 5.61982574789471, 4.943871260471, 4.84420449329467, 3.52924761610073, 4.52184994489138, 5.4179415601997), .Dim = c(10L, 3L)) > dput(SMA) function (x, n = 10, ...) { ma <- runmean(x, n) if (!is.null(dim(ma))) { colnames(ma) <- "SMA" } return(ma) } mma <- apply(m, 2, SMA, n=1) results<-mma-m > dput(results) structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(10L, 3L)) Nilesh -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Marc Schwartz Sent: Wednesday, June 29, 2016 1:07 PM To: Bert Gunter Cc: R-help Subject: Re: [R] Understanding and predict round-off errors sign on simple functions Hi, Just to augment Bert's comments, I presume that you are aware of the relevant R FAQ: https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f That you had an expectation of the difference being 0 suggested to me that you might not be, but my apologies if that is not the case. That being said, there are some higher precision CRAN packages that may offer some additional functionality, with the potential limitations that Bert references below. More information is available in the Numerical Mathematics CRAN Task View: https://cran.r-project.org/web/views/NumericalMathematics.html In addition, with the caveat that I have not used it, there is the 'propagate' package on CRAN that may be relevant to what you want to be able to anticipate, at some level: https://cran.r-project.org/web/packages/propagate/index.html It has not been updated in a while and there are some notes for the CRAN package checks, that suggest that the maintainer may not be active at this point. Regards, Marc > On Jun 29, 2016, at 10:13 AM, Bert Gunter wrote: > > I am certainly no expert, but I would assume that: > > 1. Roundoff errors depend on the exact numerical libraries and > versions that are used, and so general language comparisons are > impossible without that information; > > 2. Roundoff errors depend on the exact calculations being done and > machine precision and are very complicated to determine > > So I would say the answer to your questions is no. > > But you should probably address such a question to a numerical analyst > for an authoritative answer. Maybe try stats.stackexchange.com . > > -- Bert > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Wed, Jun 29, 2016 at 2:55 AM, Sirhc via R-help > wrote: >> Hi, >> >> >> >> May be it is a basic thing but I would like to know if we can >> anticipate round-off errors sign. >> >> >> >> Here is an example : >> >> >> >> # numerical matrix >> >> m <- matrix(data=cbind(rnorm(10, 0), rnorm(10, 2), rnorm(10, 5)), >> nrow=10, >> ncol=3) >> >> >> >>> m >> >>[,1] [,2] [,3] >> >> [1,] 0.4816247 1.1973502 3.855641 >> >> [2,] -1.2174937 0.7356427 4.393279 >> >> [3,] 0.8504074 2.5286509 2.689196 >> >> [4,] 1.8048642 1.8580804 6.665237 >> >> [5,] -0.6749397 1.0944277 4.838608 >> >> [6,] 0.8252034 1.5595268 3.681695 >> >> [7,] 1.3002208 0.9582693 4.561577 >> >> [8,] 1.6950923 3.5677921 6.005078 >> >> [9,] 0.6509285 0.9025964 5.082288 >> >> [10,] -0.5676040 1.3281102 4.446451 >> >> >> >> #weird moving average of period 1 ! >> >> mma <- apply(m, 2, SMA, n=1) >> >> >> >>> mma >> >>[,1] [,2] [,3] >> >> [1,] NANA NA >> >> [2,] -1.2174937 0.7356427 4.393279 >> >> [3,] 0.8504074 2.5286509 2.689196 >> >> [4,] 1.8048642 1.8580804 6.665237 >> >> [5,] -0.6749397 1.0944277 4.838608 >> >> [6,] 0.8252034 1.5595268 3.681695 >> >> [7,] 1.3002208 0.9582693 4.561577 >> >> [8,] 1.6950923 3.5677921 6.005078 >> >> [9,] 0.6509285 0.9025964 5.082288 >> >> [10,] -0.5676040 1.3281102 4.446451 >> >> >> >> >> >> #difference should be 0 but here is the result >> >>> m - mma >> >> [,1] [,2] [,3] >> >> [1,]NA NANA >> >> [2,] 0.00e+00 0.00e+00 -8.881784e-16 >> >> [3,] 0.00e+00 0.00e+00 -8.881784e-16 >> >> [4,] 0.00e+00 4.440892e-16 -8.881784e-16 >> >> [
Re: [R] sample within a loop
Tanvir & Don: Thanks a lot for your solutions. Both solutions work great. I really appreciate your help. Regards, Nilesh -Original Message- From: Mohammad Tanvir Ahamed [mailto:mashra...@yahoo.com] Sent: Thursday, March 17, 2016 1:24 PM To: DIGHE, NILESH [AG/2362]; r-help@r-project.org Subject: Re: [R] sample within a loop Hi, you can try df1<-split(df,df$groups) lapply(df1, function(x) { x<-cbind(x,entry=0) sam <- sample(x$plotno,1) x$entry[which(x$plotno==sam)]<-"CONTROL" x$entry[which(!x$plotno==sam)]<-"TEST" x } ) Tanvir Ahamed Göteborg, Sweden | mashra...@yahoo.com ____________ From: "DIGHE, NILESH [AG/2362]" To: "r-help@r-project.org" Sent: Thursday, 17 March 2016, 18:18 Subject: [R] sample within a loop Dear R users, My data frame has four "groups" namely A1, B2, C3, & D4. Each group has 12 rows (variable "plotno). I like to randomly sample one "plotno" within each "groups" variable and label it as "CONTROL" and label others as "TEST" in a new variable called "entry". I am trying to do this by looping over the group variable and then sample "plotno" within a given group. I am ending up with four "CONTROL" plots but they are generated by sampling over all the groups instead of each group. I need one random "plotno" assigned as a "CONTROL" per group (A1, B2, C3, D4). I would appreciate any help in modifying my function "funa" or suggest any alternative and better way to do this task. Below is the dataset and function I am working with. # dataset (df) structure(list(plotno = 1:48, groups = c("A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4" )), .Names = c("plotno", "groups"), row.names = c(NA, -48L), class = "data.frame") # function (funa) function (dataset) { set.seed(1) bay <- unique(dataset$groups) IND <- c() df2 <- dataset for (i in bay) { IND[i] <- which(plotno %in% sample(plotno, 1)) df2$entry <- ifelse(df2$plotno %in% IND, "CONTROL", "TEST") } df2 } # session info R version 3.2.1 (2015-06-18) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_3.2.1 Thanks. Nilesh Nilesh Dighe (806)-252-7492 (Cell) (806)-741-2019 (Office) This e-mail message may contain privileged and/or confidential information, and is intended to be received only by persons entitled to receive such information. If you have received this e-mail in error, please notify the sender immediately. Please delete it and all attachments from any servers, hard drives or any other media. Other use of this e-mail by you is strictly prohibited. All e-mails and attachments sent and received are subject to monitoring, reading and archival by Monsanto, including its subsidiaries. The recipient of this e-mail is solely responsible for checking for the presence of "Viruses" or other "Malware". Monsanto, along with its subsidiaries, accepts no liability for any damage caused by any such code transmitted by or accompanying this e-mail or any attachment. The information contained in this email may be subject to the export control laws and regulations of the United States, potentially including but not limited to the Export Administration Regulations (EAR) and sanctions regulations issued by the U.S. Department of Treasury, Office of Foreign Asset Controls (OFAC). As a recipient of this information you are obligated to comply with all applicable U.S. export laws and regulations. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -
[R] sample within a loop
Dear R users, My data frame has four "groups" namely A1, B2, C3, & D4. Each group has 12 rows (variable "plotno). I like to randomly sample one "plotno" within each "groups" variable and label it as "CONTROL" and label others as "TEST" in a new variable called "entry". I am trying to do this by looping over the group variable and then sample "plotno" within a given group. I am ending up with four "CONTROL" plots but they are generated by sampling over all the groups instead of each group. I need one random "plotno" assigned as a "CONTROL" per group (A1, B2, C3, D4). I would appreciate any help in modifying my function "funa" or suggest any alternative and better way to do this task. Below is the dataset and function I am working with. # dataset (df) structure(list(plotno = 1:48, groups = c("A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4" )), .Names = c("plotno", "groups"), row.names = c(NA, -48L), class = "data.frame") # function (funa) function (dataset) { set.seed(1) bay <- unique(dataset$groups) IND <- c() df2 <- dataset for (i in bay) { IND[i] <- which(plotno %in% sample(plotno, 1)) df2$entry <- ifelse(df2$plotno %in% IND, "CONTROL", "TEST") } df2 } # session info R version 3.2.1 (2015-06-18) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_3.2.1 Thanks. Nilesh Nilesh Dighe (806)-252-7492 (Cell) (806)-741-2019 (Office) This e-mail message may contain privileged and/or confidential information, and is intended to be received only by persons entitled to receive such information. If you have received this e-mail in error, please notify the sender immediately. Please delete it and all attachments from any servers, hard drives or any other media. Other use of this e-mail by you is strictly prohibited. All e-mails and attachments sent and received are subject to monitoring, reading and archival by Monsanto, including its subsidiaries. The recipient of this e-mail is solely responsible for checking for the presence of "Viruses" or other "Malware". Monsanto, along with its subsidiaries, accepts no liability for any damage caused by any such code transmitted by or accompanying this e-mail or any attachment. The information contained in this email may be subject to the export control laws and regulations of the United States, potentially including but not limited to the Export Administration Regulations (EAR) and sanctions regulations issued by the U.S. Department of Treasury, Office of Foreign Asset Controls (OFAC). As a recipient of this information you are obligated to comply with all applicable U.S. export laws and regulations. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] store results from loop into a dataframe
Sarah: Thanks a lot for taking time to guide me to the right direction. I now see how the missing data is causing the problem. Thanks again! Nilesh -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Tuesday, January 05, 2016 12:13 PM To: DIGHE, NILESH [AG/2362] Cc: r-help@r-project.org Subject: Re: [R] store results from loop into a dataframe If you run each variable individually, you'll discover that the NAs in your data are causing problems. It's up to you to figure out what the best way to handle those missing values for your research is. Sarah On Tue, Jan 5, 2016 at 12:39 PM, DIGHE, NILESH [AG/2362] wrote: > Sarah: Thanks for pointing out the errors in my function. > > Below are the errors I am getting after I run the corrected quote: > Error in if (s) { : missing value where TRUE/FALSE needed In addition: > Warning message: > In qtukey(1 - alpha, ntr, DFerror) : NaNs produced > > You are right, I have no idea to handle these errors. > > Do you recommend any other approach to solve my problem? > > Thanks for your time. > Nilesh > > > > -Original Message- > From: Sarah Goslee [mailto:sarah.gos...@gmail.com] > Sent: Tuesday, January 05, 2016 11:20 AM > To: DIGHE, NILESH [AG/2362] > Cc: r-help@r-project.org > Subject: Re: [R] store results from loop into a dataframe > > Leaving aside the question of whether this is the best way to approach your > problem (unlikely), there's a couple of errors in your code involving > indexing. Once fixed, the code demonstrates some errors in your use of > HSD.test that will be harder for you to deal with. > > Thanks for the complete reproducible example. > > fun2 <- function (x) > > { > > trait_names <- c("yield", "lp", "lnth") > > d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0, > > 6)) > > for (i in trait_names) { > # your formula has all the trait names, not the selected one > # mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + > PEDIGREE*FIELD + FIELD%in%REP")), data = x) > mod <- aov(formula(paste(i, "~ PEDIGREE + FIELD + > PEDIGREE*FIELD + FIELD%in%REP")), data = x) > > out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = > FALSE) > > # you're indexing by the trait name, instead of its position > # d[, i] <- out$means[, 1] > d[, which(trait_names == i)] <- out$means[, 1] > > } > > d > > } > > Sarah > > On Tue, Jan 5, 2016 at 11:48 AM, DIGHE, NILESH [AG/2362] > wrote: >> Dear R users: >> >> I am trying to create a function that will loop over three dependent >> variables in my aov model, and then get the HSD.test for each variable. I >> like to store the results from each loop in a data frame. >> >> >> >> When I run my function (funx) on my data (dat), results from only yield gets >> populated in all three columns of the dataframe. I am not able to store the >> results for each variable in a dataframe. Any help will be highly >> appreciated. >> >> >> >> >> >> >> >> function (x) >> >> { >> >> trait_names <- c("yield", "lp", "lnth") >> >> d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0, >> >> 6)) >> >> for (i in trait_names) { >> >> mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + >> PEDIGREE*FIELD + FIELD%in%REP")), >> >> data = x) >> >> out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = >> FALSE) >> >> d[, i] <- out$means[, 1] >> >> } >> >> d >> >> } >> >> >> structure(list(FIELD = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, >> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, >> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, >> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, >> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, >> 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, >> 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = >> c("FYLS", "HKI1", "KIS1", "LMLS", "SELS", "SGL1"), class = "factor"), >> REP = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, >> 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
Re: [R] store results from loop into a dataframe
Sarah: Thanks for pointing out the errors in my function. Below are the errors I am getting after I run the corrected quote: Error in if (s) { : missing value where TRUE/FALSE needed In addition: Warning message: In qtukey(1 - alpha, ntr, DFerror) : NaNs produced You are right, I have no idea to handle these errors. Do you recommend any other approach to solve my problem? Thanks for your time. Nilesh -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Tuesday, January 05, 2016 11:20 AM To: DIGHE, NILESH [AG/2362] Cc: r-help@r-project.org Subject: Re: [R] store results from loop into a dataframe Leaving aside the question of whether this is the best way to approach your problem (unlikely), there's a couple of errors in your code involving indexing. Once fixed, the code demonstrates some errors in your use of HSD.test that will be harder for you to deal with. Thanks for the complete reproducible example. fun2 <- function (x) { trait_names <- c("yield", "lp", "lnth") d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0, 6)) for (i in trait_names) { # your formula has all the trait names, not the selected one # mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + PEDIGREE*FIELD + FIELD%in%REP")), data = x) mod <- aov(formula(paste(i, "~ PEDIGREE + FIELD + PEDIGREE*FIELD + FIELD%in%REP")), data = x) out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = FALSE) # you're indexing by the trait name, instead of its position # d[, i] <- out$means[, 1] d[, which(trait_names == i)] <- out$means[, 1] } d } Sarah On Tue, Jan 5, 2016 at 11:48 AM, DIGHE, NILESH [AG/2362] wrote: > Dear R users: > > I am trying to create a function that will loop over three dependent > variables in my aov model, and then get the HSD.test for each variable. I > like to store the results from each loop in a data frame. > > > > When I run my function (funx) on my data (dat), results from only yield gets > populated in all three columns of the dataframe. I am not able to store the > results for each variable in a dataframe. Any help will be highly appreciated. > > > > > > > > function (x) > > { > > trait_names <- c("yield", "lp", "lnth") > > d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0, > > 6)) > > for (i in trait_names) { > > mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + > PEDIGREE*FIELD + FIELD%in%REP")), > > data = x) > > out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = > FALSE) > > d[, i] <- out$means[, 1] > > } > > d > > } > > > structure(list(FIELD = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, > 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, > 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, > 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = > c("FYLS", "HKI1", "KIS1", "LMLS", "SELS", "SGL1"), class = "factor"), > REP = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, > 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, > 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, > 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, > 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, > 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, > 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", "2", "3"), > class = "factor"), PEDIGREE = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, > 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, > 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, > 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, > 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, > 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, > 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), > .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), yield = > c(1003L, 923L, 1268L, 1226L, 1059L, 1150L, 900L, 816L, 1072L, 1158L, > 1026L, 1299L, 1083L, 1038L, 1236L, 12
[R] store results from loop into a dataframe
Dear R users: I am trying to create a function that will loop over three dependent variables in my aov model, and then get the HSD.test for each variable. I like to store the results from each loop in a data frame. When I run my function (funx) on my data (dat), results from only yield gets populated in all three columns of the dataframe. I am not able to store the results for each variable in a dataframe. Any help will be highly appreciated. function (x) { trait_names <- c("yield", "lp", "lnth") d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0, 6)) for (i in trait_names) { mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + PEDIGREE*FIELD + FIELD%in%REP")), data = x) out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = FALSE) d[, i] <- out$means[, 1] } d } structure(list(FIELD = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("FYLS", "HKI1", "KIS1", "LMLS", "SELS", "SGL1"), class = "factor"), REP = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", "2", "3"), class = "factor"), PEDIGREE = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), yield = c(1003L, 923L, 1268L, 1226L, 1059L, 1150L, 900L, 816L, 1072L, 1158L, 1026L, 1299L, 1083L, 1038L, 1236L, 1287L, 1270L, 1612L, 1513L, 1676L, 1504L, 1417L, 1932L, 1644L, 1293L, 1542L, 1452L, 1180L, 1248L, 1764L, 1326L, 1877L, 1788L, 1606L, 1809L, 1791L, 2294L, 2315L, 2320L, 2083L, 1895L, 2284L, 2000L, 2380L, 1952L, 2414L, 2354L, 2095L, 2227L, 2093L, 2019L, 2505L, 2410L, 2287L, 2507L, 2507L, 2349L, 2162L, 2108L, 2319L, 2028L, 1947L, 2352L, 2698L, 2369L, 1798L, 2422L, 2509L, 2234L, 2451L, 2139L, 1957L, 799L, 787L, 701L, 781L, 808L, 582L, 770L, 752L, 801L, 865L, 608L, 620L, 677L, 775L, 722L, 1030L, 606L, 729L, 1638L, 1408L, 1045L, 1685L, 1109L, 1210L, 1419L, 1048L, 1129L, 1549L, 1325L, 1315L, 1838L, 1066L, 1295L, 1499L, 1472L, 1139L), lp = c(NA, NA, 46.31, NA, NA, 43.8, NA, NA, 43.91, NA, NA, 44.47, NA, NA, 45.16, NA, NA, 43.57, 40.65, NA, NA, 40.04, NA, NA, 41.33, NA, NA, 40.75, NA, NA, 42.04, NA, NA, 40.35, NA, NA, 43.682, NA, NA, 41.712, NA, NA, 42.566, NA, NA, 43.228, NA, NA, 43.63, NA, NA, 42.058, NA, NA, NA, 45.19, NA, NA, 41.91, NA, NA, 43.86, NA, NA, 44.48, NA, NA, 44.34, NA, NA, 43.03, NA, NA, NA, 44.08, NA, NA, 41.39, NA, NA, 42.48, NA, NA, 44.13, NA, NA, 43.39, NA, NA, 42.82, 42.18, NA, NA, 41.42, NA, NA, 41.25, NA, NA, 42.31, NA, NA, 43.22, NA, NA, 40.52, NA, NA), lnth = c(NA, NA, 1.151, NA, NA, 1.135, NA, NA, 1.109, NA, NA, 1.117, NA, NA, 1.107, NA, NA, 1.196, 1.255, NA, NA, 1.229, NA, NA, 1.158, NA, NA, 1.214, NA, NA, 1.152, NA, NA, 1.194, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.2, NA, NA, 1.219, NA, NA, 1.115, NA, NA, 1.205, NA, NA, 1.238, NA, NA, 1.244, NA, NA, NA, 1.096, NA, NA, 1.021, NA, NA, 1.055, NA, NA, 1.058, NA, NA, 1.026, NA, NA, 1.115, 1.202, NA, NA, 1.161, NA, NA, 1.168, NA, NA, 1.189, NA, NA, 1.204, NA, NA, 1.277, NA, NA)), .Names = c("FIELD", "REP", "PEDIGREE", "yield", "lp", "lnth" ), row.names = c(NA, -108L), class = "data.frame") R version 3.2.1 (2015-06-18) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] agricolae_1.2-1 asreml_3.0 lattice_0.20-31 ggplot2_1.0.1 dplyr_0.4.2 plyr_1.8.3 loaded via a namespace (and not attached): [1] spdep_0.5
Re: [R] subset data using a vector
Jim & Michael: I really appreciate your guidance in creating the function I wanted. I took suggestions from both of you and was able to complete this function. I had to split the process into two functions as listed below. I just thought to send the results to the list in case someone might be interested in doing similar task in the future. Thanks. Nilesh getcheckmeans<- function (dataset) { row_check_mean <- c() dat1 <- data.frame() check_mean <- c() x <- length(dataset$plotid) for (i in (1:x)) { r1 <- dataset[i, 1] r2 <- r1 - 1 r3 <- r1 + 1 r4 <- c(r1, r2, r3) dat1 <- split(dataset, dataset$rows %in% r4)[[2]] row_check_mean[i] <- tapply(dat1$yield, dat1$linecode, mean, na.rm = TRUE)[1] check_mean <- round(unlist(row_check_mean)[1:x], digits = 2) } check_mean } adjustdata<- function (dataset, trait, control) { check_mean <- getcheckmeans(dataset) dat_check_mean <- as.data.frame(check_mean) dataset <- cbind(dataset, dat_check_mean) adj_yield <- c() x <- length(trait) for (i in 1:x) { adj_yield[i] <- ifelse(control[i] == "variety", round(trait[i]/dataset$check_mean[i], digits = 3), round(trait[i]/trait[i], digits = 3)) } data.frame(dataset, adj_yield) } dat<- structure(list(rows = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), cols = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L), plotid = c(289L, 290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 298L, 299L, 300L, 301L, 302L, 303L, 304L, 369L, 370L, 371L, 372L, 373L, 374L, 375L, 376L, 377L, 378L, 379L, 380L, 381L, 382L, 383L, 384L, 385L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L, 394L, 395L, 396L, 397L, 398L, 399L, 400L, 465L, 466L, 467L, 468L, 469L, 470L, 471L, 472L, 473L, 474L, 475L, 476L, 477L, 478L, 479L, 480L), yield = c(5.1, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5, 5, 3.9, 4.6, 5, 4.4, 5.1, 4.3, 4.4, 4.2, 3.9, 4.6, 4.8, 5.4, 4.7, 5.5, 5.3, 4.8, 5.8, 4.6, 5.8, 5.5, 5.3, 5.6, 5.6, 5, 4.8, 4.9, 5.2, 5.3, 4.6, 4.8, 5.3, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5, 5.4, 4.8, 4.6, 5.2, 4.9, 5.1, 4.5, 5.8, 5.2, 4.7, 4.8, 5.3, 5.8, 4.9, 5.9, 4.5), line = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 21L, 22L, 1L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 1L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 1L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 1L, 51L, 52L, 53L, 54L, 1L, 55L, 56L, 57L), .Label = c("CHK", "V002", "V003", "V004", "V005", "V006", "V007", "V008", "V009", "V010", "V011", "V012", "V013", "V014", "V015", "V016", "V017", "V018", "V019", "V020", "V021", "V022", "V023", "V024", "V025", "V026", "V027", "V028", "V029", "V030", "V031", "V032", "V033", "V034", "V035", "V036", "V037", "V038", "V039", "V040", "V041", "V042", "V043", "V044", "V045", "V046", "V047", "V048", "V049", "V050", "V051", "V052", "V053", "V054", "V055", "V056", "V057"), class = "factor"), linecode = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("check", "variety"), class = "factor")), .Names = c("rows", "cols", "plotid", "yield", "line", "linecode"), class = "data.frame", row.names = c(NA, -64L)) From: Jim Lemon [mailto:drjimle...@gmail.com] Sent: Tuesday, November 24, 2015 2:53 AM To: DIGHE, NILESH [AG/2362] Cc: r-help@r-project.org Subject: Re: [R] subset data using a vector Hi Nilesh, I simplified your code a bit: fun1<-function (dataset, plot.id<http://plot.id>, ranges2use, cont
Re: [R] subset data using a vector
Michael: I tried using your suggestion of using length and still get the same error: Error in m1[[i]] : subscript out of bounds I also checked the length of m1 and x and they both are of same length (64). After trying several things, I was able to extract the list but this was done outside the function I am trying to create. Code that worked is listed below: for(i in (1:length(mydata$plotid))){ v1<-as.numeric(strsplit(as.character(mydata$rangestouse), ",")[[i]]) print(head(v1))} However, when I try to get this code in a function (fun3) listed below, I get the following error: Error in strsplit(as.character(dataset$ranges2use), ",")[[i]] : subscript out of bounds fun3<- function (dataset, plot.id, ranges2use, control) { m1 <- c() x <- length(plot.id) for (i in (1:x)) { m1 <- as.numeric(strsplit(as.character(dataset$ranges2use), ",")[[i]]) } m2 } I am not sure where I am making a mistake. Thanks. Nilesh -Original Message- From: Michael Dewey [mailto:li...@dewey.myzen.co.uk] Sent: Monday, November 23, 2015 12:11 PM To: DIGHE, NILESH [AG/2362]; r-help@r-project.org Subject: Re: [R] subset data using a vector Try looking at your function and work through what happens if the length is what I suggested. >> x <- length(plot.id) >> >> for (i in (1:x)) { >> >> m2[i] <- m1[[i]] So unless m1 has length at least x you are doomed. On 23/11/2015 16:26, DIGHE, NILESH [AG/2362] wrote: > Michael: I like to use the actual range id's listed in column "rangestouse" > to subset my data and not the length of that vector. > > Thanks. > Nilesh > > -Original Message- > From: Michael Dewey [mailto:li...@dewey.myzen.co.uk] > Sent: Monday, November 23, 2015 10:17 AM > To: DIGHE, NILESH [AG/2362]; r-help@r-project.org > Subject: Re: [R] subset data using a vector > > length(strsplit(as.character(mydata$ranges2use), ",")) > > was that what you expected? I think not. > > On 23/11/2015 16:05, DIGHE, NILESH [AG/2362] wrote: >> Dear R users, >> I like to split my data by a vector created by using >> variable "ranges". This vector will have the current range (ranges), >> preceding range (ranges - 1), and post range (ranges + 1) for a given >> plotid. If the preceding or post ranges in this vector are outside the >> levels of ranges in the data set then I like to drop those ranges and only >> include the ranges that are available. Variable "rangestouse" includes all >> the desired ranges I like to subset a given plotid. After I subset these >> dataset using these desired ranges, then I like to extract the yield data >> for checks in those desired ranges and adjust yield of my data by dividing >> yield of a given plotid with the check average for the desired ranges. >> >> I have created this function (fun1) but when I run it, I get the following >> error: >> >> Error in m1[[i]] : subscript out of bounds >> >> Any help will be highly appreciated! >> Thanks, Nilesh >> >> Dataset: >> dput(mydata) >> structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, >> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, >> 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, >> 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, >> 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), >> cols = structure(c(1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, >> 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, >> 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, >> 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, >> 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("1", "2", "3", "4", "5", >> "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16"), class = >> "factor"), >> plotid = c(289L, 298L, 299L, 300L, 301L, 302L, 303L, 304L, >> 290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 384L, 375L, >> 374L, 373L, 372L, 371L, 370L, 369L, 383L, 382L, 381L, 380L, >> 379L, 378L, 377L, 376L, 385L, 394L, 395L, 396L, 397L, 398L, >> 399L, 400L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L, >> 480L, 471L, 470L, 469L, 468L, 467L, 466L, 465L, 479L, 478L, >> 477L, 476L, 475L, 474L, 473L, 472L), yield = c(5.1, 5, 3.
Re: [R] subset data using a vector
Michael: I like to use the actual range id's listed in column "rangestouse" to subset my data and not the length of that vector. Thanks. Nilesh -Original Message- From: Michael Dewey [mailto:li...@dewey.myzen.co.uk] Sent: Monday, November 23, 2015 10:17 AM To: DIGHE, NILESH [AG/2362]; r-help@r-project.org Subject: Re: [R] subset data using a vector length(strsplit(as.character(mydata$ranges2use), ",")) was that what you expected? I think not. On 23/11/2015 16:05, DIGHE, NILESH [AG/2362] wrote: > Dear R users, > I like to split my data by a vector created by using > variable "ranges". This vector will have the current range (ranges), > preceding range (ranges - 1), and post range (ranges + 1) for a given plotid. > If the preceding or post ranges in this vector are outside the levels of > ranges in the data set then I like to drop those ranges and only include the > ranges that are available. Variable "rangestouse" includes all the desired > ranges I like to subset a given plotid. After I subset these dataset using > these desired ranges, then I like to extract the yield data for checks in > those desired ranges and adjust yield of my data by dividing yield of a given > plotid with the check average for the desired ranges. > > I have created this function (fun1) but when I run it, I get the following > error: > > Error in m1[[i]] : subscript out of bounds > > Any help will be highly appreciated! > Thanks, Nilesh > > Dataset: > dput(mydata) > structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, > 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, > 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, > 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), > cols = structure(c(1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, > 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, > 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, > 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, > 5L, 6L, 7L, 8L, 9L), .Label = c("1", "2", "3", "4", "5", "6", "7", > "8", "9", "10", "11", "12", "13", "14", "15", "16"), class = "factor"), > plotid = c(289L, 298L, 299L, 300L, 301L, 302L, 303L, 304L, > 290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 384L, 375L, > 374L, 373L, 372L, 371L, 370L, 369L, 383L, 382L, 381L, 380L, > 379L, 378L, 377L, 376L, 385L, 394L, 395L, 396L, 397L, 398L, > 399L, 400L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L, > 480L, 471L, 470L, 469L, 468L, 467L, 466L, 465L, 479L, 478L, > 477L, 476L, 475L, 474L, 473L, 472L), yield = c(5.1, 5, 3.9, > 4.6, 5, 4.4, 5.1, 4.3, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5, > 5.6, 4.7, 5.4, 4.8, 4.6, 3.9, 4.2, 4.4, 5.3, 5.5, 5.8, 4.6, > 5.8, 4.8, 5.3, 5.5, 5.6, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5, > 5, 4.8, 4.9, 5.2, 5.3, 4.6, 4.8, 5.3, 4.5, 4.5, 5.1, 4.9, > 5.2, 4.6, 4.8, 5.4, 5.9, 4.9, 5.8, 5.3, 4.8, 4.7, 5.2, 5.8 > ), linecode = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, > 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, > 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("check", > "variety"), class = "factor"), ranges = c(1L, 1L, 1L, 1L, > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, > 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L > ), rangestouse = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, > 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1,2", > "1,2,3", "2,3,4", "3,4"), class = "factor")), .Names = c("rows", > "cols", "plotid", "yield", "linecode", "ranges", "rangestouse" > > ), class = "data.frame", row.names = c(NA, -64L)) > > Function: > > fun1<- function (dataset, plot.id, ranges2u
[R] subset data using a vector
Dear R users, I like to split my data by a vector created by using variable "ranges". This vector will have the current range (ranges), preceding range (ranges - 1), and post range (ranges + 1) for a given plotid. If the preceding or post ranges in this vector are outside the levels of ranges in the data set then I like to drop those ranges and only include the ranges that are available. Variable "rangestouse" includes all the desired ranges I like to subset a given plotid. After I subset these dataset using these desired ranges, then I like to extract the yield data for checks in those desired ranges and adjust yield of my data by dividing yield of a given plotid with the check average for the desired ranges. I have created this function (fun1) but when I run it, I get the following error: Error in m1[[i]] : subscript out of bounds Any help will be highly appreciated! Thanks, Nilesh Dataset: dput(mydata) structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), cols = structure(c(1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16"), class = "factor"), plotid = c(289L, 298L, 299L, 300L, 301L, 302L, 303L, 304L, 290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 384L, 375L, 374L, 373L, 372L, 371L, 370L, 369L, 383L, 382L, 381L, 380L, 379L, 378L, 377L, 376L, 385L, 394L, 395L, 396L, 397L, 398L, 399L, 400L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L, 480L, 471L, 470L, 469L, 468L, 467L, 466L, 465L, 479L, 478L, 477L, 476L, 475L, 474L, 473L, 472L), yield = c(5.1, 5, 3.9, 4.6, 5, 4.4, 5.1, 4.3, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5, 5.6, 4.7, 5.4, 4.8, 4.6, 3.9, 4.2, 4.4, 5.3, 5.5, 5.8, 4.6, 5.8, 4.8, 5.3, 5.5, 5.6, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5, 5, 4.8, 4.9, 5.2, 5.3, 4.6, 4.8, 5.3, 4.5, 4.5, 5.1, 4.9, 5.2, 4.6, 4.8, 5.4, 5.9, 4.9, 5.8, 5.3, 4.8, 4.7, 5.2, 5.8 ), linecode = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("check", "variety"), class = "factor"), ranges = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L ), rangestouse = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1,2", "1,2,3", "2,3,4", "3,4"), class = "factor")), .Names = c("rows", "cols", "plotid", "yield", "linecode", "ranges", "rangestouse" ), class = "data.frame", row.names = c(NA, -64L)) Function: fun1<- function (dataset, plot.id, ranges2use, control) { m1 <- strsplit(as.character(dataset$ranges2use), ",") dat1 <- data.frame() m2 <- c() row_check_mean <- c() row_check_adj_yield <- c() x <- length(plot.id) for (i in (1:x)) { m2[i] <- m1[[i]] dat1 <- dataset[dataset$ranges %in% m2[i], ] row_check_mean[i] <- tapply(dat1$trait, dat1$control, mean, na.rm = TRUE)[1] row_check_adj_yield[i] <- ifelse(control[i] == "variety", trait[i]/dataset$row_check_mean[i], trait[i]/trait[i]) } data.frame(dataset, row_check_adj_yield) } Apply function: fun1(mydata, plot.id=mydata$plotid, ranges2use = mydata$rangestouse,control=mydata$linecode) Error: Error in m1[[i]] : subscript out of bounds Session info: R version 3.2.1 (2015-06-18) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] magrittr_1.5plyr_1.8.3 tools_3.2.1 reshape2_1.4.1 Rcpp_0.12.1
[R] spatial adjustment using checks
Hi, I have yield data for several varieties and a randomly placed check (1 in every 8 column or "cols") in a field test arranged in a rows*cols grid format (see image attached). Both "rows" & "cols" are variables in the data set. I like to adjust "yield" variable for each row listed as "variety" in variable "linecode" by dividing its yield with the average yield of four nearest "check" (on the rows*cols field grid) in variable "linecode". I like to have two checks on the same row where one check is on the left and the other is on the right side of a given variety. The other two checks should come from the two neighboring columns ("cols"). If a check is missing on one or more sides of a given variety, then I like to proceed with the calculation with only the available checks around that given variety. If two checks on the neighboring column are equidistance from a given variety then use position of the variety to choose which one to use (If variety is in cols 1-8 then use check from those cols; if variety is in cols 9-16 then use check from cols 9-16). Below is the function I wrote which adjust yield values for each "variety" (variable "linecode") by dividing its yield with the average yield of all checks in the field. Instead of using average check across the whole field, I like to use the four neighboring checks to make this adjustment. I am struggling with specifying the four nearest checks in this loop. I played around using "dist" function but without any success. I tried searching for any packages that can do these nearest check adjustments without any success. Any help will be appreciated. ---function-- function (dataset, trait, control) { m <- c() x <- length(trait) chkmean <- tapply(trait, control, mean, na.rm = T) for (i in 1:x) { m[i] <- ifelse(control[i] == "variety", trait[i]/chkmean[1], trait[i]/trait[i]) } head(as.data.frame(m)) } -data-- dput(dat) structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), cols = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16"), class = "factor"), plotid = c(289L, 290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 298L, 299L, 300L, 301L, 302L, 303L, 304L, 369L, 370L, 371L, 372L, 373L, 374L, 375L, 376L, 377L, 378L, 379L, 380L, 381L, 382L, 383L, 384L, 385L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L, 394L, 395L, 396L, 397L, 398L, 399L, 400L, 465L, 466L, 467L, 468L, 469L, 470L, 471L, 472L, 473L, 474L, 475L, 476L, 477L, 478L, 479L, 480L), yield = c(5.1, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5, 5, 3.9, 4.6, 5, 4.4, 5.1, 4.3, 4.4, 4.2, 3.9, 4.6, 4.8, 5.4, 4.7, 5.5, 5.3, 4.8, 5.8, 4.6, 5.8, 5.5, 5.3, 5.6, 5.6, 5, 4.8, 4.9, 5.2, 5.3, 4.6, 4.8, 5.3, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5, 5.4, 4.8, 4.6, 5.2, 4.9, 5.1, 4.5, 5.8, 5.2, 4.7, 4.8, 5.3, 5.8, 4.9, 5.9, 4.5), line = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 21L, 22L, 1L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 1L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 1L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 1L, 51L, 52L, 53L, 54L, 1L, 55L, 56L, 57L), .Label = c("CHK", "V002", "V003", "V004", "V005", "V006", "V007", "V008", "V009", "V010", "V011", "V012", "V013", "V014", "V015", "V016", "V017", "V018", "V019", "V020", "V021", "V022", "V023", "V024", "V025", "V026", "V027", "V028", "V029", "V030", "V031", "V032", "V033", "V034", "V035", "V036", "V037", "V038", "V039", "V040", "V041", "V042", "V043", "V044", "V045", "V046", "V047", "V048", "V049", "V050", "V051", "V052", "V053", "V054", "V055", "V056", "V057" ), class = "factor"), linecode = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("check", "variety"), class = "factor")), .Names = c("rows", "cols", "plotid", "yield", "line", "linecode"), row.names = c(NA, -64L)
Re: [R] help with "by" function
Jean: Thanks a lot!! The changes you made to the code gave me what I needed. I truly appreciate your time in correcting the code. Nilesh From: Adams, Jean [mailto:jvad...@usgs.gov] Sent: Tuesday, May 12, 2015 2:14 PM To: DIGHE, NILESH [AG/2362] Cc: r-help@r-project.org Subject: Re: [R] help with "by" function Nilesh, I found a couple errors in your code. First, in your by() statement you have a function to operate on the selected subset of data, which you refer to as x but then, in your aov statement you refer to data_set not x Second, your funC() statement is a function of trait_names but to make sure that the name of the variable is included in the formula, I changed this to a character variable. Give the code below a try and see if it works for you. Jean library(agricolae) library(dplyr) funC <- function(trait_names){ by(data_set, data_set$Isopair, function(x) { mod <- aov(formula(paste(trait_names, "~ STGgroup*Field + Rep%in%Field")), data=x) out <- HSD.test(mod, "STGgroup", group=TRUE, console=TRUE) dfout <- arrange(data.frame(out$groups), desc(trt)) }) } funC("Trait1") On Tue, May 12, 2015 at 1:03 PM, DIGHE, NILESH [AG/2362] mailto:nilesh.di...@monsanto.com>> wrote: Hi, I have an anonymous function called function(x) that will run anova, run HSD.test on the model, and then sort the results. I am passing this anonymous function to the "by" function to get results by "Isopair" factor which is my index variable. Since I want to run the anova on multiple dependent variables including "Trait1" & "Trait2", I am calling the dependent variable, "trait_names" in the model and then passing the "by" function to another function called "funC" which takes the "trait_names" as an argument to execute. After I execute the funC(data_set$trait1), I am getting the results by Isopair but the results for the two Isopairs (Isopair-A &Isopair-B) are the same which I know is not correct. It looks like the data is not getting split by Isopairs and so ALL data is used in anova for both Isopairs. Any help in modifying function, funC or any other ways to achieve the desired outcome will be highly appreciated. Thanks. Nilesh R code, data set, and session info is pasted below. R code: library(agricolae) funC<- function(trait_names){ by(data_set, data_set$Isopair,function(x){ mod<- aov(trait_names~ STGgroup*Field + Rep%in%Field, data=data_set) out<-HSD.test(mod,"STGgroup",group=TRUE,console=TRUE) dfout<- arrange(data.frame(out$groups),desc(trt)) }) } Results: ##execute funC function for Trait1 & Trait2 funC(data_set$Trait1) data_set$Isopair: Isopair-A trtmeans M 1 STG 776.9167 a 2 Non-STG 779.0833 a --- data_set$Isopair: Isopair-B trtmeans M 1 STG 776.9167 a 2 Non-STG 779.0833 a Data: data_set<- structure(list(Field = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("LML6", "TZL2"), class = "factor"), Isopair = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Isopair-A", "Isopair-B" ), class = "factor"), STGgroup = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Non-STG", "STG"), class = "factor"), Rep = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("Rep1", "Rep2", "Rep3"), class = "factor"), Trait1 = c(686L, 641L, 642L, 727L, 619L, 562L, 808L, 739L, 744L, 873L, 797L, 868L, 782L, 783L, 675L, 713L, 762L, 641L, 1009L, 995L, 845L, 1186L, 912L, 663L), Trait2 = c(45L, 65L, 70L, 35L, 20L, 80L, 70L, 65L, 70L, 20L, 30L, 35L, 40L, 55L, 35L, 40L, 35L, 40L, 40L, 35L, 25L, 40L, 35L, 25L)), .Names = c("Field", "Isopair", "STGgroup", "Rep", "Trait1", "Trait2"), class = "data.frame", row.names = c(NA, -24L)) Session info: R version 3.1.3 (2015-03-09) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] grid stats graphics grDevices utils datasets methods base other attached packages: [1] gridExtra_0.9.1 Hmisc_3.16-0Formula_1.2-1 survival_2.38-1 caret_6.0-41ggplot2_1.0.1
[R] help with "by" function
Hi, I have an anonymous function called function(x) that will run anova, run HSD.test on the model, and then sort the results. I am passing this anonymous function to the "by" function to get results by "Isopair" factor which is my index variable. Since I want to run the anova on multiple dependent variables including "Trait1" & "Trait2", I am calling the dependent variable, "trait_names" in the model and then passing the "by" function to another function called "funC" which takes the "trait_names" as an argument to execute. After I execute the funC(data_set$trait1), I am getting the results by Isopair but the results for the two Isopairs (Isopair-A &Isopair-B) are the same which I know is not correct. It looks like the data is not getting split by Isopairs and so ALL data is used in anova for both Isopairs. Any help in modifying function, funC or any other ways to achieve the desired outcome will be highly appreciated. Thanks. Nilesh R code, data set, and session info is pasted below. R code: library(agricolae) funC<- function(trait_names){ by(data_set, data_set$Isopair,function(x){ mod<- aov(trait_names~ STGgroup*Field + Rep%in%Field, data=data_set) out<-HSD.test(mod,"STGgroup",group=TRUE,console=TRUE) dfout<- arrange(data.frame(out$groups),desc(trt)) }) } Results: ##execute funC function for Trait1 & Trait2 funC(data_set$Trait1) data_set$Isopair: Isopair-A trtmeans M 1 STG 776.9167 a 2 Non-STG 779.0833 a --- data_set$Isopair: Isopair-B trtmeans M 1 STG 776.9167 a 2 Non-STG 779.0833 a Data: data_set<- structure(list(Field = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("LML6", "TZL2"), class = "factor"), Isopair = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Isopair-A", "Isopair-B" ), class = "factor"), STGgroup = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Non-STG", "STG"), class = "factor"), Rep = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("Rep1", "Rep2", "Rep3"), class = "factor"), Trait1 = c(686L, 641L, 642L, 727L, 619L, 562L, 808L, 739L, 744L, 873L, 797L, 868L, 782L, 783L, 675L, 713L, 762L, 641L, 1009L, 995L, 845L, 1186L, 912L, 663L), Trait2 = c(45L, 65L, 70L, 35L, 20L, 80L, 70L, 65L, 70L, 20L, 30L, 35L, 40L, 55L, 35L, 40L, 35L, 40L, 40L, 35L, 25L, 40L, 35L, 25L)), .Names = c("Field", "Isopair", "STGgroup", "Rep", "Trait1", "Trait2"), class = "data.frame", row.names = c(NA, -24L)) Session info: R version 3.1.3 (2015-03-09) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] grid stats graphics grDevices utils datasets methods base other attached packages: [1] gridExtra_0.9.1 Hmisc_3.16-0Formula_1.2-1 survival_2.38-1 caret_6.0-41ggplot2_1.0.1 [7] lattice_0.20-30 MASS_7.3-39 dplyr_0.4.1 agricolae_1.2-1 loaded via a namespace (and not attached): [1] acepack_1.3-3.3 assertthat_0.1 boot_1.3-15 BradleyTerry2_1.0-6 [5] brglm_0.5-9 car_2.0-25 cluster_2.0.1 coda_0.17-1 [9] codetools_0.2-10colorspace_1.2-6combinat_0.0-8 DBI_0.3.1 [13] deldir_0.1-9digest_0.6.8foreach_1.4.2 foreign_0.8-63 [17] gtable_0.1.2gtools_3.4.1iterators_1.0.7 klaR_0.6-12 [21] latticeExtra_0.6-26 lazyeval_0.1.10 LearnBayes_2.15 lme4_1.1-7 [25] magrittr_1.5Matrix_1.1-5mgcv_1.8-4 minqa_1.2.4 [29] munsell_0.4.2 nlme_3.1-120nloptr_1.0.4nnet_7.3-9 [33] parallel_3.1.3 pbkrtest_0.4-2 plyr_1.8.1 proto_0.3-10 [37] quantreg_5.11 RColorBrewer_1.1-2 Rcpp_0.11.6 reshape2_1.4.1 [41] rpart_4.1-9 scales_0.2.4sp_1.0-17 SparseM_1.6 [45] spdep_0.5-88splines_3.1.3 stringr_0.6.2 tools_3.1.3 This e-mail message may contain privileged and/or confidential information, and is intended to be received only by persons entitled to receive such information. If you have received this e-mail in error, please notify the sender immediately. Please delete it and all attachments from any servers, hard drives or any other media. Other use of this e-mail by you is strictly prohibited. All e-mails and attachments sent and received are subject to monitoring, reading and archival by Monsanto, including its subsidiaries. The recipient of this e-mail is