Re: [R] help with recursive function

2017-12-14 Thread DIGHE, NILESH [AG/2362]
When I run the code without stopifnot, the code takes 5 min to run and then it 
throws an error listed below without producing any results.
Error: node stack overflow
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Error during wrapup: node stack overflow

Thanks.
Nilesh

From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Thursday, December 14, 2017 11:26 AM
To: DIGHE, NILESH [AG/2362] 
Cc: Eric Berger ; r-help 
Subject: Re: [R] help with recursive function

Your code contains the lines
stopifnot(!(any(data1$norm_sd >= 1)))
if(!(any(data1$norm_sd >= 1))) {
df1 <- dat1
return(df1)
}

stop() "throws an error", causing the current function and all functions in the 
call
stack to abort and return nothing.  It does not mean to stop now and return a 
result.
Does the function give the correct results if you just leave out the stopifnot 
line?


Bill Dunlap
TIBCO Software
wdunlap tibco.com<http://tibco.com>

On Thu, Dec 14, 2017 at 9:11 AM, DIGHE, NILESH [AG/2362] 
mailto:nilesh.di...@monsanto.com>> wrote:
Eric:  I will try and see if I can figure out the issue by debugging as you 
suggested. I don’t know why my code after stopifnot is not getting executed 
where I like the code to run the funlp2 function when the if statement is TRUE 
but when it is false, I like it to keep running until the stopifnot condition 
is met.

When the stopifnot condition is met, I like to get the output from if statement 
saved.
Anyway,  I will keep trying.
Again, Thanks for your help!
Nilesh

From: Eric Berger [mailto:ericjber...@gmail.com<mailto:ericjber...@gmail.com>]
Sent: Thursday, December 14, 2017 10:29 AM
To: DIGHE, NILESH [AG/2362] 
mailto:nilesh.di...@monsanto.com>>
Cc: r-help mailto:r-help@r-project.org>>
Subject: Re: [R] help with recursive function

If you are trying to understand why the "stopifnot" condition is met you can 
replace it by something like:

if ( any(dat2$norm_sd >= 1) )
   browser()

This will put you in a debugging session where you can examine your variables, 
e.g.

> dat$norm_sd

HTH,
Eric



On Thu, Dec 14, 2017 at 5:33 PM, Eric Berger 
mailto:ericjber...@gmail.com><mailto:ericjber...@gmail.com<mailto:ericjber...@gmail.com>>>
 wrote:
The message is coming from your stopifnot() condition being met.


On Thu, Dec 14, 2017 at 5:31 PM, DIGHE, NILESH [AG/2362] 
mailto:nilesh.di...@monsanto.com><mailto:nilesh.di...@monsanto.com<mailto:nilesh.di...@monsanto.com>>>
 wrote:
Hi, I accidently left out few lines of code from the calclp function.  Updated 
function is pasted below.
I am still getting the same error “Error: !(any(data1$norm_sd >= 1)) is not 
TRUE“

I would appreciate any help.
Nilesh
dput(calclp)
function (dataset)
{
dat1 <- funlp1(dataset)
recursive_funlp <- function(dataset = dat1, func = funlp2) {
dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>%
mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>%
spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_",
names(.)), funs(norm = round(scale(.), 3)))
dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, sd, na.rm = TRUE), 3)
dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, function(x) {
max(abs(x), na.rm = TRUE)
}), 3)
data1 <- dat2 %>% gather(key, value, -uniqueid, -norm_max,
-norm_sd) %>% separate(key, c("field_rep", "treatment"),
"\\.") %>% spread(treatment, 
value) %>% mutate(outlier = NA)
stopifnot(!(any(data1$norm_sd >= 1)))
if (!(any(data1$norm_sd >= 1))) {
        df1 <- dat1
return(df1)
}
   else {
df2 <- recursive_funlp()
return(df2)
}
}
df3 <- recursive_funlp(dataset = dat1, func = funlp2)
df3
}


From: DIGHE, NILESH [AG/2362]
Sent: Thursday, December 14, 2017 9:01 AM
To: 'Eric Berger' 
mailto:ericjber...@gmail.com><mailto:ericjber...@gmail.com<mailto:ericjber...@gmail.com>>>
Cc: r-help 
mailto:r-help@r-project.org><mailto:r-help@r-project.org<mailto:r-help@r-project.org>>>
Subject: RE: [R] help with recursive function

Eric:  Thanks for taking time to look into my problem.  Despite of making the 
change you suggested, I am still getting the same error.  I am wondering if the 
logic I am using in the stopifnot and if functions is a problem.
I like the recursive function to stop whenever the norm_sd column has zero 
values that are above or equal to 1. Below is the calclp function after the 
changes you suggested.
Thanks. Nilesh

dput(calclp)
function (dataset)
{
 

Re: [R] help with recursive function

2017-12-14 Thread DIGHE, NILESH [AG/2362]
Eric:  I will try and see if I can figure out the issue by debugging as you 
suggested. I don’t know why my code after stopifnot is not getting executed 
where I like the code to run the funlp2 function when the if statement is TRUE 
but when it is false, I like it to keep running until the stopifnot condition 
is met.

When the stopifnot condition is met, I like to get the output from if statement 
saved.
Anyway,  I will keep trying.
Again, Thanks for your help!
Nilesh

From: Eric Berger [mailto:ericjber...@gmail.com]
Sent: Thursday, December 14, 2017 10:29 AM
To: DIGHE, NILESH [AG/2362] 
Cc: r-help 
Subject: Re: [R] help with recursive function

If you are trying to understand why the "stopifnot" condition is met you can 
replace it by something like:

if ( any(dat2$norm_sd >= 1) )
   browser()

This will put you in a debugging session where you can examine your variables, 
e.g.

> dat$norm_sd

HTH,
Eric



On Thu, Dec 14, 2017 at 5:33 PM, Eric Berger 
mailto:ericjber...@gmail.com>> wrote:
The message is coming from your stopifnot() condition being met.


On Thu, Dec 14, 2017 at 5:31 PM, DIGHE, NILESH [AG/2362] 
mailto:nilesh.di...@monsanto.com>> wrote:
Hi, I accidently left out few lines of code from the calclp function.  Updated 
function is pasted below.
I am still getting the same error “Error: !(any(data1$norm_sd >= 1)) is not 
TRUE“

I would appreciate any help.
Nilesh
dput(calclp)
function (dataset)
{
dat1 <- funlp1(dataset)
recursive_funlp <- function(dataset = dat1, func = funlp2) {
dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>%
mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>%
spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_",
names(.)), funs(norm = round(scale(.), 3)))
dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, sd, na.rm = TRUE), 3)
dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, function(x) {
max(abs(x), na.rm = TRUE)
}), 3)
data1 <- dat2 %>% gather(key, value, -uniqueid, -norm_max,
-norm_sd) %>% separate(key, c("field_rep", "treatment"),
"\\.") %>% spread(treatment, value) %>% mutate(outlier = 
NA)
stopifnot(!(any(data1$norm_sd >= 1)))
if (!(any(data1$norm_sd >= 1))) {
df1 <- dat1
return(df1)
}
   else {
        df2 <- recursive_funlp()
return(df2)
}
}
df3 <- recursive_funlp(dataset = dat1, func = funlp2)
df3
}


From: DIGHE, NILESH [AG/2362]
Sent: Thursday, December 14, 2017 9:01 AM
To: 'Eric Berger' mailto:ericjber...@gmail.com>>
Cc: r-help mailto:r-help@r-project.org>>
Subject: RE: [R] help with recursive function

Eric:  Thanks for taking time to look into my problem.  Despite of making the 
change you suggested, I am still getting the same error.  I am wondering if the 
logic I am using in the stopifnot and if functions is a problem.
I like the recursive function to stop whenever the norm_sd column has zero 
values that are above or equal to 1. Below is the calclp function after the 
changes you suggested.
Thanks. Nilesh

dput(calclp)
function (dataset)
{
dat1 <- funlp1(dataset)
recursive_funlp <- function(dataset = dat1, func = funlp2) {
dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>%
mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>%
spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_",
names(.)), funs(norm = round(scale(.), 3)))
dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, sd, na.rm = TRUE), 3)
dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, function(x) {
max(abs(x), na.rm = TRUE)
}), 3)
stopifnot(!(any(dat2$norm_sd >= 1)))
if (!(any(dat2$norm_sd >= 1))) {
df1 <- dat1
return(df1)
    }
    else {
df2 <- recursive_funlp()
return(df2)
}
}
df3 <- recursive_funlp(dataset = dat1, func = funlp2)
df3
}


From: Eric Berger [mailto:ericjber...@gmail.com]
Sent: Thursday, December 14, 2017 8:17 AM
To: DIGHE, NILESH [AG/2362] 
mailto:nilesh.di...@monsanto.com>>
Cc: r-help mailto:r-help@r-project.org>>
Subject: Re: [R] help with recursive function

My own typo ... whoops ...

!( any(dat2$norm_sd >= 1 ))



On Thu, Dec 14, 2017 at 3:43 PM, Eric Berger 
mailto:ericjber...@gmail.com>> wrote:
You seem to have a typo at this expression (and some others like it)

Namely, you write

any(!dat2$norm_sd) 

Re: [R] help with recursive function

2017-12-14 Thread DIGHE, NILESH [AG/2362]
Hi, I accidently left out few lines of code from the calclp function.  Updated 
function is pasted below.
I am still getting the same error “Error: !(any(data1$norm_sd >= 1)) is not 
TRUE“

I would appreciate any help.
Nilesh
dput(calclp)
function (dataset)
{
dat1 <- funlp1(dataset)
recursive_funlp <- function(dataset = dat1, func = funlp2) {
dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>%
mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>%
spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_",
names(.)), funs(norm = round(scale(.), 3)))
dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, sd, na.rm = TRUE), 3)
dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, function(x) {
max(abs(x), na.rm = TRUE)
}), 3)
data1 <- dat2 %>% gather(key, value, -uniqueid, -norm_max,
-norm_sd) %>% separate(key, c("field_rep", "treatment"),
"\\.") %>% spread(treatment, value) %>% mutate(outlier = NA)
stopifnot(!(any(data1$norm_sd >= 1)))
if (!(any(data1$norm_sd >= 1))) {
df1 <- dat1
return(df1)
}
   else {
df2 <- recursive_funlp()
    return(df2)
}
}
df3 <- recursive_funlp(dataset = dat1, func = funlp2)
df3
}


From: DIGHE, NILESH [AG/2362]
Sent: Thursday, December 14, 2017 9:01 AM
To: 'Eric Berger' 
Cc: r-help 
Subject: RE: [R] help with recursive function

Eric:  Thanks for taking time to look into my problem.  Despite of making the 
change you suggested, I am still getting the same error.  I am wondering if the 
logic I am using in the stopifnot and if functions is a problem.
I like the recursive function to stop whenever the norm_sd column has zero 
values that are above or equal to 1. Below is the calclp function after the 
changes you suggested.
Thanks. Nilesh

dput(calclp)
function (dataset)
{
dat1 <- funlp1(dataset)
recursive_funlp <- function(dataset = dat1, func = funlp2) {
dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>%
mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>%
spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_",
names(.)), funs(norm = round(scale(.), 3)))
dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, sd, na.rm = TRUE), 3)
dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, function(x) {
max(abs(x), na.rm = TRUE)
}), 3)
stopifnot(!(any(dat2$norm_sd >= 1)))
if (!(any(dat2$norm_sd >= 1))) {
df1 <- dat1
return(df1)
}
else {
    df2 <- recursive_funlp()
return(df2)
}
}
df3 <- recursive_funlp(dataset = dat1, func = funlp2)
df3
}


From: Eric Berger [mailto:ericjber...@gmail.com]
Sent: Thursday, December 14, 2017 8:17 AM
To: DIGHE, NILESH [AG/2362] 
mailto:nilesh.di...@monsanto.com>>
Cc: r-help mailto:r-help@r-project.org>>
Subject: Re: [R] help with recursive function

My own typo ... whoops ...

!( any(dat2$norm_sd >= 1 ))



On Thu, Dec 14, 2017 at 3:43 PM, Eric Berger 
mailto:ericjber...@gmail.com>> wrote:
You seem to have a typo at this expression (and some others like it)

Namely, you write

any(!dat2$norm_sd) >= 1

when you possibly meant to write

!( any(dat2$norm_sd) >= 1 )

i.e. I think your ! seems to be in the wrong place.

HTH,
Eric


On Thu, Dec 14, 2017 at 3:26 PM, DIGHE, NILESH [AG/2362] 
mailto:nilesh.di...@monsanto.com>> wrote:
Hi, I need some help with running a recursive function. I like to run funlp2 
recursively.
When I try to run recursive function in another function named "calclp" I get 
this "Error: any(!dat2$norm_sd) >= 1 is not TRUE".

I have never built a recursive function before so having trouble executing it 
in this case.  I would appreciate any help or guidance to resolve this issue. 
Please see my data and the three functions that I am using below.
Please note that calclp is the function I am running and the other two 
functions are within this calclp function.

# code:
Test<- calclp(dataset = dat)

# calclp function

calclp<- function (dataset)

{

dat1 <- funlp1(dataset)

recursive_funlp <- function(dataset = dat1, func = funlp2) {

dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>%

mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>%

spread(key = field_rep, value = lp) %>% mutate_at(.vars = gr

Re: [R] help with recursive function

2017-12-14 Thread DIGHE, NILESH [AG/2362]
Eric:  Thanks for taking time to look into my problem.  Despite of making the 
change you suggested, I am still getting the same error.  I am wondering if the 
logic I am using in the stopifnot and if functions is a problem.
I like the recursive function to stop whenever the norm_sd column has zero 
values that are above or equal to 1. Below is the calclp function after the 
changes you suggested.
Thanks. Nilesh

dput(calclp)
function (dataset)
{
dat1 <- funlp1(dataset)
recursive_funlp <- function(dataset = dat1, func = funlp2) {
dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>%
mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>%
spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_",
names(.)), funs(norm = round(scale(.), 3)))
dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, sd, na.rm = TRUE), 3)
dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))],
1, function(x) {
max(abs(x), na.rm = TRUE)
}), 3)
stopifnot(!(any(dat2$norm_sd >= 1)))
if (!(any(dat2$norm_sd >= 1))) {
df1 <- dat1
return(df1)
}
else {
df2 <- recursive_funlp()
return(df2)
}
}
df3 <- recursive_funlp(dataset = dat1, func = funlp2)
df3
}


From: Eric Berger [mailto:ericjber...@gmail.com]
Sent: Thursday, December 14, 2017 8:17 AM
To: DIGHE, NILESH [AG/2362] 
Cc: r-help 
Subject: Re: [R] help with recursive function

My own typo ... whoops ...

!( any(dat2$norm_sd >= 1 ))



On Thu, Dec 14, 2017 at 3:43 PM, Eric Berger 
mailto:ericjber...@gmail.com>> wrote:
You seem to have a typo at this expression (and some others like it)

Namely, you write

any(!dat2$norm_sd) >= 1

when you possibly meant to write

!( any(dat2$norm_sd) >= 1 )

i.e. I think your ! seems to be in the wrong place.

HTH,
Eric


On Thu, Dec 14, 2017 at 3:26 PM, DIGHE, NILESH [AG/2362] 
mailto:nilesh.di...@monsanto.com>> wrote:
Hi, I need some help with running a recursive function. I like to run funlp2 
recursively.
When I try to run recursive function in another function named "calclp" I get 
this "Error: any(!dat2$norm_sd) >= 1 is not TRUE".

I have never built a recursive function before so having trouble executing it 
in this case.  I would appreciate any help or guidance to resolve this issue. 
Please see my data and the three functions that I am using below.
Please note that calclp is the function I am running and the other two 
functions are within this calclp function.

# code:
Test<- calclp(dataset = dat)

# calclp function

calclp<- function (dataset)

{

dat1 <- funlp1(dataset)

recursive_funlp <- function(dataset = dat1, func = funlp2) {

dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>%

mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>%

spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_",

names(.)), funs(norm = round(scale(.), 3)))

dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))],

1, sd, na.rm = TRUE), 3)

dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))],

1, function(x) {

max(abs(x), na.rm = TRUE)

}), 3)

stopifnot(any(!dat2$norm_sd) >= 1)

if (any(!dat2$norm_sd) >= 1) {

df1 <- dat1

return(df1)

}

else {

df2 <- recursive_funlp()

return(df2)

}

}

df3 <- recursive_funlp(dataset = dat1, func = funlp2)

df3

}


# funlp1 function

funlp1<- function (dataset)

{

dat2 <- dataset %>% select(field, set, ent_num, rep_num,

lp) %>% unite(uniqueid, set, ent_num, sep = ".") %>%

unite(field_rep, field, rep_num) %>% mutate(field_rep = paste(field_rep,

"lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>%

mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.),

3)))

dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))],

1, sd, na.rm = TRUE), 3)

dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))],

1, function(x) {

max(abs(x), na.rm = TRUE)

}), 3)

data1 <- dat2 %>% gather(key, value, -uniqueid, -norm_max,

-norm_sd) %>% separate(key, c("field_rep", "treatment"),

"\\.") %>% spread(treatment, value) %>% mutate(outlier = NA)

df_clean <- with(data1, data1[norm_sd < 1, ])

datD <

[R] help with recursive function

2017-12-14 Thread DIGHE, NILESH [AG/2362]
Hi, I need some help with running a recursive function. I like to run funlp2 
recursively.
When I try to run recursive function in another function named "calclp" I get 
this "Error: any(!dat2$norm_sd) >= 1 is not TRUE".

I have never built a recursive function before so having trouble executing it 
in this case.  I would appreciate any help or guidance to resolve this issue. 
Please see my data and the three functions that I am using below.
Please note that calclp is the function I am running and the other two 
functions are within this calclp function.

# code:
Test<- calclp(dataset = dat)

# calclp function

calclp<- function (dataset)

{

dat1 <- funlp1(dataset)

recursive_funlp <- function(dataset = dat1, func = funlp2) {

dat2 <- dataset %>% select(uniqueid, field_rep, lp) %>%

mutate(field_rep = paste(field_rep, "lp", sep = ".")) %>%

spread(key = field_rep, value = lp) %>% mutate_at(.vars = grep("_",

names(.)), funs(norm = round(scale(.), 3)))

dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))],

1, sd, na.rm = TRUE), 3)

dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))],

1, function(x) {

max(abs(x), na.rm = TRUE)

}), 3)

stopifnot(any(!dat2$norm_sd) >= 1)

if (any(!dat2$norm_sd) >= 1) {

df1 <- dat1

return(df1)

}

else {

df2 <- recursive_funlp()

return(df2)

}

}

df3 <- recursive_funlp(dataset = dat1, func = funlp2)

df3

}


# funlp1 function

funlp1<- function (dataset)

{

dat2 <- dataset %>% select(field, set, ent_num, rep_num,

lp) %>% unite(uniqueid, set, ent_num, sep = ".") %>%

unite(field_rep, field, rep_num) %>% mutate(field_rep = paste(field_rep,

"lp", sep = ".")) %>% spread(key = field_rep, value = lp) %>%

mutate_at(.vars = grep("_", names(.)), funs(norm = round(scale(.),

3)))

dat2$norm_sd <- round(apply(dat2[, grep("lp_norm", names(dat2))],

1, sd, na.rm = TRUE), 3)

dat2$norm_max <- round(apply(dat2[, grep("lp_norm", names(dat2))],

1, function(x) {

max(abs(x), na.rm = TRUE)

}), 3)

data1 <- dat2 %>% gather(key, value, -uniqueid, -norm_max,

-norm_sd) %>% separate(key, c("field_rep", "treatment"),

"\\.") %>% spread(treatment, value) %>% mutate(outlier = NA)

df_clean <- with(data1, data1[norm_sd < 1, ])

datD <- with(data1, data1[norm_sd >= 1, ])

s <- split(datD, datD$uniqueid)

sdf <- lapply(s, function(x) {

data.frame(x, x$outlier <- ifelse(is.na(x$lp_norm), NA,

ifelse(abs(x$lp_norm) == x$norm_max, "yes", "no")),

x$lp <- with(x, ifelse(outlier == "yes", NA, lp)))

x

})

sdf2 <- bind_rows(sdf)

all_dat <- bind_rows(df_clean, sdf2)

all_dat

}


# funlp2 function

funlp2<-function (dataset)

{

data1 <- dataset

df_clean <- with(data1, data1[norm_sd < 1, ])

datD <- with(data1, data1[norm_sd >= 1, ])

s <- split(datD, datD$uniqueid)

sdf <- lapply(s, function(x) {

data.frame(x, x$outlier <- ifelse(is.na(x$lp_norm), NA,

ifelse(abs(x$lp_norm) == x$norm_max, "yes", "no")),

x$lp <- with(x, ifelse(outlier == "yes", NA, lp)))

x

})

sdf2 <- bind_rows(sdf)

all_dat <- bind_rows(df_clean, sdf2)

all_dat

}


# dataset
dput(dat)
structure(list(field = c("LM01", "LM01", "LM01", "LM01", "LM01",
"LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01",
"LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01",
"LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01",
"LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01",
"LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01",
"LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01",
"LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01",
"LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "LM01", "OL01",
"OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01",
"OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01",
"OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01",
"OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01",
"OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01",
"OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01",
"OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01",
"OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01",
"OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "OL01", "SGI1",
"SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1",
"SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1",
"SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1",
"SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1",
"SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "SGI1", "S

Re: [R] dynamically create columns using a function

2017-07-21 Thread DIGHE, NILESH [AG/2362]
Hi Elie,

Thanks for your time and efforts. I plugged in the calculation I wanted to do 
in the code you provided and got I wanted perfectly. Below is the solution to 
my original problem.

# dataset

dem<- structure(list(id = c("L1", "L2", "L3", "M1", "M2", "M3"), TEST_SET_NAME 
= c("A",
"A", "A", "B", "B", "B"), YLD_BE_REG1 = c(1467L, 1455L, 1382L,
1463L, 1466L, 1455L), YLD_BE_REG2 = c(1501L, 1441L, 1421L, 1482L,
1457L, 1490L), IS_GG = c("NO", "NO", "YES", "NO", "NO", "YES"
)), .Names = c("id", "TEST_SET_NAME", "YLD_BE_REG1", "YLD_BE_REG2",
"IS_GG"), class = "data.frame", row.names = c(NA, -6L))

# function calc_gg

calc_gg<- function (dataset, col, col_name)

{

mutate_call = lazyeval::interp(~round(((a - mean(a[IS_GG ==

"YES"], na.rm = TRUE))/mean(a[IS_GG == "YES"], na.rm = TRUE)) *

100, 1), a = as.name(col))

dataset %>% group_by(TEST_SET_NAME) %>% mutate_(.dots = 
setNames(list(mutate_call),

col_name)) %>% ungroup()

}


# function f

f<- function (dat, blup_datacut)

{

col_name_gg <- paste("GG", blup_datacut, sep = "_")

col_mean_gg <- paste("YLD_BE", blup_datacut, sep = "_")

dat2 <- calc_gg(dataset = dat, col = col_mean_gg, col_name = col_name_gg)

dat2

}

# function demo_fn

demo_fn<- function (dat, f, blup_datacut)

{

for (i in blup_datacut) {

dat <- f(dat, i)

}

dat

}

# get expected results by applying functions
demo_fn(dem, f, c("REG1", "REG2"))


Best Regards,
Nilesh
From: Elie Canonici Merle [mailto:elie.canonicime...@gmail.com]
Sent: Friday, July 21, 2017 3:44 AM
To: DIGHE, NILESH [AG/2362] 
Cc: r-help@r-project.org
Subject: Re: [R] dynamically create columns using a function

Hi,
I don't know about the lazyeval package or what you are trying to do but to 
answer the main question "How to create columns dynamically using a function?" 
I would do something like that:

# dataset
dem <- structure(list(id = c("L1", "L2", "L3", "M1", "M2", "M3"), TEST_SET_NAME 
= c("A",
"A", "A", "B", "B", "B"), YLD_BE_REG1 = c(1467L, 1455L, 1382L,
1463L, 1466L, 1455L), YLD_BE_REG2 = c(1501L, 1441L, 1421L, 1482L,
1457L, 1490L), IS_GG = c("NO", "NO", "YES", "NO", "NO", "YES"
)), .Names = c("id", "TEST_SET_NAME", "YLD_BE_REG1", "YLD_BE_REG2",
"IS_GG"), class = "data.frame", row.names = c(NA, -6L))



demo_fn<- function (data, f, names) {
for (i in names) {
data <- f(data, i)
}
data
}

f <- function(data, name) {
col_work <- paste("YLD_BE", name, sep = "_")
col_name_result <- paste("GG", name, sep = "_")
#do something interesting, here I am simply copying the column
data[col_name_result] <- data[col_work]
data
}

demo_fn(dem, f, c("REG1", "REG2"))




If you are working with large datasets it might not be the best solution as my 
understanding is that this method involves a lot of copying.

Hope it helps,

Elie Canonici Merle

2017-07-20 17:55 GMT+02:00 DIGHE, NILESH [AG/2362] 
mailto:nilesh.di...@monsanto.com>>:
Hi,
I am writing a function to dynamically create column names and fill those 
columns with some basic calculations.  My function "demo_fn" takes argument 
"blup_datacut" and I like to use the contents of those arguments to dynamically 
create new columns in my dataset. Please note that I have another function 
called "calc_gg" within the function "demo_fn". Both functions are pasted below.
I have a for loop within my function and it appears to only create new column 
for the last value in the argument "blup_datacut" which makes me think that I 
am not storing the values coming out of for_loop correctly. I have 
"expected_results", dataset, & functions pasted below to reproduce my problem 
and expected results.
Any help will be greatly appreciate.


# dataset
dem<- structure(list(id = c("L1", "L2", "L3", "M1", "M2", "M3"), TEST_SET_NAME 
= c("A",
"A", "A", "B", "B", "B"), YLD_BE_REG1 = c(1467L, 1455L, 1382L,
1463L, 1466L, 1455L), YLD_BE_REG2 = c(1501L, 1441L, 1421L, 1482L,
1457L, 1490L), IS_GG = c("NO", "NO", "YES", "NO", "NO", "YES"
)), .Names = c("id&quo

[R] dynamically create columns using a function

2017-07-20 Thread DIGHE, NILESH [AG/2362]
Hi,
I am writing a function to dynamically create column names and fill those 
columns with some basic calculations.  My function "demo_fn" takes argument 
"blup_datacut" and I like to use the contents of those arguments to dynamically 
create new columns in my dataset. Please note that I have another function 
called "calc_gg" within the function "demo_fn". Both functions are pasted below.
I have a for loop within my function and it appears to only create new column 
for the last value in the argument "blup_datacut" which makes me think that I 
am not storing the values coming out of for_loop correctly. I have 
"expected_results", dataset, & functions pasted below to reproduce my problem 
and expected results.
Any help will be greatly appreciate.


# dataset
dem<- structure(list(id = c("L1", "L2", "L3", "M1", "M2", "M3"), TEST_SET_NAME 
= c("A",
"A", "A", "B", "B", "B"), YLD_BE_REG1 = c(1467L, 1455L, 1382L,
1463L, 1466L, 1455L), YLD_BE_REG2 = c(1501L, 1441L, 1421L, 1482L,
1457L, 1490L), IS_GG = c("NO", "NO", "YES", "NO", "NO", "YES"
)), .Names = c("id", "TEST_SET_NAME", "YLD_BE_REG1", "YLD_BE_REG2",
"IS_GG"), class = "data.frame", row.names = c(NA, -6L))

# function demo_fn

demo_fn<- function (dat, blup_datacut = c("REG1", "REG2"))

{

for (i in seq_along(blup_datacut)) {

col_name_gg <- paste("GG", blup_datacut[i], sep = "_")

col_mean_gg <- paste("YLD_BE", blup_datacut[i], sep = "_")

dat2 <- calc_gg(dataset = dat, col = col_mean_gg, col_name = 
col_name_gg)

}

dat2

}


# function calc_gg

Calc_gg<- function (dataset, col, col_name)

{

mutate_call = lazyeval::interp(~round(((a - mean(a[IS_GG ==

"YES"], na.rm = TRUE))/mean(a[IS_GG == "YES"], na.rm = TRUE)) *

100, 1), a = as.name(col))

dataset %>% group_by(TEST_SET_NAME) %>% mutate_(.dots = 
setNames(list(mutate_call),

col_name)) %>% ungroup()

}


# run function
results_demo<- demo_fn(dat =  dem)

# expected results

structure(list(id = c("L1", "L2", "L3", "M1", "M2", "M3"), TEST_SET_NAME = 
c("A",

"A", "A", "B", "B", "B"), YLD_BE_REG1 = c(1467L, 1455L, 1382L,

1463L, 1466L, 1455L), YLD_BE_REG2 = c(1501L, 1441L, 1421L, 1482L,

1457L, 1490L), IS_GG = c("NO", "NO", "YES", "NO", "NO", "YES"

), GG_REG1 = c(6.2, 5.3, 0, 0.5, 0.8, 0), GG_REG2 = c(5.6, 1.4,

0, -0.5, -2.2, 0)), .Names = c("id", "TEST_SET_NAME", "YLD_BE_REG1",

"YLD_BE_REG2", "IS_GG", "GG_REG1", "GG_REG2"), row.names = c(NA,

-6L), class = "data.frame")

Thanks.
Nilesh
This email and any attachments were sent from a Monsanto email account and may 
contain confidential and/or privileged information. If you are not the intended 
recipient, please contact the sender and delete this email and any attachments 
immediately. Any unauthorized use, including disclosing, printing, storing, 
copying or distributing this email, is prohibited. All emails and attachments 
sent to or from Monsanto email accounts may be subject to monitoring, reading, 
and archiving by Monsanto, including its affiliates and subsidiaries, as 
permitted by applicable law. Thank you.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why is merge sorting even when sort = F?

2017-03-09 Thread DIGHE, NILESH [AG/2362]
Using the "join" function from the plyr package preserves the data order 
library(plyr)
join(grades2, info, by="grade", type="left", match="all")

Nilesh
-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Dimitri 
Liakhovitski
Sent: Wednesday, March 08, 2017 12:45 PM
To: Jeff Newmiller 
Cc: r-help 
Subject: Re: [R] Why is merge sorting even when sort = F?

I understood your answer.
The point is that sort = TRUE that doesn't sort is plain confusing.
Instead, the option should have been something like efficient = TRUE or FALSE. 
At least then no one would stupidly expect sort = TRUE to sort and sort = FALSE 
to NOT sort.

On Wed, Mar 8, 2017 at 12:51 PM, Jeff Newmiller  
wrote:
> If you are still wondering, try re-reading my answer. FALSE is more 
> efficient, TRUE is sorted. Lack of sorting has nothing to do with preserving 
> order.
> --
> Sent from my phone. Please excuse my brevity.
>
> On March 8, 2017 8:55:06 AM PST, Dimitri Liakhovitski 
>  wrote:
>>Thank you. I was just curious what sort=FALSE had no impact.
>>Wondering what it is there for then...
>>
>>On Wed, Mar 8, 2017 at 11:43 AM, Jeff Newmiller 
>> wrote:
>>> Merging is not necessarily an order-preserving operation, but 
>>> sorting
>>can make the operation more efficient. The sort=TRUE argument forces 
>>the result to be sorted, but sort=FALSE is in not a promise that order 
>>will be preserved. (I think the imperfect sorting occurs when there 
>>are multiple keys but am not sure.) You can add columns to the input 
>>data that let you restore some semblance of the original ordering 
>>afterward, or you can roll your own possibly-less-efficient merge 
>>using match and
>>indexing:
>>>
>>> info[ match( grades2$grade, info$grade ), ]
>>> --
>>> Sent from my phone. Please excuse my brevity.
>>>
>>> On March 8, 2017 8:07:27 AM PST, Dimitri Liakhovitski
>> wrote:
Hello!
I have a vector 'grades' and a data frame 'info':

grades2 <- data.frame(grade = c(1,2,2,3,1)) info <- data.frame(
  grade = 3:1,
  desc = c("Excellent", "Good", "Poor"),
  fail = c(F, F, T)
)

I want to get the info for all grades I have in info:

This solution resorts everything in the order of column 'grade':
merge(grades2, info, by = "grade", all.x = T, all.y = F)

Could you please explain why this solution also resorts - despite
>>sort
= FALSE?
merge(grades2, info, by = "grade", all.x = T, all.y = F, sort =
>>FALSE)

Thanks a lot!



--
Dimitri Liakhovitski

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
This email and any attachments were sent from a Monsanto email account and may 
contain confidential and/or privileged information. If you are not the intended 
recipient, please contact the sender and delete this email and any attachments 
immediately. Any unauthorized use, including disclosing, printing, storing, 
copying or distributing this email, is prohibited. All emails and attachments 
sent to or from Monsanto email accounts may be subject to monitoring, reading, 
and archiving by Monsanto, including its affiliates and subsidiaries, as 
permitted by applicable law. Thank you.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Understanding and predict round-off errors sign on simple functions

2016-06-30 Thread DIGHE, NILESH [AG/2362]
Using "runmean" function from caTools package within your SMA function appears 
to solve the issue.  Please see details below.

library(caTools)

> dput(m)
structure(c(-0.626453810742332, 0.183643324222082, -0.835628612410047, 
1.59528080213779, 0.329507771815361, -0.820468384118015, 0.487429052428485, 
0.738324705129217, 0.575781351653492, -0.305388387156356, 3.51178116845085, 
2.38984323641143, 1.3787594194582, -0.2146998871775, 3.12493091814311, 
1.95506639098477, 1.98380973690105, 2.9438362106853, 2.82122119509809, 
2.59390132121751, 5.91897737160822, 5.78213630073107, 5.07456498336519, 
3.01064830413663, 5.61982574789471, 4.943871260471, 4.84420449329467, 
3.52924761610073, 4.52184994489138, 5.4179415601997), .Dim = c(10L, 
3L))


> dput(SMA)
function (x, n = 10, ...) 
{
ma <- runmean(x, n)
if (!is.null(dim(ma))) {
colnames(ma) <- "SMA"
}
return(ma)
}


mma <- apply(m, 2, SMA, n=1)

results<-mma-m

> dput(results)
structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(10L, 3L))


Nilesh
-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Marc Schwartz
Sent: Wednesday, June 29, 2016 1:07 PM
To: Bert Gunter
Cc: R-help
Subject: Re: [R] Understanding and predict round-off errors sign on simple 
functions

Hi,

Just to augment Bert's comments, I presume that you are aware of the relevant R 
FAQ:

  
https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

That you had an expectation of the difference being 0 suggested to me that you 
might not be, but my apologies if that is not the case.

That being said, there are some higher precision CRAN packages that may offer 
some additional functionality, with the potential limitations that Bert 
references below. More information is available in the Numerical Mathematics 
CRAN Task View:

  https://cran.r-project.org/web/views/NumericalMathematics.html

In addition, with the caveat that I have not used it, there is the 'propagate' 
package on CRAN that may be relevant to what you want to be able to anticipate, 
at some level:

  https://cran.r-project.org/web/packages/propagate/index.html

It has not been updated in a while and there are some notes for the CRAN 
package checks, that suggest that the maintainer may not be active at this 
point.

Regards,

Marc


> On Jun 29, 2016, at 10:13 AM, Bert Gunter  wrote:
> 
> I am certainly no expert, but I would assume that:
> 
> 1. Roundoff errors depend on the exact numerical libraries and 
> versions that are used, and so general language comparisons are 
> impossible without that information;
> 
> 2. Roundoff errors depend on the exact calculations being done and 
> machine precision and are very complicated to determine
> 
> So I would say the answer to your questions is no.
> 
> But you should probably address such a question to a numerical analyst 
> for an authoritative answer. Maybe try stats.stackexchange.com  .
> 
> -- Bert
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along 
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Wed, Jun 29, 2016 at 2:55 AM, Sirhc via R-help  
> wrote:
>> Hi,
>> 
>> 
>> 
>> May be it is a basic thing but I would like to know if we can 
>> anticipate round-off errors sign.
>> 
>> 
>> 
>> Here is an example :
>> 
>> 
>> 
>> # numerical matrix
>> 
>> m <- matrix(data=cbind(rnorm(10, 0), rnorm(10, 2), rnorm(10, 5)), 
>> nrow=10,
>> ncol=3)
>> 
>> 
>> 
>>> m
>> 
>>[,1]  [,2] [,3]
>> 
>> [1,]  0.4816247 1.1973502 3.855641
>> 
>> [2,] -1.2174937 0.7356427 4.393279
>> 
>> [3,]  0.8504074 2.5286509 2.689196
>> 
>> [4,]  1.8048642 1.8580804 6.665237
>> 
>> [5,] -0.6749397 1.0944277 4.838608
>> 
>> [6,]  0.8252034 1.5595268 3.681695
>> 
>> [7,]  1.3002208 0.9582693 4.561577
>> 
>> [8,]  1.6950923 3.5677921 6.005078
>> 
>> [9,]  0.6509285 0.9025964 5.082288
>> 
>> [10,] -0.5676040 1.3281102 4.446451
>> 
>> 
>> 
>> #weird moving average of period 1 !
>> 
>> mma <- apply(m, 2, SMA, n=1)
>> 
>> 
>> 
>>> mma
>> 
>>[,1]  [,2] [,3]
>> 
>> [1,] NANA   NA
>> 
>> [2,] -1.2174937 0.7356427 4.393279
>> 
>> [3,]  0.8504074 2.5286509 2.689196
>> 
>> [4,]  1.8048642 1.8580804 6.665237
>> 
>> [5,] -0.6749397 1.0944277 4.838608
>> 
>> [6,]  0.8252034 1.5595268 3.681695
>> 
>> [7,]  1.3002208 0.9582693 4.561577
>> 
>> [8,]  1.6950923 3.5677921 6.005078
>> 
>> [9,]  0.6509285 0.9025964 5.082288
>> 
>> [10,] -0.5676040 1.3281102 4.446451
>> 
>> 
>> 
>> 
>> 
>> #difference should be 0 but here is the result
>> 
>>> m - mma
>> 
>>   [,1] [,2]  [,3]
>> 
>> [1,]NA   NANA
>> 
>> [2,]  0.00e+00 0.00e+00 -8.881784e-16
>> 
>> [3,]  0.00e+00 0.00e+00 -8.881784e-16
>> 
>> [4,]  0.00e+00 4.440892e-16 -8.881784e-16
>> 
>> [

Re: [R] sample within a loop

2016-03-20 Thread DIGHE, NILESH [AG/2362]
Tanvir & Don:  Thanks a lot for your solutions.  Both solutions work great.  I 
really appreciate your help.
Regards,
Nilesh

-Original Message-
From: Mohammad Tanvir Ahamed [mailto:mashra...@yahoo.com] 
Sent: Thursday, March 17, 2016 1:24 PM
To: DIGHE, NILESH [AG/2362]; r-help@r-project.org
Subject: Re: [R] sample within a loop

Hi, 

you can try

df1<-split(df,df$groups) 

lapply(df1, function(x) 
{ 
 x<-cbind(x,entry=0) 
 sam <- sample(x$plotno,1) 
 x$entry[which(x$plotno==sam)]<-"CONTROL" 
 x$entry[which(!x$plotno==sam)]<-"TEST" 
 x 
} 
)

 
Tanvir Ahamed 
Göteborg, Sweden  |  mashra...@yahoo.com 



____________
From: "DIGHE, NILESH [AG/2362]" 
To: "r-help@r-project.org"  
Sent: Thursday, 17 March 2016, 18:18
Subject: [R] sample within a loop


Dear R users,
My data frame has four "groups" namely A1, B2, C3, & D4.  Each 
group has 12 rows (variable "plotno).  I like to randomly sample one "plotno" 
within each "groups" variable and label it as "CONTROL" and label others as 
"TEST" in a new variable called "entry".  I am trying to do this by looping 
over the group variable and then sample "plotno" within a given group.  I am 
ending up with four "CONTROL" plots but they are generated by sampling over all 
the groups instead of each group.  I need one random "plotno" assigned as a 
"CONTROL" per group (A1, B2, C3, D4).  I would appreciate any help in modifying 
my function "funa" or suggest any alternative and better way to do this task.  
Below is the dataset and function I am working with.

# dataset (df)
structure(list(plotno = 1:48, groups = c("A1", "A1", "A1", "A1",
"A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "B2", "B2", "B2",
"B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "C3", "C3",
"C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "D4",
"D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4"
)), .Names = c("plotno", "groups"), row.names = c(NA, -48L), class = 
"data.frame")

# function (funa)

function (dataset)

{

set.seed(1)

bay <- unique(dataset$groups)

IND <- c()

df2 <- dataset

for (i in bay) {

IND[i] <- which(plotno %in% sample(plotno, 1))

df2$entry <- ifelse(df2$plotno %in% IND, "CONTROL", "TEST")

}

df2

}


# session info

R version 3.2.1 (2015-06-18)

Platform: i386-w64-mingw32/i386 (32-bit)

Running under: Windows 7 x64 (build 7601) Service Pack 1



locale:

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
  LC_MONETARY=English_United States.1252

[4] LC_NUMERIC=C   LC_TIME=English_United States.1252



attached base packages:

[1] stats graphics  grDevices utils datasets  methods   base



loaded via a namespace (and not attached):

[1] tools_3.2.1

Thanks.
Nilesh


Nilesh Dighe
(806)-252-7492 (Cell)
(806)-741-2019 (Office)


This e-mail message may contain privileged and/or confidential information, and 
is intended to be received only by persons entitled
to receive such information. If you have received this e-mail in error, please 
notify the sender immediately. Please delete it and
all attachments from any servers, hard drives or any other media. Other use of 
this e-mail by you is strictly prohibited.

All e-mails and attachments sent and received are subject to monitoring, 
reading and archival by Monsanto, including its
subsidiaries. The recipient of this e-mail is solely responsible for checking 
for the presence of "Viruses" or other "Malware".
Monsanto, along with its subsidiaries, accepts no liability for any damage 
caused by any such code transmitted by or accompanying
this e-mail or any attachment.


The information contained in this email may be subject to the export control 
laws and regulations of the United States, potentially
including but not limited to the Export Administration Regulations (EAR) and 
sanctions regulations issued by the U.S. Department of
Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of this 
information you are obligated to comply with all
applicable U.S. export laws and regulations.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -

[R] sample within a loop

2016-03-19 Thread DIGHE, NILESH [AG/2362]
Dear R users,
My data frame has four "groups" namely A1, B2, C3, & D4.  Each 
group has 12 rows (variable "plotno).  I like to randomly sample one "plotno" 
within each "groups" variable and label it as "CONTROL" and label others as 
"TEST" in a new variable called "entry".  I am trying to do this by looping 
over the group variable and then sample "plotno" within a given group.  I am 
ending up with four "CONTROL" plots but they are generated by sampling over all 
the groups instead of each group.  I need one random "plotno" assigned as a 
"CONTROL" per group (A1, B2, C3, D4).  I would appreciate any help in modifying 
my function "funa" or suggest any alternative and better way to do this task.  
Below is the dataset and function I am working with.

# dataset (df)
structure(list(plotno = 1:48, groups = c("A1", "A1", "A1", "A1",
"A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "B2", "B2", "B2",
"B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "C3", "C3",
"C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "C3", "D4",
"D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4", "D4"
)), .Names = c("plotno", "groups"), row.names = c(NA, -48L), class = 
"data.frame")

# function (funa)

function (dataset)

{

set.seed(1)

bay <- unique(dataset$groups)

IND <- c()

df2 <- dataset

for (i in bay) {

IND[i] <- which(plotno %in% sample(plotno, 1))

df2$entry <- ifelse(df2$plotno %in% IND, "CONTROL", "TEST")

}

df2

}


# session info

R version 3.2.1 (2015-06-18)

Platform: i386-w64-mingw32/i386 (32-bit)

Running under: Windows 7 x64 (build 7601) Service Pack 1



locale:

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
  LC_MONETARY=English_United States.1252

[4] LC_NUMERIC=C   LC_TIME=English_United States.1252



attached base packages:

[1] stats graphics  grDevices utils datasets  methods   base



loaded via a namespace (and not attached):

[1] tools_3.2.1

Thanks.
Nilesh


Nilesh Dighe
(806)-252-7492 (Cell)
(806)-741-2019 (Office)


This e-mail message may contain privileged and/or confidential information, and 
is intended to be received only by persons entitled
to receive such information. If you have received this e-mail in error, please 
notify the sender immediately. Please delete it and
all attachments from any servers, hard drives or any other media. Other use of 
this e-mail by you is strictly prohibited.

All e-mails and attachments sent and received are subject to monitoring, 
reading and archival by Monsanto, including its
subsidiaries. The recipient of this e-mail is solely responsible for checking 
for the presence of "Viruses" or other "Malware".
Monsanto, along with its subsidiaries, accepts no liability for any damage 
caused by any such code transmitted by or accompanying
this e-mail or any attachment.


The information contained in this email may be subject to the export control 
laws and regulations of the United States, potentially
including but not limited to the Export Administration Regulations (EAR) and 
sanctions regulations issued by the U.S. Department of
Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of this 
information you are obligated to comply with all
applicable U.S. export laws and regulations.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] store results from loop into a dataframe

2016-01-05 Thread DIGHE, NILESH [AG/2362]
Sarah:  Thanks a lot for taking time to guide me to the right direction.  I now 
see how the missing data is causing the problem.
Thanks again!
Nilesh

-Original Message-
From: Sarah Goslee [mailto:sarah.gos...@gmail.com] 
Sent: Tuesday, January 05, 2016 12:13 PM
To: DIGHE, NILESH [AG/2362]
Cc: r-help@r-project.org
Subject: Re: [R] store results from loop into a dataframe

If you run each variable individually, you'll discover that the NAs in your 
data are causing problems. It's up to you to figure out what the best way to 
handle those missing values for your research is.

Sarah

On Tue, Jan 5, 2016 at 12:39 PM, DIGHE, NILESH [AG/2362] 
 wrote:
> Sarah: Thanks for pointing out the errors in my function.
>
> Below are the errors I am getting after I run the corrected quote:
> Error in if (s) { : missing value where TRUE/FALSE needed In addition: 
> Warning message:
> In qtukey(1 - alpha, ntr, DFerror) : NaNs produced
>
> You are right, I have no idea to handle these errors.
>
> Do you recommend any other approach to solve my problem?
>
> Thanks for your time.
> Nilesh
>
>
>
> -Original Message-
> From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
> Sent: Tuesday, January 05, 2016 11:20 AM
> To: DIGHE, NILESH [AG/2362]
> Cc: r-help@r-project.org
> Subject: Re: [R] store results from loop into a dataframe
>
> Leaving aside the question of whether this is the best way to approach your 
> problem (unlikely), there's a couple of errors in your code involving 
> indexing. Once fixed, the code demonstrates some errors in your use of 
> HSD.test that will be harder for you to deal with.
>
> Thanks for the complete reproducible example.
>
> fun2 <- function (x)
>
> {
>
> trait_names <- c("yield", "lp", "lnth")
>
> d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0,
>
> 6))
>
> for (i in trait_names) {
> # your formula has all the trait names, not the selected one
> # mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + 
> PEDIGREE*FIELD + FIELD%in%REP")), data = x)
> mod <- aov(formula(paste(i, "~ PEDIGREE + FIELD + 
> PEDIGREE*FIELD + FIELD%in%REP")), data = x)
>
> out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = 
> FALSE)
>
> # you're indexing by the trait name, instead of its position
> # d[, i] <- out$means[, 1]
> d[, which(trait_names == i)] <- out$means[, 1]
>
> }
>
> d
>
> }
>
> Sarah
>
> On Tue, Jan 5, 2016 at 11:48 AM, DIGHE, NILESH [AG/2362] 
>  wrote:
>> Dear R users:
>>
>> I am trying to create a function that will loop over three dependent 
>> variables in my aov model, and then get the HSD.test for each variable.  I 
>> like to store the results from each loop in a data frame.
>>
>>
>>
>> When I run my function (funx) on my data (dat), results from only yield gets 
>> populated in all three columns of the dataframe.  I am not able to store the 
>> results for each variable in a dataframe. Any help will be highly 
>> appreciated.
>>
>>
>>
>>
>>
>>
>>
>> function (x)
>>
>> {
>>
>> trait_names <- c("yield", "lp", "lnth")
>>
>> d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0,
>>
>> 6))
>>
>> for (i in trait_names) {
>>
>> mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + 
>> PEDIGREE*FIELD + FIELD%in%REP")),
>>
>> data = x)
>>
>> out <- HSD.test(mod, "PEDIGREE", group = TRUE, console =
>> FALSE)
>>
>> d[, i] <- out$means[, 1]
>>
>> }
>>
>> d
>>
>> }
>>
>>
>> structure(list(FIELD = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 
>> 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
>> 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = 
>> c("FYLS", "HKI1", "KIS1", "LMLS", "SELS", "SGL1"), class = "factor"), 
>> REP = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
>> 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,

Re: [R] store results from loop into a dataframe

2016-01-05 Thread DIGHE, NILESH [AG/2362]
Sarah: Thanks for pointing out the errors in my function.

Below are the errors I am getting after I run the corrected quote:
Error in if (s) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In qtukey(1 - alpha, ntr, DFerror) : NaNs produced

You are right, I have no idea to handle these errors.

Do you recommend any other approach to solve my problem? 

Thanks for your time.
Nilesh 



-Original Message-
From: Sarah Goslee [mailto:sarah.gos...@gmail.com] 
Sent: Tuesday, January 05, 2016 11:20 AM
To: DIGHE, NILESH [AG/2362]
Cc: r-help@r-project.org
Subject: Re: [R] store results from loop into a dataframe

Leaving aside the question of whether this is the best way to approach your 
problem (unlikely), there's a couple of errors in your code involving indexing. 
Once fixed, the code demonstrates some errors in your use of HSD.test that will 
be harder for you to deal with.

Thanks for the complete reproducible example.

fun2 <- function (x)

{

trait_names <- c("yield", "lp", "lnth")

d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0,

6))

for (i in trait_names) {
# your formula has all the trait names, not the selected one
# mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + 
PEDIGREE*FIELD + FIELD%in%REP")), data = x)
mod <- aov(formula(paste(i, "~ PEDIGREE + FIELD + PEDIGREE*FIELD + 
FIELD%in%REP")), data = x)

out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = FALSE)

# you're indexing by the trait name, instead of its position
# d[, i] <- out$means[, 1]
d[, which(trait_names == i)] <- out$means[, 1]

}

d

}

Sarah

On Tue, Jan 5, 2016 at 11:48 AM, DIGHE, NILESH [AG/2362] 
 wrote:
> Dear R users:
>
> I am trying to create a function that will loop over three dependent 
> variables in my aov model, and then get the HSD.test for each variable.  I 
> like to store the results from each loop in a data frame.
>
>
>
> When I run my function (funx) on my data (dat), results from only yield gets 
> populated in all three columns of the dataframe.  I am not able to store the 
> results for each variable in a dataframe. Any help will be highly appreciated.
>
>
>
>
>
>
>
> function (x)
>
> {
>
> trait_names <- c("yield", "lp", "lnth")
>
> d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0,
>
> 6))
>
> for (i in trait_names) {
>
> mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + 
> PEDIGREE*FIELD + FIELD%in%REP")),
>
> data = x)
>
> out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = 
> FALSE)
>
> d[, i] <- out$means[, 1]
>
> }
>
> d
>
> }
>
>
> structure(list(FIELD = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
> 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 
> 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = 
> c("FYLS", "HKI1", "KIS1", "LMLS", "SELS", "SGL1"), class = "factor"), 
> REP = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
> 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
> 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
> 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
> 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
> 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
> 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", "2", "3"), 
> class = "factor"), PEDIGREE = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
> 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 
> 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 
> 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 
> 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 
> 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 
> 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), 
> .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), yield = 
> c(1003L, 923L, 1268L, 1226L, 1059L, 1150L, 900L, 816L, 1072L, 1158L, 
> 1026L, 1299L, 1083L, 1038L, 1236L, 12

[R] store results from loop into a dataframe

2016-01-05 Thread DIGHE, NILESH [AG/2362]
Dear R users:

I am trying to create a function that will loop over three dependent variables 
in my aov model, and then get the HSD.test for each variable.  I like to store 
the results from each loop in a data frame.



When I run my function (funx) on my data (dat), results from only yield gets 
populated in all three columns of the dataframe.  I am not able to store the 
results for each variable in a dataframe. Any help will be highly appreciated.







function (x)

{

trait_names <- c("yield", "lp", "lnth")

d = data.frame(yield = rep(0, 6), lp = rep(0, 6), lnth = rep(0,

6))

for (i in trait_names) {

mod <- aov(formula(paste(trait_names, "~ PEDIGREE + FIELD + 
PEDIGREE*FIELD + FIELD%in%REP")),

data = x)

out <- HSD.test(mod, "PEDIGREE", group = TRUE, console = FALSE)

d[, i] <- out$means[, 1]

}

d

}


structure(list(FIELD = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L), .Label = c("FYLS", "HKI1", "KIS1", "LMLS",
"SELS", "SGL1"), class = "factor"), REP = structure(c(1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", "2",
"3"), class = "factor"), PEDIGREE = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L,
6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L,
5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L,
5L, 5L, 5L, 6L, 6L, 6L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), .Label = c("A", "B", "C", "D",
"E", "F"), class = "factor"), yield = c(1003L, 923L, 1268L, 1226L,
1059L, 1150L, 900L, 816L, 1072L, 1158L, 1026L, 1299L, 1083L,
1038L, 1236L, 1287L, 1270L, 1612L, 1513L, 1676L, 1504L, 1417L,
1932L, 1644L, 1293L, 1542L, 1452L, 1180L, 1248L, 1764L, 1326L,
1877L, 1788L, 1606L, 1809L, 1791L, 2294L, 2315L, 2320L, 2083L,
1895L, 2284L, 2000L, 2380L, 1952L, 2414L, 2354L, 2095L, 2227L,
2093L, 2019L, 2505L, 2410L, 2287L, 2507L, 2507L, 2349L, 2162L,
2108L, 2319L, 2028L, 1947L, 2352L, 2698L, 2369L, 1798L, 2422L,
2509L, 2234L, 2451L, 2139L, 1957L, 799L, 787L, 701L, 781L, 808L,
582L, 770L, 752L, 801L, 865L, 608L, 620L, 677L, 775L, 722L, 1030L,
606L, 729L, 1638L, 1408L, 1045L, 1685L, 1109L, 1210L, 1419L,
1048L, 1129L, 1549L, 1325L, 1315L, 1838L, 1066L, 1295L, 1499L,
1472L, 1139L), lp = c(NA, NA, 46.31, NA, NA, 43.8, NA, NA, 43.91,
NA, NA, 44.47, NA, NA, 45.16, NA, NA, 43.57, 40.65, NA, NA, 40.04,
NA, NA, 41.33, NA, NA, 40.75, NA, NA, 42.04, NA, NA, 40.35, NA,
NA, 43.682, NA, NA, 41.712, NA, NA, 42.566, NA, NA, 43.228, NA,
NA, 43.63, NA, NA, 42.058, NA, NA, NA, 45.19, NA, NA, 41.91,
NA, NA, 43.86, NA, NA, 44.48, NA, NA, 44.34, NA, NA, 43.03, NA,
NA, NA, 44.08, NA, NA, 41.39, NA, NA, 42.48, NA, NA, 44.13, NA,
NA, 43.39, NA, NA, 42.82, 42.18, NA, NA, 41.42, NA, NA, 41.25,
NA, NA, 42.31, NA, NA, 43.22, NA, NA, 40.52, NA, NA), lnth = c(NA,
NA, 1.151, NA, NA, 1.135, NA, NA, 1.109, NA, NA, 1.117, NA, NA,
1.107, NA, NA, 1.196, 1.255, NA, NA, 1.229, NA, NA, 1.158, NA,
NA, 1.214, NA, NA, 1.152, NA, NA, 1.194, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
1.2, NA, NA, 1.219, NA, NA, 1.115, NA, NA, 1.205, NA, NA, 1.238,
NA, NA, 1.244, NA, NA, NA, 1.096, NA, NA, 1.021, NA, NA, 1.055,
NA, NA, 1.058, NA, NA, 1.026, NA, NA, 1.115, 1.202, NA, NA, 1.161,
NA, NA, 1.168, NA, NA, 1.189, NA, NA, 1.204, NA, NA, 1.277, NA,
NA)), .Names = c("FIELD", "REP", "PEDIGREE", "yield", "lp", "lnth"
), row.names = c(NA, -108L), class = "data.frame")






R version 3.2.1 (2015-06-18)

Platform: i386-w64-mingw32/i386 (32-bit)

Running under: Windows 7 x64 (build 7601) Service Pack 1



locale:

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252

[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C

[5] LC_TIME=English_United States.1252



attached base packages:

[1] stats graphics  grDevices utils datasets  methods   base



other attached packages:

[1] agricolae_1.2-1 asreml_3.0  lattice_0.20-31 ggplot2_1.0.1   dplyr_0.4.2 
plyr_1.8.3



loaded via a namespace (and not attached):

 [1] spdep_0.5

Re: [R] subset data using a vector

2015-11-24 Thread DIGHE, NILESH [AG/2362]
Jim & Michael:  I really appreciate your guidance in creating the function I 
wanted.  I took suggestions from both of you and was able to complete this 
function.  I had to split the process into two functions as listed below.
I just thought to send the results to the list in case someone might be 
interested in doing similar task in the future.
Thanks.
Nilesh

getcheckmeans<- function (dataset)
{
row_check_mean <- c()
dat1 <- data.frame()
check_mean <- c()
x <- length(dataset$plotid)
for (i in (1:x)) {
r1 <- dataset[i, 1]
r2 <- r1 - 1
r3 <- r1 + 1
r4 <- c(r1, r2, r3)
dat1 <- split(dataset, dataset$rows %in% r4)[[2]]
row_check_mean[i] <- tapply(dat1$yield, dat1$linecode,
mean, na.rm = TRUE)[1]
check_mean <- round(unlist(row_check_mean)[1:x], digits = 2)
}
check_mean
}


adjustdata<- function (dataset, trait, control)

{

check_mean <- getcheckmeans(dataset)

dat_check_mean <- as.data.frame(check_mean)

dataset <- cbind(dataset, dat_check_mean)

adj_yield <- c()

x <- length(trait)

for (i in 1:x) {

adj_yield[i] <- ifelse(control[i] == "variety", 
round(trait[i]/dataset$check_mean[i],

digits = 3), round(trait[i]/trait[i], digits = 3))

}

data.frame(dataset, adj_yield)

}


dat<- structure(list(rows = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,

3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,

4L, 4L, 4L, 4L, 4L, 4L), cols = c(1L, 2L, 3L, 4L, 5L, 6L, 7L,

8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 16L, 15L, 14L, 13L,

12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 1L, 2L, 3L,

4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 16L,

15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L,

1L), plotid = c(289L, 290L, 291L, 292L, 293L, 294L, 295L, 296L,

297L, 298L, 299L, 300L, 301L, 302L, 303L, 304L, 369L, 370L, 371L,

372L, 373L, 374L, 375L, 376L, 377L, 378L, 379L, 380L, 381L, 382L,

383L, 384L, 385L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L,

394L, 395L, 396L, 397L, 398L, 399L, 400L, 465L, 466L, 467L, 468L,

469L, 470L, 471L, 472L, 473L, 474L, 475L, 476L, 477L, 478L, 479L,

480L), yield = c(5.1, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5, 5,

3.9, 4.6, 5, 4.4, 5.1, 4.3, 4.4, 4.2, 3.9, 4.6, 4.8, 5.4, 4.7,

5.5, 5.3, 4.8, 5.8, 4.6, 5.8, 5.5, 5.3, 5.6, 5.6, 5, 4.8, 4.9,

5.2, 5.3, 4.6, 4.8, 5.3, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5, 5.4,

4.8, 4.6, 5.2, 4.9, 5.1, 4.5, 5.8, 5.2, 4.7, 4.8, 5.3, 5.8, 4.9,

5.9, 4.5), line = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,

9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,

1L, 21L, 22L, 1L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L,

32L, 33L, 1L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 1L,

43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 1L, 51L, 52L, 53L, 54L,

1L, 55L, 56L, 57L), .Label = c("CHK", "V002", "V003", "V004",

"V005", "V006", "V007", "V008", "V009", "V010", "V011", "V012",

"V013", "V014", "V015", "V016", "V017", "V018", "V019", "V020",

"V021", "V022", "V023", "V024", "V025", "V026", "V027", "V028",

"V029", "V030", "V031", "V032", "V033", "V034", "V035", "V036",

"V037", "V038", "V039", "V040", "V041", "V042", "V043", "V044",

"V045", "V046", "V047", "V048", "V049", "V050", "V051", "V052",

"V053", "V054", "V055", "V056", "V057"), class = "factor"), linecode = 
structure(c(1L,

2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,

2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("check",

"variety"), class = "factor")), .Names = c("rows", "cols", "plotid",

"yield", "line", "linecode"), class = "data.frame", row.names = c(NA,

-64L))

From: Jim Lemon [mailto:drjimle...@gmail.com]
Sent: Tuesday, November 24, 2015 2:53 AM
To: DIGHE, NILESH [AG/2362]
Cc: r-help@r-project.org
Subject: Re: [R] subset data using a vector

Hi Nilesh,
I simplified your code a bit:

fun1<-function (dataset, plot.id<http://plot.id>, ranges2use, cont

Re: [R] subset data using a vector

2015-11-23 Thread DIGHE, NILESH [AG/2362]
Michael:  I tried using your suggestion of using length and still get the same 
error:
Error in m1[[i]] : subscript out of bounds

I also checked the length of m1 and x and they both are of same length (64).

After trying several things, I was able to extract the list but this was done 
outside the function I am trying to create.
Code that worked is listed below:

for(i in (1:length(mydata$plotid))){
v1<-as.numeric(strsplit(as.character(mydata$rangestouse), ",")[[i]])
print(head(v1))}

However, when I try to get this code in a function (fun3) listed below, I get 
the following error:
Error in strsplit(as.character(dataset$ranges2use), ",")[[i]] : 
  subscript out of bounds

fun3<- function (dataset, plot.id, ranges2use, control) 
{
m1 <- c()
x <- length(plot.id)
for (i in (1:x)) {
m1 <- as.numeric(strsplit(as.character(dataset$ranges2use), 
",")[[i]])
}
m2
}

I am not sure where I am making a mistake.
Thanks.
Nilesh
 
-Original Message-
From: Michael Dewey [mailto:li...@dewey.myzen.co.uk] 
Sent: Monday, November 23, 2015 12:11 PM
To: DIGHE, NILESH [AG/2362]; r-help@r-project.org
Subject: Re: [R] subset data using a vector

Try looking at your function and work through what happens if the length is 
what I suggested.

 >>   x <- length(plot.id)
 >>
 >>   for (i in (1:x)) {
 >>
 >>   m2[i] <- m1[[i]]

So unless m1 has length at least x you are doomed.

On 23/11/2015 16:26, DIGHE, NILESH [AG/2362] wrote:
> Michael:  I like to use the actual range id's listed in column "rangestouse" 
> to subset my data and not the length of that vector.
>
> Thanks.
> Nilesh
>
> -Original Message-
> From: Michael Dewey [mailto:li...@dewey.myzen.co.uk]
> Sent: Monday, November 23, 2015 10:17 AM
> To: DIGHE, NILESH [AG/2362]; r-help@r-project.org
> Subject: Re: [R] subset data using a vector
>
> length(strsplit(as.character(mydata$ranges2use), ","))
>
> was that what you expected? I think not.
>
> On 23/11/2015 16:05, DIGHE, NILESH [AG/2362] wrote:
>> Dear R users,
>>   I like to split my data by a vector created by using 
>> variable "ranges".  This vector will have the current range (ranges), 
>> preceding range (ranges - 1), and post range (ranges + 1) for a given 
>> plotid.  If the preceding or post ranges in this vector are outside the 
>> levels of ranges in the data set then I like to drop those ranges and only 
>> include the ranges that are available.  Variable "rangestouse" includes all 
>> the desired ranges I like to subset a given plotid.  After I subset these 
>> dataset using these desired ranges, then I like to extract the yield data 
>> for checks in those desired ranges and adjust yield of my data by dividing 
>> yield of a given plotid with the check average for the desired ranges.
>>
>> I have created this function (fun1) but when I run it, I get the following 
>> error:
>>
>> Error in m1[[i]] : subscript out of bounds
>>
>> Any help will be highly appreciated!
>> Thanks, Nilesh
>>
>> Dataset:
>> dput(mydata)
>> structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
>> 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
>> 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
>> 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), 
>> cols = structure(c(1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 
>> 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 
>> 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 
>> 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 
>> 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("1", "2", "3", "4", "5", 
>> "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16"), class = 
>> "factor"),
>>   plotid = c(289L, 298L, 299L, 300L, 301L, 302L, 303L, 304L,
>>   290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 384L, 375L,
>>   374L, 373L, 372L, 371L, 370L, 369L, 383L, 382L, 381L, 380L,
>>   379L, 378L, 377L, 376L, 385L, 394L, 395L, 396L, 397L, 398L,
>>   399L, 400L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L,
>>   480L, 471L, 470L, 469L, 468L, 467L, 466L, 465L, 479L, 478L,
>>   477L, 476L, 475L, 474L, 473L, 472L), yield = c(5.1, 5, 3.

Re: [R] subset data using a vector

2015-11-23 Thread DIGHE, NILESH [AG/2362]
Michael:  I like to use the actual range id's listed in column "rangestouse" to 
subset my data and not the length of that vector.

Thanks.
Nilesh

-Original Message-
From: Michael Dewey [mailto:li...@dewey.myzen.co.uk] 
Sent: Monday, November 23, 2015 10:17 AM
To: DIGHE, NILESH [AG/2362]; r-help@r-project.org
Subject: Re: [R] subset data using a vector

length(strsplit(as.character(mydata$ranges2use), ","))

was that what you expected? I think not.

On 23/11/2015 16:05, DIGHE, NILESH [AG/2362] wrote:
> Dear R users,
>  I like to split my data by a vector created by using 
> variable "ranges".  This vector will have the current range (ranges), 
> preceding range (ranges - 1), and post range (ranges + 1) for a given plotid. 
>  If the preceding or post ranges in this vector are outside the levels of 
> ranges in the data set then I like to drop those ranges and only include the 
> ranges that are available.  Variable "rangestouse" includes all the desired 
> ranges I like to subset a given plotid.  After I subset these dataset using 
> these desired ranges, then I like to extract the yield data for checks in 
> those desired ranges and adjust yield of my data by dividing yield of a given 
> plotid with the check average for the desired ranges.
>
> I have created this function (fun1) but when I run it, I get the following 
> error:
>
> Error in m1[[i]] : subscript out of bounds
>
> Any help will be highly appreciated!
> Thanks, Nilesh
>
> Dataset:
> dput(mydata)
> structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
> 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
> 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
> 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), 
> cols = structure(c(1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 
> 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 
> 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 
> 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 
> 5L, 6L, 7L, 8L, 9L), .Label = c("1", "2", "3", "4", "5", "6", "7", 
> "8", "9", "10", "11", "12", "13", "14", "15", "16"), class = "factor"),
>  plotid = c(289L, 298L, 299L, 300L, 301L, 302L, 303L, 304L,
>  290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 384L, 375L,
>  374L, 373L, 372L, 371L, 370L, 369L, 383L, 382L, 381L, 380L,
>  379L, 378L, 377L, 376L, 385L, 394L, 395L, 396L, 397L, 398L,
>  399L, 400L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L,
>  480L, 471L, 470L, 469L, 468L, 467L, 466L, 465L, 479L, 478L,
>  477L, 476L, 475L, 474L, 473L, 472L), yield = c(5.1, 5, 3.9,
>  4.6, 5, 4.4, 5.1, 4.3, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5,
>  5.6, 4.7, 5.4, 4.8, 4.6, 3.9, 4.2, 4.4, 5.3, 5.5, 5.8, 4.6,
>  5.8, 4.8, 5.3, 5.5, 5.6, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5,
>  5, 4.8, 4.9, 5.2, 5.3, 4.6, 4.8, 5.3, 4.5, 4.5, 5.1, 4.9,
>  5.2, 4.6, 4.8, 5.4, 5.9, 4.9, 5.8, 5.3, 4.8, 4.7, 5.2, 5.8
>  ), linecode = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
>  2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L,
>  2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>  1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>  2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("check",
>  "variety"), class = "factor"), ranges = c(1L, 1L, 1L, 1L,
>  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
>  2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
>  3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
>  4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L
>  ), rangestouse = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
>  2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
>  3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
>  4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1,2",
>  "1,2,3", "2,3,4", "3,4"), class = "factor")), .Names = c("rows", 
> "cols", "plotid", "yield", "linecode", "ranges", "rangestouse"
>
> ), class = "data.frame", row.names = c(NA, -64L))
>
> Function:
>
> fun1<- function (dataset, plot.id, ranges2u

[R] subset data using a vector

2015-11-23 Thread DIGHE, NILESH [AG/2362]
Dear R users,
I like to split my data by a vector created by using variable 
"ranges".  This vector will have the current range (ranges), preceding range 
(ranges - 1), and post range (ranges + 1) for a given plotid.  If the preceding 
or post ranges in this vector are outside the levels of ranges in the data set 
then I like to drop those ranges and only include the ranges that are 
available.  Variable "rangestouse" includes all the desired ranges I like to 
subset a given plotid.  After I subset these dataset using these desired 
ranges, then I like to extract the yield data for checks in those desired 
ranges and adjust yield of my data by dividing yield of a given plotid with the 
check average for the desired ranges.

I have created this function (fun1) but when I run it, I get the following 
error:

Error in m1[[i]] : subscript out of bounds

Any help will be highly appreciated!
Thanks, Nilesh

Dataset:
dput(mydata)
structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3",
"4"), class = "factor"), cols = structure(c(1L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L), .Label = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16"), class = "factor"),
plotid = c(289L, 298L, 299L, 300L, 301L, 302L, 303L, 304L,
290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 384L, 375L,
374L, 373L, 372L, 371L, 370L, 369L, 383L, 382L, 381L, 380L,
379L, 378L, 377L, 376L, 385L, 394L, 395L, 396L, 397L, 398L,
399L, 400L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L,
480L, 471L, 470L, 469L, 468L, 467L, 466L, 465L, 479L, 478L,
477L, 476L, 475L, 474L, 473L, 472L), yield = c(5.1, 5, 3.9,
4.6, 5, 4.4, 5.1, 4.3, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5,
5.6, 4.7, 5.4, 4.8, 4.6, 3.9, 4.2, 4.4, 5.3, 5.5, 5.8, 4.6,
5.8, 4.8, 5.3, 5.5, 5.6, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5,
5, 4.8, 4.9, 5.2, 5.3, 4.6, 4.8, 5.3, 4.5, 4.5, 5.1, 4.9,
5.2, 4.6, 4.8, 5.4, 5.9, 4.9, 5.8, 5.3, 4.8, 4.7, 5.2, 5.8
), linecode = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("check",
"variety"), class = "factor"), ranges = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L
), rangestouse = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1,2",
"1,2,3", "2,3,4", "3,4"), class = "factor")), .Names = c("rows",
"cols", "plotid", "yield", "linecode", "ranges", "rangestouse"

), class = "data.frame", row.names = c(NA, -64L))

Function:

fun1<- function (dataset, plot.id, ranges2use, control)

{

m1 <- strsplit(as.character(dataset$ranges2use), ",")

dat1 <- data.frame()

m2 <- c()

row_check_mean <- c()

row_check_adj_yield <- c()

x <- length(plot.id)

for (i in (1:x)) {

m2[i] <- m1[[i]]

dat1 <- dataset[dataset$ranges %in% m2[i], ]

row_check_mean[i] <- tapply(dat1$trait, dat1$control,

mean, na.rm = TRUE)[1]

row_check_adj_yield[i] <- ifelse(control[i] == "variety",

trait[i]/dataset$row_check_mean[i], trait[i]/trait[i])

}

data.frame(dataset, row_check_adj_yield)

}

Apply function:
fun1(mydata, plot.id=mydata$plotid, ranges2use = 
mydata$rangestouse,control=mydata$linecode)

Error:

Error in m1[[i]] : subscript out of bounds

Session info:

R version 3.2.1 (2015-06-18)

Platform: i386-w64-mingw32/i386 (32-bit)

Running under: Windows 7 x64 (build 7601) Service Pack 1



locale:

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252

[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C

[5] LC_TIME=English_United States.1252



attached base packages:

[1] stats graphics  grDevices utils datasets  methods   base



loaded via a namespace (and not attached):

 [1] magrittr_1.5plyr_1.8.3  tools_3.2.1 reshape2_1.4.1  
Rcpp_0.12.1   

[R] spatial adjustment using checks

2015-10-22 Thread DIGHE, NILESH [AG/2362]
Hi,
I have yield data for several varieties and a randomly placed check (1 in every 
8 column or "cols") in a field test arranged in a rows*cols grid format (see 
image attached).  Both "rows" & "cols" are variables in the data set.  I like 
to adjust "yield" variable for each row listed as "variety" in variable 
"linecode" by dividing its yield with the average yield of four nearest "check" 
(on the rows*cols field grid) in variable "linecode".  I like to have two 
checks on the same row where one check is on the left and the other is on the 
right side of a given variety.  The other two checks should come from the two 
neighboring columns ("cols").  If a check is missing on one or more sides of a 
given variety, then I like to proceed with the calculation with only the 
available checks around that given variety.  If two checks on the neighboring 
column are equidistance from a given variety then use position of the variety 
to choose which one to use (If variety is in cols 1-8 then use check from those 
cols; if variety is in cols 9-16 then use check from cols 9-16).

Below is the function I wrote which adjust yield values for each "variety" 
(variable "linecode") by dividing its yield with the average yield of all 
checks in the field.  Instead of using average check across the whole field, I 
like to use the four neighboring checks to make this adjustment.  I am 
struggling with specifying the four nearest checks in this loop.  I played 
around using "dist" function but without any success.  I tried searching for 
any packages that can do these nearest check adjustments without any success.  
Any help will be appreciated.

---function--
function (dataset, trait, control) {
m <- c()
x <- length(trait)
chkmean <- tapply(trait, control, mean, na.rm = T)
for (i in 1:x) {
m[i] <- ifelse(control[i] == "variety", trait[i]/chkmean[1],
trait[i]/trait[i])
}
head(as.data.frame(m))
}

-data--

dput(dat)

structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,

1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,

3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,

4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3",

"4"), class = "factor"), cols = structure(c(1L, 2L, 3L, 4L, 5L,

6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 16L, 15L,

14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L,

1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,

15L, 16L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L,

5L, 4L, 3L, 2L, 1L), .Label = c("1", "2", "3", "4", "5", "6",

"7", "8", "9", "10", "11", "12", "13", "14", "15", "16"), class = "factor"),

plotid = c(289L, 290L, 291L, 292L, 293L, 294L, 295L, 296L,

297L, 298L, 299L, 300L, 301L, 302L, 303L, 304L, 369L, 370L,

371L, 372L, 373L, 374L, 375L, 376L, 377L, 378L, 379L, 380L,

381L, 382L, 383L, 384L, 385L, 386L, 387L, 388L, 389L, 390L,

391L, 392L, 393L, 394L, 395L, 396L, 397L, 398L, 399L, 400L,

465L, 466L, 467L, 468L, 469L, 470L, 471L, 472L, 473L, 474L,

475L, 476L, 477L, 478L, 479L, 480L), yield = c(5.1, 5.5,

5, 5.5, 6.2, 5.1, 5.5, 5.2, 5, 5, 3.9, 4.6, 5, 4.4, 5.1,

4.3, 4.4, 4.2, 3.9, 4.6, 4.8, 5.4, 4.7, 5.5, 5.3, 4.8, 5.8,

4.6, 5.8, 5.5, 5.3, 5.6, 5.6, 5, 4.8, 4.9, 5.2, 5.3, 4.6,

4.8, 5.3, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5, 5.4, 4.8, 4.6,

5.2, 4.9, 5.1, 4.5, 5.8, 5.2, 4.7, 4.8, 5.3, 5.8, 4.9, 5.9,

4.5), line = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,

9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,

20L, 1L, 21L, 22L, 1L, 23L, 24L, 25L, 26L, 27L, 28L, 29L,

30L, 31L, 32L, 33L, 1L, 34L, 35L, 36L, 37L, 38L, 39L, 40L,

41L, 42L, 1L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 1L,

51L, 52L, 53L, 54L, 1L, 55L, 56L, 57L), .Label = c("CHK",

"V002", "V003", "V004", "V005", "V006", "V007", "V008", "V009",

"V010", "V011", "V012", "V013", "V014", "V015", "V016", "V017",

"V018", "V019", "V020", "V021", "V022", "V023", "V024", "V025",

"V026", "V027", "V028", "V029", "V030", "V031", "V032", "V033",

"V034", "V035", "V036", "V037", "V038", "V039", "V040", "V041",

"V042", "V043", "V044", "V045", "V046", "V047", "V048", "V049",

"V050", "V051", "V052", "V053", "V054", "V055", "V056", "V057"

), class = "factor"), linecode = structure(c(1L, 2L, 2L,

2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,

2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,

2L), .Label = c("check", "variety"), class = "factor")), .Names = c("rows",

"cols", "plotid", "yield", "line", "linecode"), row.names = c(NA,

-64L)

Re: [R] help with "by" function

2015-05-12 Thread DIGHE, NILESH [AG/2362]
Jean:  Thanks a lot!!  The changes you made to the code gave me what I needed.  
I truly appreciate your time in correcting the code.
Nilesh

From: Adams, Jean [mailto:jvad...@usgs.gov]
Sent: Tuesday, May 12, 2015 2:14 PM
To: DIGHE, NILESH [AG/2362]
Cc: r-help@r-project.org
Subject: Re: [R] help with "by" function

Nilesh,

I found a couple errors in your code.  First, in your by() statement you have a 
function to operate on the selected subset of data, which you refer to as
 x
but then, in your aov statement you refer to
 data_set
not
 x

Second, your funC() statement is a function of
 trait_names
but to make sure that the name of the variable is included in the formula, I 
changed this to a character variable.

Give the code below a try and see if it works for you.

Jean


library(agricolae)
library(dplyr)

funC <- function(trait_names){
  by(data_set, data_set$Isopair, function(x) {
  mod <- aov(formula(paste(trait_names, "~ STGgroup*Field + Rep%in%Field")),
data=x)
  out <- HSD.test(mod, "STGgroup", group=TRUE, console=TRUE)
  dfout <- arrange(data.frame(out$groups), desc(trt))
  })
}

funC("Trait1")


On Tue, May 12, 2015 at 1:03 PM, DIGHE, NILESH [AG/2362] 
mailto:nilesh.di...@monsanto.com>> wrote:
Hi,
I have an anonymous function called function(x) that will run anova, run 
HSD.test on the model, and then sort the results.  I am passing this anonymous 
function to the "by" function to get results by "Isopair" factor which is my 
index variable.  Since I want to run the anova on multiple dependent variables 
including "Trait1" & "Trait2", I am calling the dependent variable, 
"trait_names" in the model and then passing the "by" function to another 
function called "funC" which takes the "trait_names" as an argument to execute.

After I execute the funC(data_set$trait1), I am getting the results by Isopair 
but the results for the two Isopairs (Isopair-A &Isopair-B) are the same which 
I know is not correct.  It looks like the data is not getting split by Isopairs 
and so ALL data is used in anova for both Isopairs.  Any help in modifying 
function, funC or any other ways to achieve the desired outcome will be highly 
appreciated.

Thanks.  Nilesh

R code, data set, and session info is pasted below.
R code:
library(agricolae)
funC<- function(trait_names){
  by(data_set, data_set$Isopair,function(x){
  mod<- aov(trait_names~ STGgroup*Field + Rep%in%Field, data=data_set)
  out<-HSD.test(mod,"STGgroup",group=TRUE,console=TRUE)
  dfout<- arrange(data.frame(out$groups),desc(trt))
  })
}
Results:
##execute funC function for Trait1 & Trait2
funC(data_set$Trait1)


data_set$Isopair: Isopair-A

  trtmeans M

1 STG 776.9167 a

2 Non-STG 779.0833 a

---

data_set$Isopair: Isopair-B

  trtmeans M

1 STG 776.9167 a

2 Non-STG 779.0833 a

Data:
data_set<- structure(list(Field = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("LML6", "TZL2"), class = "factor"), Isopair = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Isopair-A", "Isopair-B"
), class = "factor"), STGgroup = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L), .Label = c("Non-STG", "STG"), class = "factor"),
Rep = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("Rep1",
"Rep2", "Rep3"), class = "factor"), Trait1 = c(686L, 641L,
642L, 727L, 619L, 562L, 808L, 739L, 744L, 873L, 797L, 868L,
782L, 783L, 675L, 713L, 762L, 641L, 1009L, 995L, 845L, 1186L,
912L, 663L), Trait2 = c(45L, 65L, 70L, 35L, 20L, 80L, 70L,
65L, 70L, 20L, 30L, 35L, 40L, 55L, 35L, 40L, 35L, 40L, 40L,
35L, 25L, 40L, 35L, 25L)), .Names = c("Field", "Isopair",
"STGgroup", "Rep", "Trait1", "Trait2"), class = "data.frame", row.names = c(NA,
-24L))

Session info:

R version 3.1.3 (2015-03-09)

Platform: i386-w64-mingw32/i386 (32-bit)

Running under: Windows 7 x64 (build 7601) Service Pack 1



locale:

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252

[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C

[5] LC_TIME=English_United States.1252



attached base packages:

[1] grid  stats graphics  grDevices utils datasets  methods   base



other attached packages:

 [1] gridExtra_0.9.1 Hmisc_3.16-0Formula_1.2-1   survival_2.38-1 
caret_6.0-41ggplot2_1.0.1

[R] help with "by" function

2015-05-12 Thread DIGHE, NILESH [AG/2362]
Hi,
I have an anonymous function called function(x) that will run anova, run 
HSD.test on the model, and then sort the results.  I am passing this anonymous 
function to the "by" function to get results by "Isopair" factor which is my 
index variable.  Since I want to run the anova on multiple dependent variables 
including "Trait1" & "Trait2", I am calling the dependent variable, 
"trait_names" in the model and then passing the "by" function to another 
function called "funC" which takes the "trait_names" as an argument to execute.

After I execute the funC(data_set$trait1), I am getting the results by Isopair 
but the results for the two Isopairs (Isopair-A &Isopair-B) are the same which 
I know is not correct.  It looks like the data is not getting split by Isopairs 
and so ALL data is used in anova for both Isopairs.  Any help in modifying 
function, funC or any other ways to achieve the desired outcome will be highly 
appreciated.

Thanks.  Nilesh

R code, data set, and session info is pasted below.
R code:
library(agricolae)
funC<- function(trait_names){
  by(data_set, data_set$Isopair,function(x){
  mod<- aov(trait_names~ STGgroup*Field + Rep%in%Field, data=data_set)
  out<-HSD.test(mod,"STGgroup",group=TRUE,console=TRUE)
  dfout<- arrange(data.frame(out$groups),desc(trt))
  })
}
Results:
##execute funC function for Trait1 & Trait2
funC(data_set$Trait1)


data_set$Isopair: Isopair-A

  trtmeans M

1 STG 776.9167 a

2 Non-STG 779.0833 a

---

data_set$Isopair: Isopair-B

  trtmeans M

1 STG 776.9167 a

2 Non-STG 779.0833 a

Data:
data_set<- structure(list(Field = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("LML6", "TZL2"), class = "factor"), Isopair = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Isopair-A", "Isopair-B"
), class = "factor"), STGgroup = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L), .Label = c("Non-STG", "STG"), class = "factor"),
Rep = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("Rep1",
"Rep2", "Rep3"), class = "factor"), Trait1 = c(686L, 641L,
642L, 727L, 619L, 562L, 808L, 739L, 744L, 873L, 797L, 868L,
782L, 783L, 675L, 713L, 762L, 641L, 1009L, 995L, 845L, 1186L,
912L, 663L), Trait2 = c(45L, 65L, 70L, 35L, 20L, 80L, 70L,
65L, 70L, 20L, 30L, 35L, 40L, 55L, 35L, 40L, 35L, 40L, 40L,
35L, 25L, 40L, 35L, 25L)), .Names = c("Field", "Isopair",
"STGgroup", "Rep", "Trait1", "Trait2"), class = "data.frame", row.names = c(NA,
-24L))

Session info:

R version 3.1.3 (2015-03-09)

Platform: i386-w64-mingw32/i386 (32-bit)

Running under: Windows 7 x64 (build 7601) Service Pack 1



locale:

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252

[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C

[5] LC_TIME=English_United States.1252



attached base packages:

[1] grid  stats graphics  grDevices utils datasets  methods   base



other attached packages:

 [1] gridExtra_0.9.1 Hmisc_3.16-0Formula_1.2-1   survival_2.38-1 
caret_6.0-41ggplot2_1.0.1

 [7] lattice_0.20-30 MASS_7.3-39 dplyr_0.4.1 agricolae_1.2-1



loaded via a namespace (and not attached):

 [1] acepack_1.3-3.3 assertthat_0.1  boot_1.3-15 
BradleyTerry2_1.0-6

 [5] brglm_0.5-9 car_2.0-25  cluster_2.0.1   coda_0.17-1

 [9] codetools_0.2-10colorspace_1.2-6combinat_0.0-8  DBI_0.3.1

[13] deldir_0.1-9digest_0.6.8foreach_1.4.2   foreign_0.8-63

[17] gtable_0.1.2gtools_3.4.1iterators_1.0.7 klaR_0.6-12

[21] latticeExtra_0.6-26 lazyeval_0.1.10 LearnBayes_2.15 lme4_1.1-7

[25] magrittr_1.5Matrix_1.1-5mgcv_1.8-4  minqa_1.2.4

[29] munsell_0.4.2   nlme_3.1-120nloptr_1.0.4nnet_7.3-9

[33] parallel_3.1.3  pbkrtest_0.4-2  plyr_1.8.1  proto_0.3-10

[37] quantreg_5.11   RColorBrewer_1.1-2  Rcpp_0.11.6 reshape2_1.4.1

[41] rpart_4.1-9 scales_0.2.4sp_1.0-17   SparseM_1.6

[45] spdep_0.5-88splines_3.1.3   stringr_0.6.2   tools_3.1.3





This e-mail message may contain privileged and/or confidential information, and 
is intended to be received only by persons entitled
to receive such information. If you have received this e-mail in error, please 
notify the sender immediately. Please delete it and
all attachments from any servers, hard drives or any other media. Other use of 
this e-mail by you is strictly prohibited.

All e-mails and attachments sent and received are subject to monitoring, 
reading and archival by Monsanto, including its
subsidiaries. The recipient of this e-mail is