Re: [R] problem applying the same function twice
The key to your problem may be that

x <- apply(missing, 1, genRows)

converts 'missing' to a matrix, with the same type for all columns, and then makes x either a list or a matrix, but never a data.frame. Those features of apply() may mess up the rest of your calculations. Don't use apply().

Bill Dunlap
TIBCO Software
wdunlap tibco.com
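The coercion Bill describes can be seen in a small sketch. The two-row data frame below is a made-up stand-in for 'missing' (its values are assumptions, not taken from the thread). apply() runs as.matrix() on a data frame, so a mix of character and integer columns becomes an all-character matrix, which would explain why dat[2] arrives as a string inside genRows.

```r
# Hypothetical stand-in for the poster's 'missing' data frame
missing <- data.frame(vals = c("cat:2", "cat:2"),
                      count = c(2L, 2L),
                      stringsAsFactors = FALSE)

# apply() works on as.matrix(missing): every cell becomes character
m <- as.matrix(missing)
typeof(m)

# so each row reaches the function as a character vector, count included
row_classes <- apply(missing, 1, function(dat) class(dat))
row_classes

# iterating over row indices instead keeps each column's own type
count_classes <- vapply(seq_len(nrow(missing)),
                        function(i) class(missing$count[i]),
                        character(1))
count_classes
```

Looping over row indices (or working column-wise) avoids the matrix conversion altogether, which is one reading of the advice not to use apply() on a data frame.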
Re: [R] problem applying the same function twice
Sarah,

This strategy works great for this small dataset, but when I attempt your method with my data set I reach the maximum allowable memory allocation and the operation just stalls and then stops completely before it is finished. Do you know of a way around this?

Thanks
Re: [R] problem applying the same function twice
Sarah,

I realized what I was saying after I pressed send on the email. It makes perfect sense now, thanks so much for your help and patience.

On Mar 10, 2015 5:57 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

I think you're kind of missing the way this works: the data frame created by expand.grid() should ONLY have site, year, sample (with the exact names used in the data itself). Then the merged data frame will have the full site, year, sample combinations, along with ALL the data variables. Your animal example only had one measured variable, but the same method will work with any number. Reading ?merge might help you understand.

Sarah

On Tue, Mar 10, 2015 at 5:35 PM, Curtis Burkhalter curtisburkhal...@gmail.com wrote:

Thanks Sarah, one of my column names was missing a letter so it was throwing things off. It works super fast now and is exactly what I needed. My actual data set has about 6 other ancillary response data columns. Is there a way to combine the 'full' data set I just created with the original in case I need any of the other response variables? E.g.

FULL:
site year sample
   1    1     10
   1    1     12
   1    1     NA

Original:
site year sample color shape
   1    1     10 blue  diamond
   1    1     12 green pyramid

Combined:
site year sample color shape
   1    1     10 blue  diamond
   1    1     12 green pyramid
   1    1     NA NA    NA

Thanks
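Sarah's point that the merged data frame keeps all the data variables can be sketched with a toy version of the site/year/sample layout. The values and the color and shape columns below are invented for illustration. The grid holds only the key columns, and merge(..., all.x = TRUE) carries every other column across, filling NA where the grid row has no match.

```r
# Key columns only, named exactly as in the data (hypothetical toy values)
grid <- expand.grid(site = 1L, year = 1L, sample = 1:3)

# Toy 'original' data with two extra response columns
original <- data.frame(site = c(1L, 1L),
                       year = c(1L, 1L),
                       sample = 1:2,
                       color = c("blue", "green"),
                       shape = c("diamond", "pyramid"),
                       stringsAsFactors = FALSE)

# merge() joins on the shared column names: site, year, sample
combined <- merge(grid, original, all.x = TRUE)
combined
```

The third sample has no row in the toy data, so its color and shape come back as NA while the key columns are still present.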
Re: [R] problem applying the same function twice
Hi,

I didn't work through your code, because it looked overly complicated. Here's a more general approach that does what you appear to want:

# use dput() to provide reproducible data please!
comAn <- structure(list(animals = c("bird", "bird", "bird", "bird", "bird",
"bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat", "cat", "cat"),
animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L),
animalMass = c(29L, 48L, 36L, 20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L,
46L, 33L, 48L, 21L)), .Names = c("animals", "animalYears", "animalMass"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16"))

# add reps to comAn
# assumes comAn is already sorted on animals, animalYears
comAn$reps <- unlist(sapply(rle(do.call(paste, comAn[,1:2]))$lengths, seq_len))

# create full set of combinations
outgrid <- expand.grid(animals=unique(comAn$animals),
                       animalYears=unique(comAn$animalYears),
                       reps=unique(comAn$reps), stringsAsFactors=FALSE)

# combine with comAn
comAn.full <- merge(outgrid, comAn, all.x=TRUE)

comAn.full
   animals animalYears reps animalMass
1     bird           1    1         29
2     bird           1    2         48
3     bird           1    3         36
4     bird           2    1         20
5     bird           2    2         34
6     bird           2    3         34
7      cat           1    1         46
8      cat           1    2         33
9      cat           1    3         48
10     cat           2    1         21
11     cat           2    2         NA
12     cat           2    3         NA
13     dog           1    1         21
14     dog           1    2         28
15     dog           1    3         25
16     dog           2    1         35
17     dog           2    2         18
18     dog           2    3         11
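The reps step above assumes the data are sorted on animals and animalYears. A possible variation for unsorted data, offered here only as a sketch and not as part of Sarah's reply, numbers the rows within each group with ave(). The small unsorted data frame is made up for the example.

```r
# Hypothetical unsorted data in the same layout as comAn
comAn <- data.frame(
  animals     = c("cat", "bird", "cat", "bird", "cat", "bird", "cat"),
  animalYears = c(1L, 1L, 2L, 1L, 1L, 1L, 1L),
  animalMass  = c(46L, 29L, 21L, 48L, 33L, 36L, 48L),
  stringsAsFactors = FALSE
)

# number the rows within each animals x animalYears group, in row order
comAn$reps <- ave(seq_len(nrow(comAn)),
                  comAn$animals, comAn$animalYears,
                  FUN = seq_along)

# full set of combinations, then the same merge as in the message above
outgrid <- expand.grid(animals = unique(comAn$animals),
                       animalYears = unique(comAn$animalYears),
                       reps = seq_len(max(comAn$reps)),
                       stringsAsFactors = FALSE)
comAn.full <- merge(outgrid, comAn, all.x = TRUE)
comAn.full
```

Note that bird in year 2 has no rows at all in this toy data, so it comes back as three NA rows. Whether that is wanted depends on the analysis.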
Re: [R] problem applying the same function twice
William,

You say not to use apply here, but what would you use in its place?

Thanks
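The thread does not record an answer to this question. One possible way to pad the data without apply(), given purely as a sketch, is to count rows per combination and build all the NA rows in one step. The names target, groups, short, padding and padded are hypothetical, and the target of 3 rows per animal x year is taken from the original post.

```r
comAn <- data.frame(
  animals = rep(c("bird", "dog", "cat"), times = c(6, 6, 4)),
  animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
                  1L, 1L, 1L, 2L),
  animalMass = c(29L, 48L, 36L, 20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L,
                 46L, 33L, 48L, 21L),
  stringsAsFactors = FALSE
)

target <- 3L  # assumed number of rows wanted per animal x year

# one row per observed combination, with its current row count
groups <- unique(comAn[c("animals", "animalYears")])
groups$n <- mapply(function(a, y) sum(comAn$animals == a & comAn$animalYears == y),
                   groups$animals, groups$animalYears, USE.NAMES = FALSE)

# combinations that are short, each repeated once per missing row
short <- groups[groups$n < target, ]
idx <- rep(seq_len(nrow(short)), times = target - short$n)

padding <- data.frame(animals = short$animals[idx],
                      animalYears = short$animalYears[idx],
                      animalMass = rep(NA_integer_, length(idx)),
                      stringsAsFactors = FALSE)

padded <- rbind(comAn, padding)
tail(padded, 3)
```

Because the number of rows to add is computed from the target directly, a single pass is enough and the column types are never coerced to character.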
Re: [R] problem applying the same function twice
Sarah,

I have 669 sites and each site has 7 years of data, so if I'm thinking correctly then there should be 4683 possible combinations of site x year. For each year though I need 3 sampling periods so that there is something like the following:

site 1 year1 sample 1
site 1 year1 sample 2
site 1 year1 sample 3
site 2 year1 sample 1
site 2 year1 sample 2
site 2 year1 sample 3
...
site 669 year7 sample 1
site 669 year7 sample 2
site 669 year7 sample 3

I have my max memory allocation set to the amount of RAM (8GB) on my laptop, but it still 'times out' due to memory problems.

On Tue, Mar 10, 2015 at 2:50 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

You said your data only had 14000 rows, which really isn't many. How many possible combinations do you have, and how many do you need to add?
Re: [R] problem applying the same function twice
Yeah, that's tiny:

fullout <- expand.grid(site=1:669, year=1:7, sample=1:3)
dim(fullout)
[1] 14049     3

Almost certainly the problem is that your expand.grid result doesn't have the same column names as your actual data file, so merge() is trying to make an enormous result. Note how when I made outgrid in the example I named the columns. Make sure that the names are identical!
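The effect of mismatched column names on merge() can be sketched on toy data. The misspelled name "sampel" and all values below are invented for the example. merge() joins on the column names the two data frames share, so a typo in one key silently drops it from the join and every grid row is paired with every data row for the same site and year.

```r
grid <- expand.grid(site = 1:2, year = 1:2, sample = 1:3)   # 12 rows

obs <- data.frame(site = c(1L, 1L, 2L),
                  year = c(1L, 1L, 2L),
                  sample = c(1L, 2L, 1L),
                  mass = c(10, 12, 9))

# names agree: join on site, year, sample
ok <- merge(grid, obs, all.x = TRUE)
nrow(ok)

# one key misspelled: join on site and year only, so rows multiply
obs_typo <- obs
names(obs_typo)[3] <- "sampel"
bad <- merge(grid, obs_typo, all.x = TRUE)
nrow(bad)

# no shared names at all: merge() returns the full Cartesian product
obs_none <- setNames(obs, c("s", "y", "k", "m"))
worst <- merge(grid, obs_none)
nrow(worst)
```

Checking intersect(names(grid), names(obs)) before merging is a cheap way to confirm which columns the join will use.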
[R] problem applying the same function twice
Hey everyone,

I've written a function that adds NAs to a dataframe where data is missing and it seems to work great if I only need to run it once, but if I run it two times in a row I run into problems. I've created a workable example to explain what I mean and why I would do this.

In my dataframe there are areas where I need to add two rows of NAs (b/c I need to have 3 animal x year combos and for cat in year 2 I only have one) so I thought that I'd just run my code twice using the function in the code below. Everything works great when I run it the first time, but when I run it again it says that the value returned to the list 'x' is of length 0. I don't understand why the function works the first time around and adds an NA to the 'animalMass' column, but won't do it again. I've used print(str(dataframe)) to see if there is a change in class or type when the function runs through the original dataframe and there is for 'animalYears', but I just convert it back before rerunning the function for the second time.

Any thoughts on this would be greatly appreciated b/c my actual dataframe I have to input into WinBUGS is 14000x12, so it's not a trivial thing to just add in an NA here or there.

comAn
   animals animalYears animalMass
1     bird           1         29
2     bird           1         48
3     bird           1         36
4     bird           2         20
5     bird           2         34
6     bird           2         34
7      dog           1         21
8      dog           1         28
9      dog           1         25
10     dog           2         35
11     dog           2         18
12     dog           2         11
13     cat           1         46
14     cat           1         33
15     cat           1         48
16     cat           2         21

So every animal has 3 measurements per year, except for the cat in year two which has only 1. I run the code below and get:

#combs defines the different combinations of
#animals and animalYears
combs<-paste(comAn$animals,comAn$animalYears,sep=':')
#counts defines how long the different combinations are
counts<-ave(1:nrow(comAn),combs,FUN=length)
#missing defines the combs that have length less than one and puts it in
#the data frame missing
missing<-data.frame(vals=combs[counts<2],count=counts[counts<2])

genRows<-function(dat){
  vals<-strsplit(dat[1],':')[[1]]
  #not sure why dat[2] is being converted to a string
  newRows<-2-as.numeric(dat[2])
  newDf<-data.frame(animals=rep(vals[1],newRows),
                    animalYears=rep(vals[2],newRows),
                    animalMass=rep(NA,newRows))
  return(newDf)
}

x<-apply(missing,1,genRows)
comAn=rbind(comAn, do.call(rbind,x))

comAn
   animals animalYears animalMass
1     bird           1         29
2     bird           1         48
3     bird           1         36
4     bird           2         20
5     bird           2         34
6     bird           2         34
7      dog           1         21
8      dog           1         28
9      dog           1         25
10     dog           2         35
11     dog           2         18
12     dog           2         11
13     cat           1         46
14     cat           1         33
15     cat           1         48
16     cat           2         21
17     cat           2         NA

So far so good, but then I adjust the code so that it reads (**notice the change in the specification in 'missing' to counts<3**):

#combs defines the different combinations of
#animals and animalYears
combs<-paste(comAn$animals,comAn$animalYears,sep=':')
#counts defines how long the different combinations are
counts<-ave(1:nrow(comAn),combs,FUN=length)
#missing defines the combs that have length less than one and puts it in
#the data frame missing
missing<-data.frame(vals=combs[counts<3],count=counts[counts<3])

genRows<-function(dat){
  vals<-strsplit(dat[1],':')[[1]]
  #not sure why dat[2] is being converted to a string
  newRows<-2-as.numeric(dat[2])
  newDf<-data.frame(animals=rep(vals[1],newRows),
                    animalYears=rep(vals[2],newRows),
                    animalMass=rep(NA,newRows))
  return(newDf)
}

x<-apply(missing,1,genRows)
comAn=rbind(comAn, do.call(rbind,x))

The result for 'x' then reads:

x
[[1]]
[1] animals     animalYears animalMass
<0 rows> (or 0-length row.names)

Any thoughts on why it might be doing this instead of adding an additional row to get the result:

comAn
   animals animalYears animalMass
1     bird           1         29
2     bird           1         48
3     bird           1         36
4     bird           2         20
5     bird           2         34
6     bird           2         34
7      dog           1         21
8      dog           1         28
9      dog           1         25
10     dog           2         35
11     dog           2         18
12     dog           2         11
13     cat
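A possible explanation for the empty result, beyond the apply() coercion, is the arithmetic inside genRows. This sketch assumes the garbled line reads newRows <- 2 - as.numeric(dat[2]), and rows_to_add is a hypothetical helper that mirrors it. After the first pass cat:2 has a count of 2, so the same formula asks for zero new rows.

```r
# Hypothetical helper mirroring the newRows line in genRows
rows_to_add <- function(count, target) target - count

rows_to_add(1, target = 2)  # first pass, count is 1: one row is added
rows_to_add(2, target = 2)  # second pass, count is 2: zero rows are added

# a zero-length rep() gives a data frame with no rows, as in the printed 'x'
empty <- data.frame(animalMass = rep(NA, rows_to_add(2, target = 2)))
nrow(empty)

# measuring against the wanted total of 3 gives the needed rows in one pass
rows_to_add(1, target = 3)
```

Under that reading, changing the filter to counts<3 without also changing the 2 in newRows cannot add the second NA row.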
Re: [R] problem applying the same function twice
You may find it beneficial to investigate packages dplyr, data.table, or a combination of the two for handling large data sets in memory. Or, perhaps dplyr with a SQL back end for working on disk (I have not tried that myself yet). I do find your excuse for manufacturing data records uncompelling, though. Of the information necessary to draw valid conclusions is absent, the results you obtain by doing so is going to be questionable at best. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. On March 10, 2015 1:57:14 PM PDT, Curtis Burkhalter curtisburkhal...@gmail.com wrote: Sarah, I have 669 sites and each site has 7 years of data, so if I'm thinking correctly then there should be 4683 possible combinations of site x year. For each year though I need 3 sampling periods so that there is something like the following: site 1 year1 sample 1 site 1 year1 sample 2 site 1 year1 sample 3 site 2 year1 sample 1 site 2 year1 sample 2 site 2 year1 sample 3. site 669 year7 sample 1 site 669 year7 sample 2 site 669 year7 sample 3. I have my max memory allocation set to the amount of RAM (8GB) on my laptop, but it still 'times out' due to memory problems. On Tue, Mar 10, 2015 at 2:50 PM, Sarah Goslee sarah.gos...@gmail.com wrote: You said your data only had 14000 rows, which really isn't many. How many possible combinations do you have, and how many do you need to add? On Tue, Mar 10, 2015 at 4:35 PM, Curtis Burkhalter curtisburkhal...@gmail.com wrote: Sarah, This strategy works great for this small dataset, but when I attempt your method with my data set I reach the maximum allowable memory allocation and the operation just stalls and then stops completely before it is finished. Do you know of a way around this? 
Thanks

On Tue, Mar 10, 2015 at 2:04 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

Hi,

I didn't work through your code, because it looked overly complicated. Here's a more general approach that does what you appear to want:

# use dput() to provide reproducible data please!
comAn <- structure(list(animals = c("bird", "bird", "bird", "bird", "bird",
"bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat", "cat",
"cat"), animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 1L, 1L, 2L), animalMass = c(29L, 48L, 36L, 20L, 34L, 34L, 21L, 28L,
25L, 35L, 18L, 11L, 46L, 33L, 48L, 21L)), .Names = c("animals",
"animalYears", "animalMass"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
"15", "16"))

# add reps to comAn
# assumes comAn is already sorted on animals, animalYears
comAn$reps <- unlist(sapply(rle(do.call(paste, comAn[,1:2]))$lengths, seq_len))

# create full set of combinations
outgrid <- expand.grid(animals=unique(comAn$animals),
    animalYears=unique(comAn$animalYears),
    reps=unique(comAn$reps), stringsAsFactors=FALSE)

# combine with comAn
comAn.full <- merge(outgrid, comAn, all.x=TRUE)

> comAn.full
   animals animalYears reps animalMass
1     bird           1    1         29
2     bird           1    2         48
3     bird           1    3         36
4     bird           2    1         20
5     bird           2    2         34
6     bird           2    3         34
7      cat           1    1         46
8      cat           1    2         33
9      cat           1    3         48
10     cat           2    1         21
11     cat           2    2         NA
12     cat           2    3         NA
13     dog           1    1         21
14     dog           1    2         28
15     dog           1    3         25
16     dog           2    1         35
17     dog           2    2         18
18     dog           2    3         11

On Tue, Mar 10, 2015 at 3:43 PM, Curtis Burkhalter curtisburkhal...@gmail.com wrote:

Hey everyone,

I've written a function that adds NAs to a dataframe where data is missing and it seems to work great if I only need to run it once, but if I run it two times in a row I run into problems. I've created a workable example to explain what I mean and why I would do this. In my dataframe there are areas where I need to add two rows of NAs (b/c I need to have 3 animal x year combos and for cat in year 2 I only have one) so I thought that I'd just run my code twice using the function in the code
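The rle() step in the message above relies on comAn already being sorted on animals and animalYears. What follows is only a sketch of one way to drop that assumption, using base R's ave() to number the rows within each animal x year group. The object name `scrambled` and the reordering by animalMass are made up for this illustration (they are not from the thread); the reordering is there purely to break up the groups.

```r
# Toy data from the thread, rebuilt with data.frame() instead of dput() output
comAn <- data.frame(
  animals     = c(rep("bird", 6), rep("dog", 6), rep("cat", 4)),
  animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
                  1L, 1L, 1L, 2L),
  animalMass  = c(29L, 48L, 36L, 20L, 34L, 34L, 21L, 28L, 25L, 35L,
                  18L, 11L, 46L, 33L, 48L, 21L),
  stringsAsFactors = FALSE
)

# Hypothetical step: reorder the rows so the groups are no longer contiguous
scrambled <- comAn[order(comAn$animalMass), ]

# Number the observations within each animal x year group.
# ave() groups by value, so the row order does not matter.
scrambled$reps <- ave(seq_len(nrow(scrambled)),
                      scrambled$animals, scrambled$animalYears,
                      FUN = seq_along)

# Full set of animal x year x rep combinations, with matching column names
outgrid <- expand.grid(animals     = unique(scrambled$animals),
                       animalYears = unique(scrambled$animalYears),
                       reps        = seq_len(max(scrambled$reps)),
                       stringsAsFactors = FALSE)

# Left join: every combination is kept, missing measurements become NA
comAn.full <- merge(outgrid, scrambled, all.x = TRUE)
```

With the toy data this should again give 18 rows, with NA in animalMass only for the two missing cat / year 2 records. Which particular observation gets which rep number depends on row order, which should not matter when the reps are just placeholders.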
Re: [R] problem applying the same function twice
Thanks Sarah, one of my column names was missing a letter, so it was throwing things off. It works super fast now and is exactly what I needed. My actual data set has about 6 other ancillary response data columns. Is there a way to combine the 'full' data set I just created with the original, in case I need any of the other response variables? E.g.:

FULL:
site year sample
1    1    10
1    1    12
1    1    NA

Original:
site year sample color shape
1    1    10     blue  diamond
1    1    12     green pyramid

Combined:
site year sample color shape
1    1    10     blue  diamond
1    1    12     green pyramid
1    1    NA     NA    NA

Thanks

On Tue, Mar 10, 2015 at 3:12 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

Yeah, that's tiny:

> fullout <- expand.grid(site=1:669, year=1:7, sample=1:3)
> dim(fullout)
[1] 14049     3

Almost certainly the problem is that your expand.grid result doesn't have the same column names as your actual data file, so merge() is trying to make an enormous result. Note how when I made outgrid in the example I named the columns. Make sure that the names are identical!
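To illustrate why the column names matter: when the two data frames share no column names at all, merge() has nothing to join on and returns every pairing of rows. The sketch below uses a made-up miniature data set (`dat`, `grid`, and the capitalised names are assumptions for illustration, not the poster's real data).

```r
# Hypothetical miniature data: three observed records
dat <- data.frame(site   = c(1L, 1L, 2L),
                  year   = c(1L, 1L, 1L),
                  sample = c(1L, 2L, 1L),
                  mass   = c(10, 12, 9))

# Grid whose names do NOT match the data (capitalised by mistake)
grid <- expand.grid(Site = 1:3, Year = 1:2, Sample = 1:3)

# No common columns: merge() returns the Cartesian product,
# nrow(grid) * nrow(dat) rows
bad <- merge(grid, dat, all.x = TRUE)
nrow(bad)

# Same grid with names identical to the data: one row per combination
names(grid) <- c("site", "year", "sample")
good <- merge(grid, dat, all.x = TRUE)
nrow(good)
```

Scaled up to a 14049-row grid and roughly 14000 data rows, a product like the first one would run to something on the order of 197 million rows, which would be consistent with the memory trouble described earlier. A single misspelled name, as in the poster's case, gives a partial match rather than a full product, so the blow-up there may be smaller.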
Re: [R] problem applying the same function twice
I think you're kind of missing the way this works: the data frame created by expand.grid() should ONLY have site, year, sample (with the exact names used in the data itself). Then the merged data frame will have the full site, year, sample combinations, along with ALL the data variables.

Your animal example only had one measured variable, but the same method will work with any number. Reading ?merge might help you understand.

Sarah
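A small sketch of the point about merge() carrying every data column along. The column names color and shape echo the poster's example; the values, the `mass` column, and the object names `orig`, `grid` and `combined` are invented for illustration.

```r
# Hypothetical original data: key columns plus several measured variables
orig <- data.frame(site   = c(1L, 1L),
                   year   = c(1L, 1L),
                   sample = c(1L, 2L),
                   mass   = c(10, 12),
                   color  = c("blue", "green"),
                   shape  = c("diamond", "pyramid"),
                   stringsAsFactors = FALSE)

# The grid holds ONLY the key columns, named exactly as in the data
grid <- expand.grid(site = 1L, year = 1L, sample = 1:3)

# all.x = TRUE keeps every grid row; all non-key columns of orig come along,
# and combinations absent from orig are filled with NA
combined <- merge(grid, orig, all.x = TRUE)
combined
```

Here the third sample has no record in `orig`, so its mass, color and shape should all come back as NA, which is the 'Combined' layout asked for earlier in the thread.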