Re: [R] problem applying the same function twice

2015-03-12 Thread William Dunlap
The key to your problem may be that
   x<-apply(missing,1,genRows)
converts 'missing' to a matrix, with the same type for all columns,
then makes x either a list or a matrix but never a data.frame.
Those features of apply may mess up the rest of your calculations.

Don't use apply().
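
The coercion described above can be seen in a small example. This is a minimal sketch: the two-row data frame `missing_demo` is made up for illustration and only mimics the shape of 'missing' in the question. Because apply() works on a matrix, a data frame with mixed character and integer columns is converted to a character matrix first, so every row reaches the function as a character vector.

```r
# Hypothetical stand-in for the 'missing' data frame in the question
missing_demo <- data.frame(vals = c("cat:2", "dog:1"),
                           count = c(1L, 2L),
                           stringsAsFactors = FALSE)

# as.matrix() on mixed columns yields a character matrix
m <- as.matrix(missing_demo)
is.character(m)

# so each row handed to FUN is a character vector, including 'count'
row_classes <- apply(missing_demo, 1, function(dat) class(dat[2]))
row_classes

# the column itself is still integer in the data frame
class(missing_demo$count)
```

This would explain the "not sure why dat[2] is being converted to a string" comment in the question.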


Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [R] problem applying the same function twice

2015-03-12 Thread Curtis Burkhalter
Sarah,

This strategy works great for this small dataset, but when I attempt your
method with my data set I reach the maximum allowable memory allocation and
the operation just stalls and then stops completely before it is finished.
Do you know of a way around this?

Thanks


Re: [R] problem applying the same function twice

2015-03-10 Thread Curtis Burkhalter
Sarah,

I realized what I was saying after I pressed send on the email. It makes
perfect sense now, thanks so much for your help and patience.
On Mar 10, 2015 5:57 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

 I think you're kind of missing the way this works:

 the data frame created by expand.grid() should ONLY have site, year,
 sample (with the exact names used in the data itself).
 Then the merged data frame will have the full site,year,sample
 combinations, along with ALL the data variables. Your animal example
 only had one measured variable, but the same method will work with any
 number.
 Reading ?merge might help you understand.
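
 A minimal sketch of the point above, using made-up data (the `obs` values and the `color` and `shape` columns are hypothetical): the grid carries only the key columns, and merge() with all.x=TRUE brings every other data column along, filling NA where a combination has no observation.

```r
# Hypothetical data: one site-year with two of three samples recorded
obs <- data.frame(site = c(1L, 1L), year = c(1L, 1L), sample = c(1L, 2L),
                  color = c("blue", "green"),
                  shape = c("diamond", "pyramid"),
                  stringsAsFactors = FALSE)

# The grid holds ONLY the key columns, named exactly as in 'obs'
grid <- expand.grid(site = 1L, year = 1L, sample = 1:3)

# all.x = TRUE keeps every grid row; unmatched rows get NA in color and shape
full <- merge(grid, obs, all.x = TRUE)
full
```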

 Sarah

 On Tue, Mar 10, 2015 at 5:35 PM, Curtis Burkhalter
 curtisburkhal...@gmail.com wrote:
 
  Thanks Sarah, one of my column names was missing a letter so it was
  throwing things off. It works super fast now and is exactly what I needed.
  My actual data set has about 6 other ancillary response data columns. Is
  there a way to combine the 'full' data set I just created with the
  original in case I need any of the other response variables? E.g.

  FULL:
  site year sample
     1    1     10
     1    1     12
     1    1     NA

  Original:
  site year sample color shape
     1    1     10 blue  diamond
     1    1     12 green pyramid

  Combined:
  site year sample color shape
     1    1     10 blue  diamond
     1    1     12 green pyramid
     1    1     NA NA    NA
 
  Thanks
 

Re: [R] problem applying the same function twice

2015-03-10 Thread Sarah Goslee
Hi,

I didn't work through your code, because it looked overly complicated.
Here's a more general approach that does what you appear to want:

# use dput() to provide reproducible data please!
comAn <- structure(list(animals = c("bird", "bird", "bird", "bird", "bird",
"bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat",
"cat", "cat"), animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), animalMass = c(29L, 48L, 36L,
20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L, 46L, 33L, 48L, 21L
)), .Names = c("animals", "animalYears", "animalMass"), class =
"data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16"))


# add reps to comAn
# assumes comAn is already sorted on animals, animalYears
comAn$reps <- unlist(sapply(rle(do.call(paste,
comAn[,1:2]))$lengths, seq_len))

# create full set of combinations
outgrid <- expand.grid(animals=unique(comAn$animals),
animalYears=unique(comAn$animalYears), reps=unique(comAn$reps),
stringsAsFactors=FALSE)

# combine with comAn
comAn.full <- merge(outgrid, comAn, all.x=TRUE)

> comAn.full
   animals animalYears reps animalMass
1     bird           1    1         29
2     bird           1    2         48
3     bird           1    3         36
4     bird           2    1         20
5     bird           2    2         34
6     bird           2    3         34
7      cat           1    1         46
8      cat           1    2         33
9      cat           1    3         48
10     cat           2    1         21
11     cat           2    2         NA
12     cat           2    3         NA
13     dog           1    1         21
14     dog           1    2         28
15     dog           1    3         25
16     dog           2    1         35
17     dog           2    2         18
18     dog           2    3         11
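
The rle() step above relies on the rows already being sorted, as its comment says. If that cannot be guaranteed, one alternative (an editorial sketch, not part of the original reply) is ave() with seq_along, which numbers rows within each group in order of appearance regardless of row order. The small data frame `d` is made up for illustration.

```r
# Hypothetical, deliberately unsorted data
d <- data.frame(animals = c("cat", "bird", "cat", "bird", "cat"),
                animalYears = c(1L, 1L, 1L, 1L, 2L),
                stringsAsFactors = FALSE)

# number the rows within each animals x animalYears group
d$reps <- ave(seq_len(nrow(d)), d$animals, d$animalYears, FUN = seq_along)
d$reps
```

The resulting `reps` column can then be used in expand.grid() and merge() exactly as in the code above.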



Re: [R] problem applying the same function twice

2015-03-10 Thread Curtis Burkhalter
William,

You say not to use apply here, but what would you use in its place?

Thanks
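
One possible way to do this without apply(), offered as a sketch under assumptions rather than as the answer given on the list: count rows per group with ave(), then build all padding rows in one step with rep(). The data frame below is a cut-down, made-up version of comAn, and `target` (the wanted number of measurements per animal x year) is an assumed name. Note also that genRows in the question computes 2 minus the count, so once cat:2 has two rows that expression is 0, which would be consistent with the 0-row result reported on the second run.

```r
# Small made-up example: cat in year 2 has one measurement, the others three
comAn <- data.frame(
  animals     = c(rep("bird", 3), rep("cat", 3), "cat"),
  animalYears = c(1L, 1L, 1L, 1L, 1L, 1L, 2L),
  animalMass  = c(29L, 48L, 36L, 46L, 33L, 48L, 21L),
  stringsAsFactors = FALSE
)

target <- 3L  # assumed number of measurements wanted per animal x year

# rows per animal x year combination, computed on the columns directly
n <- ave(seq_len(nrow(comAn)), comAn$animals, comAn$animalYears, FUN = length)
combos <- unique(data.frame(comAn[c("animals", "animalYears")], n = n,
                            stringsAsFactors = FALSE))
short <- combos[combos$n < target, ]

# repeat each short combination (target - n) times
idx <- rep(seq_len(nrow(short)), times = target - short$n)
padding <- data.frame(animals     = short$animals[idx],
                      animalYears = short$animalYears[idx],
                      animalMass  = rep(NA_integer_, length(idx)),
                      stringsAsFactors = FALSE)

comAn_padded <- rbind(comAn, padding)
comAn_padded
```

Because the columns are never passed through a matrix, animalYears stays integer, and a single pass adds however many rows each group is missing.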


Re: [R] problem applying the same function twice

2015-03-10 Thread Curtis Burkhalter
Sarah,

I have 669 sites and each site has 7 years of data, so if I'm thinking
correctly then there should be 4683 possible combinations of site x year.
For each year though I need 3 sampling periods so that there is something
like the following:

site 1  year1  sample 1
site 1  year1  sample 2
site 1  year1  sample 3
site 2  year1  sample 1
site 2  year1  sample 2
site 2  year1  sample 3.
site 669   year7  sample 1
site 669   year7 sample 2
site 669   year7 sample 3.

I have my max memory allocation set to the amount of RAM (8GB) on my
laptop, but it still 'times out' due to memory problems.

On Tue, Mar 10, 2015 at 2:50 PM, Sarah Goslee sarah.gos...@gmail.com
wrote:

 You said your data only had 14000 rows, which really isn't many.

 How many possible combinations do you have, and how many do you need to
 add?


Re: [R] problem applying the same function twice

2015-03-10 Thread Sarah Goslee
Yeah, that's tiny:

> fullout <- expand.grid(site=1:669, year=1:7, sample=1:3)
> dim(fullout)
[1] 14049     3


Almost certainly the problem is that your expand.grid result doesn't
have the same column names as your actual data file, so merge() is
trying to make an enormous result. Note how when I made outgrid in the
example I named the columns.

Make sure that the names are identical!
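
To illustrate why mismatched names blow up the result (a small sketch with made-up data; the capitalised `Site` and `Year` names are a hypothetical typo): when the two data frames share no column names, merge() has nothing to match on and returns every pairing of rows, so the row count is the product of the two inputs. With a grid of 14049 rows and a data set of about 14000 rows, that product would be on the order of 14049 x 14000 rows.

```r
grid <- expand.grid(Site = 1:3, Year = 1:2)   # names do not match 'obs'
obs  <- data.frame(site = c(1L, 2L), year = c(1L, 1L), mass = c(10, 20))

# No shared column names, so merge() returns every pairing of rows
bad <- merge(grid, obs)
nrow(bad)    # equals nrow(grid) * nrow(obs)

# With matching names the result stays the size of the grid
names(grid) <- c("site", "year")
good <- merge(grid, obs, all.x = TRUE)
nrow(good)
```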



[R] problem applying the same function twice

2015-03-10 Thread Curtis Burkhalter
Hey everyone,

I've written a function that adds NAs to a dataframe where data is missing
and it seems to work great if I only need to run it once, but if I run it
two times in a row I run into problems. I've created a workable example to
explain what I mean and why I would do this.

In my dataframe there are areas where I need to add two rows of NAs (b/c I
need to have 3 animal x year combos and for cat in year 2 I only have one)
so I thought that I'd just run my code twice using the function in the code
below. Everything works great when I run it the first time, but when I run
it again it says that the value returned to the list 'x' is of length 0. I
don't understand why the function works the first time around and adds an
NA to the 'animalMass' column, but won't do it again. I've used
(print(str(dataframe)) to see if there is a change in class or type when
the function runs through the original dataframe and there is for
'animalYears', but I just convert it back before rerunning the function for
second time.

Any thoughts on this would be greatly appreciated b/c my actual data
dataframe I have to input into WinBUGS is 14000x12, so it's not a trivial
thing to just add in an NA here or there.

comAn
   animals animalYears animalMass
1 bird   1 29
2 bird   1 48
3 bird   1 36
4 bird   2 20
5 bird   2 34
6 bird   2 34
7  dog   1 21
8  dog   1 28
9  dog   1 25
10 dog   2 35
11 dog   2 18
12 dog   2 11
13 cat   1 46
14 cat   1 33
15 cat   1 48
16 cat   2 21

So every animal has 3 measurements per year, except for the cat in year two
which has only 1. I run the code below and get:

#combs defines the different combinations of
#animals and animalYears
combs<-paste(comAn$animals,comAn$animalYears,sep=':')
#counts defines how long the different combinations are
counts<-ave(1:nrow(comAn),combs,FUN=length)
#missing defines the combs that have fewer than two rows and puts them in
#the data frame missing
missing<-data.frame(vals=combs[counts<2],count=counts[counts<2])

genRows<-function(dat){
vals<-strsplit(dat[1],':')[[1]]
#not sure why dat[2] is being converted to a string
newRows<-2-as.numeric(dat[2])
newDf<-data.frame(animals=rep(vals[1],newRows),
  animalYears=rep(vals[2],newRows),
  animalMass=rep(NA,newRows))
return(newDf)
}


x<-apply(missing,1,genRows)
comAn=rbind(comAn,
do.call(rbind,x))

> comAn
   animals animalYears animalMass
1     bird           1         29
2     bird           1         48
3     bird           1         36
4     bird           2         20
5     bird           2         34
6     bird           2         34
7      dog           1         21
8      dog           1         28
9      dog           1         25
10     dog           2         35
11     dog           2         18
12     dog           2         11
13     cat           1         46
14     cat           1         33
15     cat           1         48
16     cat           2         21
17     cat           2         NA

So far so good, but then I adjust the code so that it reads (**notice the
change in the specification in 'missing' to counts<3**):

#combs defines the different combinations of
#animals and animalYears
combs<-paste(comAn$animals,comAn$animalYears,sep=':')
#counts defines how long the different combinations are
counts<-ave(1:nrow(comAn),combs,FUN=length)
#missing defines the combs that have length less than one and puts it in
#the data frame missing
missing<-data.frame(vals=combs[counts<3],count=counts[counts<3])

genRows<-function(dat){
vals<-strsplit(dat[1],':')[[1]]
#not sure why dat[2] is being converted to a string
newRows<-2-as.numeric(dat[2])
newDf<-data.frame(animals=rep(vals[1],newRows),
  animalYears=rep(vals[2],newRows),
  animalMass=rep(NA,newRows))
return(newDf)
}


x<-apply(missing,1,genRows)
comAn=rbind(comAn,
do.call(rbind,x))

The result for 'x' then reads:

> x
[[1]]
[1] animals     animalYears animalMass
<0 rows> (or 0-length row.names)
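
A minimal sketch of what happens on the second pass, assuming a one-row 'missing' like the one built above. The names missing_demo, needed, idx and filler are illustrative and not from the original post, and the "3 - count" rule is only one possible fix, not the poster's method. It shows that apply() turns the count into a string, and that 2 - count is zero once cat:2 already has two rows.

```r
# Hypothetical stand-in for 'missing' on the second pass:
# cat:2 has 2 rows after the first NA row was appended.
missing_demo <- data.frame(vals = "cat:2", count = 2L,
                           stringsAsFactors = FALSE)

# apply() calls as.matrix() first, so every column becomes character.
m <- as.matrix(missing_demo)
is.character(m)

# genRows() computes 2 - count, which is 0 here, so rep(..., 0)
# builds a data frame with no rows.
newRows <- 2 - as.numeric(m[1, "count"])
newRows

# Assumed alternative: work on the columns directly (types are kept)
# and ask for 3 - count rows, the number still needed per group.
needed <- 3 - missing_demo$count
idx <- rep(seq_len(nrow(missing_demo)), times = needed)
filler <- data.frame(
  animals     = sub(":.*", "", missing_demo$vals[idx]),
  animalYears = as.integer(sub(".*:", "", missing_demo$vals[idx])),
  animalMass  = rep(NA_integer_, length(idx)),
  stringsAsFactors = FALSE
)
filler
```

Because 'needed' is computed per group, a single pass would add as many NA rows as each group lacks, so the code would not have to be run twice.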

Any thoughts on why it might be doing this instead of adding an additional
row to get the result:

 comAn
   animals animalYears animalMass
1 bird   1 29
2 bird   1 48
3 bird   1 36
4 bird   2 20
5 bird   2 34
6 bird   2 34
7  dog   1 21
8  dog   1 28
9  dog   1 25
10 dog   2 35
11 dog   2 18
12 dog   2 11
13 cat  

Re: [R] problem applying the same function twice

2015-03-10 Thread Jeff Newmiller
You may find it beneficial to investigate packages dplyr, data.table, or a 
combination of the two for handling large data sets in memory. Or, perhaps 
dplyr with a SQL back end for working on disk (I have not tried that myself 
yet).

I do find your excuse for manufacturing data records uncompelling, though. If 
the information necessary to draw valid conclusions is absent, the results you 
obtain by doing so are going to be questionable at best.
---
Jeff Newmiller                        The .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.us          Basics: ##.#.   ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.   #.O#.  with
/Software/Embedded Controllers)               .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On March 10, 2015 1:57:14 PM PDT, Curtis Burkhalter 
curtisburkhal...@gmail.com wrote:
Sarah,

I have 669 sites and each site has 7 years of data, so if I'm thinking
correctly then there should be 4683 possible combinations of site x year.
For each year though I need 3 sampling periods so that there is something
like the following:

site 1  year1  sample 1
site 1  year1  sample 2
site 1  year1  sample 3
site 2  year1  sample 1
site 2  year1  sample 2
site 2  year1  sample 3.
site 669   year7  sample 1
site 669   year7 sample 2
site 669   year7 sample 3.

I have my max memory allocation set to the amount of RAM (8GB) on my
laptop, but it still 'times out' due to memory problems.

On Tue, Mar 10, 2015 at 2:50 PM, Sarah Goslee sarah.gos...@gmail.com
wrote:

 You said your data only had 14000 rows, which really isn't many.

 How many possible combinations do you have, and how many do you need to
 add?

 On Tue, Mar 10, 2015 at 4:35 PM, Curtis Burkhalter
 curtisburkhal...@gmail.com wrote:
  Sarah,
 
  This strategy works great for this small dataset, but when I attempt your
  method with my data set I reach the maximum allowable memory allocation and
  the operation just stalls and then stops completely before it is finished.
  Do you know of a way around this?
 
  Thanks
 
  On Tue, Mar 10, 2015 at 2:04 PM, Sarah Goslee
sarah.gos...@gmail.com
  wrote:
 
  Hi,
 
  I didn't work through your code, because it looked overly complicated.
  Here's a more general approach that does what you appear to want:
 
  # use dput() to provide reproducible data please!
  comAn <- structure(list(animals = c("bird", "bird", "bird", "bird",
  "bird", "bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat",
  "cat", "cat"), animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
  1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), animalMass = c(29L, 48L, 36L,
  20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L, 46L, 33L, 48L, 21L
  )), .Names = c("animals", "animalYears", "animalMass"), class =
  "data.frame", row.names = c("1",
  "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
  "14", "15", "16"))


  # add reps to comAn
  # assumes comAn is already sorted on animals, animalYears
  comAn$reps <- unlist(sapply(rle(do.call(paste,
  comAn[,1:2]))$lengths, seq_len))

  # create full set of combinations
  outgrid <- expand.grid(animals=unique(comAn$animals),
  animalYears=unique(comAn$animalYears), reps=unique(comAn$reps),
  stringsAsFactors=FALSE)

  # combine with comAn
  comAn.full <- merge(outgrid, comAn, all.x=TRUE)

  > comAn.full
     animals animalYears reps animalMass
  1     bird           1    1         29
  2     bird           1    2         48
  3     bird           1    3         36
  4     bird           2    1         20
  5     bird           2    2         34
  6     bird           2    3         34
  7      cat           1    1         46
  8      cat           1    2         33
  9      cat           1    3         48
  10     cat           2    1         21
  11     cat           2    2         NA
  12     cat           2    3         NA
  13     dog           1    1         21
  14     dog           1    2         28
  15     dog           1    3         25
  16     dog           2    1         35
  17     dog           2    2         18
  18     dog           2    3         11
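
The rle() step above assumes the data are already sorted on the grouping columns. As a hedged alternative sketch (the toy data frame 'd' is hypothetical and not from the thread), ave() with seq_along can number the replicates within each group regardless of row order:

```r
# Small unsorted example (hypothetical data, not from the thread).
d <- data.frame(animals = c("cat", "bird", "cat", "bird", "bird"),
                animalYears = c(2L, 1L, 2L, 1L, 1L),
                stringsAsFactors = FALSE)

# ave() applies seq_along() within each animals x animalYears group,
# numbering the rows of a group 1, 2, ... in their order of appearance.
d$reps <- ave(seq_len(nrow(d)), d$animals, d$animalYears, FUN = seq_along)
d
```

This is offered only as one way to drop the sorting assumption; the rle() version gives the same numbering when the rows are sorted.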
  
 
  On Tue, Mar 10, 2015 at 3:43 PM, Curtis Burkhalter
  curtisburkhal...@gmail.com wrote:
   Hey everyone,
  
   I've written a function that adds NAs to a dataframe where data
is
   missing
   and it seems to work great if I only need to run it once, but if
I run
   it
   two times in a row I run into problems. I've created a workable
 example
   to
   explain what I mean and why I would do this.
  
   In my dataframe there are areas where I need to add two rows of
NAs
 (b/c
   I
   need to have 3 animal x year combos and for cat in year 2 I only
have
   one)
   so I thought that I'd just run my code twice using the function
in the
   code
   

Re: [R] problem applying the same function twice

2015-03-10 Thread Curtis Burkhalter
Thanks Sarah, one of my column names was missing a letter so it was
throwing things off. It works super fast now and is exactly what I needed.
My actual data set has about 6 other ancillary response data columns.
Is there a way to combine the 'full' data set I just created with the
original in case I need any of the other response variables? E.g.

FULL:
site  year  sample
1     1     10
1     1     12
1     1     NA

Original:
site  year  sample  color  shape
1     1     10      blue   diamond
1     1     12      green  pyramid

Combined:
site  year  sample  color  shape
1     1     10      blue   diamond
1     1     12      green  pyramid
1     1     NA      NA     NA

Thanks

On Tue, Mar 10, 2015 at 3:12 PM, Sarah Goslee sarah.gos...@gmail.com
wrote:

 Yeah, that's tiny:

  > fullout <- expand.grid(site=1:669, year=1:7, sample=1:3)
  > dim(fullout)
 [1] 14049     3


 Almost certainly the problem is that your expand.grid result doesn't
 have the same column names as your actual data file, so merge() is
 trying to make an enormous result. Note how when I made outgrid in the
 example I named the columns.

 Make sure that the names are identical!
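
A small sketch of the failure mode described above, using hypothetical toy data (obs, grid_bad, grid_good and the misspelled column 'yr' are made up for illustration). By default merge() joins on whatever column names the two data frames share, so a misspelled key silently drops out of the join:

```r
# Hypothetical toy data; 'yr' is deliberately misspelled relative to 'year'.
obs <- data.frame(site = c(1L, 1L, 2L), year = c(1L, 2L, 1L),
                  mass = c(10, 12, 9))
grid_bad  <- expand.grid(site = 1:2, yr = 1:2)    # only 'site' is shared
grid_good <- expand.grid(site = 1:2, year = 1:2)  # both keys are shared

# merge() joins on the intersection of the column names by default.
intersect(names(grid_bad), names(obs))
n_bad  <- nrow(merge(grid_bad, obs, all.x = TRUE))   # joined on 'site' only
n_good <- nrow(merge(grid_good, obs, all.x = TRUE))  # joined on site and year
c(n_bad, n_good)

# Naming the keys explicitly turns a misspelling into an error
# instead of a silently oversized result.
keys <- c("site", "year")
stopifnot(all(keys %in% names(grid_good)), all(keys %in% names(obs)))
full <- merge(grid_good, obs, by = keys, all.x = TRUE)
full
```

With no shared names at all, merge() returns every row of one data frame paired with every row of the other, which is how a 14000-row data set can exhaust memory.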


 On Tue, Mar 10, 2015 at 4:57 PM, Curtis Burkhalter
 curtisburkhal...@gmail.com wrote:
  Sarah,
 
  I have 669 sites and each site has 7 years of data, so if I'm thinking
  correctly then there should be 4683 possible combinations of site x year.
  For each year though I need 3 sampling periods so that there is something
  like the following:
 
  site 1  year1  sample 1
  site 1  year1  sample 2
  site 1  year1  sample 3
  site 2  year1  sample 1
  site 2  year1  sample 2
  site 2  year1  sample 3.
  site 669   year7  sample 1
  site 669   year7 sample 2
  site 669   year7 sample 3.
 
  I have my max memory allocation set to the amount of RAM (8GB) on my
 laptop,
  but it still 'times out' due to memory problems.
 
  On Tue, Mar 10, 2015 at 2:50 PM, Sarah Goslee sarah.gos...@gmail.com
  wrote:
 
  You said your data only had 14000 rows, which really isn't many.
 
  How many possible combinations do you have, and how many do you need to
  add?
 
  On Tue, Mar 10, 2015 at 4:35 PM, Curtis Burkhalter
  curtisburkhal...@gmail.com wrote:
   Sarah,
  
   This strategy works great for this small dataset, but when I attempt
   your
   method with my data set I reach the maximum allowable memory
 allocation
   and
   the operation just stalls and then stops completely before it is
   finished.
   Do you know of a way around this?
  
   Thanks
  
   On Tue, Mar 10, 2015 at 2:04 PM, Sarah Goslee sarah.gos...@gmail.com
 
   wrote:
  
   Hi,
  
   I didn't work through your code, because it looked overly
 complicated.
   Here's a more general approach that does what you appear to want:
  
   # use dput() to provide reproducible data please!
   comAn <- structure(list(animals = c("bird", "bird", "bird", "bird",
   "bird", "bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat",
   "cat", "cat"), animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
   1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), animalMass = c(29L, 48L, 36L,
   20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L, 46L, 33L, 48L, 21L
   )), .Names = c("animals", "animalYears", "animalMass"), class =
   "data.frame", row.names = c("1",
   "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
   "14", "15", "16"))


   # add reps to comAn
   # assumes comAn is already sorted on animals, animalYears
   comAn$reps <- unlist(sapply(rle(do.call(paste,
   comAn[,1:2]))$lengths, seq_len))

   # create full set of combinations
   outgrid <- expand.grid(animals=unique(comAn$animals),
   animalYears=unique(comAn$animalYears), reps=unique(comAn$reps),
   stringsAsFactors=FALSE)

   # combine with comAn
   comAn.full <- merge(outgrid, comAn, all.x=TRUE)

   > comAn.full
      animals animalYears reps animalMass
   1     bird           1    1         29
   2     bird           1    2         48
   3     bird           1    3         36
   4     bird           2    1         20
   5     bird           2    2         34
   6     bird           2    3         34
   7      cat           1    1         46
   8      cat           1    2         33
   9      cat           1    3         48
   10     cat           2    1         21
   11     cat           2    2         NA
   12     cat           2    3         NA
   13     dog           1    1         21
   14     dog           1    2         28
   15     dog           1    3         25
   16     dog           2    1         35
   17     dog           2    2         18
   18     dog           2    3         11
   
  
   On Tue, Mar 10, 2015 at 3:43 

Re: [R] problem applying the same function twice

2015-03-10 Thread Sarah Goslee
I think you're kind of missing the way this works:

the data frame created by expand.grid() should ONLY have site, year,
sample (with the exact names used in the data itself).
Then the merged data frame will have the full site,year,sample
combinations, along with ALL the data variables. Your animal example
only had one measured variable, but the same method will work with any
number.
Reading ?merge might help you understand.
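
As a rough sketch of the point above, with hypothetical data (dat, grid and full are illustrative names; the color and shape columns are borrowed from the question's example), the grid carries only the key columns and merge() brings every measured variable along:

```r
# Hypothetical data with two measured variables besides the keys.
dat <- data.frame(site = c(1L, 1L), year = c(1L, 1L), sample = c(1L, 2L),
                  color = c("blue", "green"),
                  shape = c("diamond", "pyramid"),
                  stringsAsFactors = FALSE)

# The grid holds ONLY the key columns, named exactly as in 'dat'.
grid <- expand.grid(site = unique(dat$site), year = unique(dat$year),
                    sample = 1:3)

# all.x = TRUE keeps every grid row; rows with no match in 'dat'
# get NA in color and shape.
full <- merge(grid, dat, all.x = TRUE)
full
```

No second merge against the original data is needed: the combined result already has the full site, year, sample grid plus all the other columns.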

Sarah

On Tue, Mar 10, 2015 at 5:35 PM, Curtis Burkhalter
curtisburkhal...@gmail.com wrote:

 Thanks Sarah, one of my column names was missing a letter so it was throwing
 things off. It works super fast now and is exactly what I needed. My actual
 data set has about 6 other ancillary response data columns. Is there a
 way to combine the 'full' data set I just created with the original in case
 I need any of the other response variables? E.g.

 FULL:
 site  year  sample
 1     1     10
 1     1     12
 1     1     NA

 Original:
 site  year  sample  color  shape
 1     1     10      blue   diamond
 1     1     12      green  pyramid

 Combined:
 site  year  sample  color  shape
 1     1     10      blue   diamond
 1     1     12      green  pyramid
 1     1     NA      NA     NA

 Thanks

 On Tue, Mar 10, 2015 at 3:12 PM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

 Yeah, that's tiny:

  > fullout <- expand.grid(site=1:669, year=1:7, sample=1:3)
  > dim(fullout)
 [1] 14049     3


 Almost certainly the problem is that your expand.grid result doesn't
 have the same column names as your actual data file, so merge() is
 trying to make an enormous result. Note how when I made outgrid in the
 example I named the columns.

 Make sure that the names are identical!


 On Tue, Mar 10, 2015 at 4:57 PM, Curtis Burkhalter
 curtisburkhal...@gmail.com wrote:
  Sarah,
 
  I have 669 sites and each site has 7 years of data, so if I'm thinking
  correctly then there should be 4683 possible combinations of site x
  year.
  For each year though I need 3 sampling periods so that there is
  something
  like the following:
 
  site 1  year1  sample 1
  site 1  year1  sample 2
  site 1  year1  sample 3
  site 2  year1  sample 1
  site 2  year1  sample 2
  site 2  year1  sample 3.
  site 669   year7  sample 1
  site 669   year7 sample 2
  site 669   year7 sample 3.
 
  I have my max memory allocation set to the amount of RAM (8GB) on my
  laptop,
  but it still 'times out' due to memory problems.
 
  On Tue, Mar 10, 2015 at 2:50 PM, Sarah Goslee sarah.gos...@gmail.com
  wrote:
 
  You said your data only had 14000 rows, which really isn't many.
 
  How many possible combinations do you have, and how many do you need to
  add?
 
  On Tue, Mar 10, 2015 at 4:35 PM, Curtis Burkhalter
  curtisburkhal...@gmail.com wrote:
   Sarah,
  
   This strategy works great for this small dataset, but when I attempt
   your
   method with my data set I reach the maximum allowable memory
   allocation
   and
   the operation just stalls and then stops completely before it is
   finished.
   Do you know of a way around this?
  
   Thanks
  
   On Tue, Mar 10, 2015 at 2:04 PM, Sarah Goslee
   sarah.gos...@gmail.com
   wrote:
  
   Hi,
  
   I didn't work through your code, because it looked overly
   complicated.
   Here's a more general approach that does what you appear to want:
  
   # use dput() to provide reproducible data please!
   comAn <- structure(list(animals = c("bird", "bird", "bird", "bird",
   "bird", "bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat",
   "cat", "cat"), animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
   1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), animalMass = c(29L, 48L, 36L,
   20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L, 46L, 33L, 48L, 21L
   )), .Names = c("animals", "animalYears", "animalMass"), class =
   "data.frame", row.names = c("1",
   "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
   "14", "15", "16"))


   # add reps to comAn
   # assumes comAn is already sorted on animals, animalYears
   comAn$reps <- unlist(sapply(rle(do.call(paste,
   comAn[,1:2]))$lengths, seq_len))

   # create full set of combinations
   outgrid <- expand.grid(animals=unique(comAn$animals),
   animalYears=unique(comAn$animalYears), reps=unique(comAn$reps),
   stringsAsFactors=FALSE)

   # combine with comAn
   comAn.full <- merge(outgrid, comAn, all.x=TRUE)
  
comAn.full
  animals animalYears reps animalMass
   1 bird   11 29
   2 bird   12 48
   3 bird   13 36
   4 bird   21 20
   5 bird   22 34
   6 bird   23 34
   7  cat   11 46
   8  cat   12 33
   9