On Sun, Nov 16, 2008 at 07:31:04AM -0500, John Poulsen wrote:
> I have a dataset that has counts, but I need to expand the dataset so
> that each of the counts has its own line in the dataset (row) and is
> given an id. It looks something like:
>
> Site Type Cnt
> 1    "A"  3
> 1    "B"  0
> 2    "C"  2
>
> I want the dataset to look like:
>
> Site Type ID
> 1    "A"  1
> 1    "A"  2
> 1    "A"  3
> 1    "B"  0
> 2    "C"  1
> 2    "C"  2
>
> I can do this using loops, but I was wondering if anyone knows a more
> efficient way of expanding the data on counts and giving id numbers.

The following will almost do what you want:

# create example data
df <- data.frame(site=c(1,1,2), type=c('A','B','C'), cnt=c(3,0,2))

# expand according to cnt column
df2 <- df[rep(1:dim(df)[1], times=df$cnt), ]

# generate ID column
df2$ID <- unlist(tapply(df2$cnt, df2$type, function(x){1:length(x)}))

# get rid of cnt column
df2$cnt <- NULL

There is one major difference from your example above: as Type 'B' has
zero counts, it will not occur in the expanded dataset, which seems to
me the right thing to do. Keeping a row for zero counts and assigning
an ID of 0 is inconsistent with how positive counts are treated.

But the factor 'type' still has the level 'B', even though it no longer
occurs in the actual data:

> str(df2)
'data.frame':	5 obs. of  3 variables:
 $ site: num  1 1 1 2 2
 $ type: Factor w/ 3 levels "A","B","C": 1 1 1 3 3
 $ ID  : int  1 2 3 1 2

Maybe this already solves your problem. If not: why do you want special
treatment of empty categories? Maybe you can use this solution and take
care of the zero counts in a different way than you had originally
planned?

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
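
A possible variant of the expansion described in the reply, offered only as a sketch: the tapply() step relies on the rows being ordered by 'type' and on each type appearing in a single row, so the IDs can go wrong if the same type shows up at more than one site. Base R's sequence() builds 1:cnt for every count directly and does not have that dependency. The stringsAsFactors=TRUE argument and the droplevels() call are additions for illustration (they are not part of the original reply); they make 'type' a factor on current R versions and then remove the unused level 'B'.

```r
# same example data as in the reply; stringsAsFactors=TRUE is an
# assumption made here so that 'type' is a factor on current R versions
df <- data.frame(site=c(1,1,2), type=c('A','B','C'), cnt=c(3,0,2),
                 stringsAsFactors=TRUE)

# repeat each row index cnt times; rows with cnt == 0 drop out
df2 <- df[rep(seq_len(nrow(df)), times=df$cnt), ]

# sequence() concatenates 1:cnt for each count (nothing for a zero),
# so the IDs restart at 1 for every original row
df2$ID <- sequence(df$cnt)

# get rid of cnt column, reset row names, drop the unused level 'B'
df2$cnt <- NULL
rownames(df2) <- NULL
df2 <- droplevels(df2)
```

If the zero-count rows have to be kept after all, they would need to be appended separately, since rep() with times=0 removes them.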