Hi, Thanks for your reply. Based on your suggestions, I managed to simplify the code, but only a little. I don't see how I could do without a loop, given the nestedness of the hierachy. See the code below, which is working, but I'd like to simplify it. # sample data theCodes <- c('STAT.01', 'STAT.01.01', 'STAT.01.01.01', 'STAT.01.01.02', 'STAT.01.01.03', 'STAT.01.01.04', 'STAT.01.01.05', 'STAT.01.01.06', 'STAT.01.01.06.01', 'STAT.01.01.06.02', 'STAT.01.01.06.03', 'STAT.01.01.06.04', 'STAT.01.01.06.05', 'STAT.01.01.06.06', 'STAT.01.02', 'STAT.01.02.01', 'STAT.01.02.02', 'STAT.01.02.03', 'STAT.01.02.03.01', 'STAT.01.02.03.02', 'STAT.01.02.03.03', 'STAT.01.02.03.04', 'STAT.01.02.03.05', 'STAT.01.03') theValues <- as.numeric(c(NA, NA, 15074.23366, 4882.942034, 1619.59628, 1801.722877, 1019.973666, NA, 503.9239317, 917.2189347, 6018.830465, 1944.11311, 1427.575402, 1965.725428, NA, 5857.293612, 5933.770263, NA, 6077.089518, 1427.180073, 455.9387993, 859.766603, 1002.983331, 2225.328211)) df <- as.data.frame(cbind(code=theCodes, value=theValues)) df$value <- as.numeric(df$value) # actual code getDepth <- function(df) { df$diepte <- do.call(rbind, lapply(strsplit(df$code, "\\."), length)) - 1 return(df) } getParents <- function(df) { df$parent <- substr(df$code, 1, 4 + (df$diepte - 1) * 3) return(df) } getTotals <- function(df, depth) { s <- subset(df, diepte==depth) if(!"parent" %in% names(df)) s <- getParents(s) agg <- aggregate(s["value"], s["parent"], FUN=sum, na.rm=TRUE) merged <- merge(df, agg, by.x="code", by.y="parent", all=TRUE, suffixes=c("", "_summed")) isSum <- !is.na(merged$value_summed) merged[isSum, "value"] <- merged[isSum, "value_summed"] merged$value_summed <- merged$parent <- NULL return(merged) } #library(debug) #mtrace(getTotals) df <- getDepth(df) for( depth in max(df$diepte):2 ) { if (depth == max(df$diepte)) { x <- getTotals(df, depth) } else { x <- getTotals(x, depth) } }
Cheers!! Albert-Jan ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >________________________________ >From: "ONKELINX, Thierry" <thierry.onkel...@inbo.be> >To: Albert-Jan Roskam <fo...@yahoo.com>; R Mailing List <r-help@r-project.org> >Sent: Wednesday, November 16, 2011 2:34 PM >Subject: RE: [R] hierachical code system > >Dear Albert-Jan, > >The easiest way is to create extra variables with the corresponding >aggregation level. substr() en strsplit() can be your friends. Once you have >those variables you can use aggregate() or any other aggregating function. You >don't need loops. > >Best regards, > >Thierry > >> -----Oorspronkelijk bericht----- >> Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] >> Namens Albert-Jan Roskam >> Verzonden: woensdag 16 november 2011 14:28 >> Aan: R Mailing List >> Onderwerp: [R] hierachical code system >> >> Hi, >> >> I have a hierachical code system such as the example below (the printed data >> are easiest to read). I would like to write a function that returns an >> 'imputed' >> data frame, ie. where the the parent values are calculated as the sum of the >> child values. So, for instance, STAT.01.01.06 is the sum of STAT.01.01.06.01 >> through STAT.01.01.06.06. The code I have written uses two for loops, and, >> moreover, does not work as intended. My starting point was to determine the >> code depth by counting the dots in the variable 'code' (using strsplit), then >> iterate over the tree from deep to shallow. Does anybody have a good idea as >> to how to approach this in R? >> >> theCodes <- c('STAT.01', 'STAT.01.01', 'STAT.01.01.01', 'STAT.01.01.02', >> 'STAT.01.01.03', 'STAT.01.01.04', 'STAT.01.01.05', 'STAT.01.01.06', >> 'STAT.01.01.06.01', 'STAT.01.01.06.02', 'STAT.01.01.06.03', >> 'STAT.01.01.06.04', >> 'STAT.01.01.06.05', 'STAT.01.01.06.06', 'STAT.01.02', 'STAT.01.02.01', >> 'STAT.01.02.02', 'STAT.01.02.03', 'STAT.01.02.03.01', 'STAT.01.02.03.02', >> 'STAT.01.02.03.03', 'STAT.01.02.03.04', 'STAT.01.02.03.05', 'STAT.01.03') >> theValues <- c('NA', 'NA', '15074.23366', '4882.942034', '1619.59628', >> '1801.722877', '1019.973666', 'NA', '503.9239317', '917.2189347', >> '6018.830465', '1944.11311', '1427.575402', '1965.725428', 'NA', >> '5857.293612', >> '5933.770263', '6077.089518', 'NA', '1427.180073', '455.9387993', >> '859.766603', >> '1002.983331', '2225.328211') df <- as.data.frame(cbind(code=theCodes, >> value=theValues)) >> print(df) >> code value >> 1 STAT.01 NA >> 2 STAT.01.01 NA >> 3 STAT.01.01.01 15074.23366 >> 4 STAT.01.01.02 4882.942034 >> 5 STAT.01.01.03 1619.59628 >> 6 STAT.01.01.04 1801.722877 >> 7 STAT.01.01.05 1019.973666 >> 8 STAT.01.01.06 NA >> 9 STAT.01.01.06.01 503.9239317 >> 10 STAT.01.01.06.02 917.2189347 >> 11 STAT.01.01.06.03 6018.830465 >> 12 STAT.01.01.06.04 1944.11311 >> 13 STAT.01.01.06.05 1427.575402 >> 14 STAT.01.01.06.06 1965.725428 >> 15 STAT.01.02 NA >> 16 STAT.01.02.01 5857.293612 >> 17 STAT.01.02.02 5933.770263 >> 18 STAT.01.02.03 6077.089518 >> 19 STAT.01.02.03.01 NA >> 20 STAT.01.02.03.02 1427.180073 >> 21 STAT.01.02.03.03 455.9387993 >> 22 STAT.01.02.03.04 859.766603 >> 23 STAT.01.02.03.05 1002.983331 >> 24 STAT.01.03 2225.328211 >> > >> >> Thank you in advance! >> >> Cheers!! >> Albert-Jan >> >> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> ~~~~~ >> All right, but apart from the sanitation, the medicine, education, wine, >> public >> order, irrigation, roads, a fresh water system, and public health, what have >> the >> Romans ever done for us? >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> ~~~~~ >> [[alternative HTML version deleted]] > > > > [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.