David, In data.table v1.9.5 (current development version, which you can get from here: https://github.com/Rdatatable/data.table/wiki/Installation), new features were added to both `melt` and `cast` for data.tables. They both can handle multiple columns simultaneously. I think this would be of interest for you..
Using 1.9.5, here's how I'd do it. require(data.table) ## v1.9.5+ cols <- grep("^da2.*$", names(bw), value=TRUE) ## (1) splt <- split(cols, seq_len(length(cols)/2L)) ## (2) vars <- unique(gsub("(.*?)_(.*$)", "\\1", cols)) ## (3) vals <- unique(gsub("(.*?)_(.*$)", "\\2", cols)) ## (4) ans1 = melt(setDT(bw), measure=splt, variable.name="disc", value.name=vals) ## (5) setattr(ans1$disc, 'levels', vars) ## (6) Explanation: ------------------- 1. Get all cols you've to melt 2. Split them into column pairs that should be combined together 3. Get levels for 'variable' column 4. Get column names for molten result 5. Melt by providing list of columns with each element containing the columns you'd want to combine together in the molten result directly. 6. Set levels for variable column appropriately. Advantages: ------------------ 1. melting by combining corresponding columns together, directly, is straightforward and easy to understand, since that's the task you want to perform. Having to combine all columns together and then split them back seems roundabout. 2. "casting" (tidyr::spread internally uses reshape2::dcast) is a relatively complicated operation, and in this case it can be completely avoided which will save both time and memory (see benchmark at the bottom of post). It also reorders the result which may not be desirable. 3. In 'bw', columns `da20_dev_type` and `da2_dev_type` are type 'factor' while others are type 'numeric'. reshape2::melt (or) tidyr::gather, since it combines all columns will have to coerce these different types to a common type, here 'character'. So, you'll have to convert the columns back to the right type after casting. I think you'll agree that's unnecessary. `melt.data.table` preserves the type as it combines only relevant columns together. 4. Since the operation is performed in a straightforward manner (and in C for speed), it's incredibly fast *and* memory efficient. Benchmark (on ~180,000 rows) --------------------------------------------- library(tidyr) library(dplyr) require(data.table) ## v1.9.5+ # replacing timestamp so that rows for unique (for spread to work correctly) bw.large = rbindlist(replicate(1e4, bw, simplify=FALSE))[, timestamp := .I][] object.size(bw.large)/1024^2 # ~38MB The data is 38MB, which is not at all large... but enough to illustrate. # data.table system.time({ cols <- grep("^da2.*$", names(bw), value=TRUE) ## (1) splt <- split(cols, seq_len(length(cols)/2L)) ## (2) vars <- unique(gsub("(.*?)_(.*$)", "\\1", cols)) ## (3) vals <- unique(gsub("(.*?)_(.*$)", "\\2", cols)) ## (4) ans1 = melt(setDT(bw.large), measure=splt, variable.name="disc", value.name=vals) ## (5) setattr(ans1$disc, 'levels', vars) ## (6) }) # user system elapsed # 0.260 0.013 0.275 Memory used: 56MB # tidyr system.time({ ans2 <- gather(setDF(bw.large), key = "tmp", value = "value", matches("^d[a-z]+[0-9]+")) ans2 <- separate(ans2, tmp, c("disc", "var"), "_", extra = "merge") ans2 <- spread(ans2, var, value) }) # user system elapsed # 15.818 1.128 17.063 Memory used : 750MB And that's ~62x speedup. HTH Arun Srinivasan Co-developer, data.table. On Tue, Mar 31, 2015 at 8:35 PM, Tom Wright <t...@maladmin.com> wrote: > Nice clean-up!!! > > On Tue, 2015-03-31 at 14:19 -0400, Ista Zahn wrote: >> library(tidyr) >> library(dplyr) >> bw <- gather(bw, key = "tmp", value = "value", >> matches("^d[a-z]+[0-9]+")) >> bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge") >> bw <- spread(bw, var, value) > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.