Thanks to all the good suggestions/solutions to the original problem. On 6/21/2017 3:28 PM, David Winsemius wrote: >> On Jun 21, 2017, at 9:11 AM, Evan Cooch <evan.co...@gmail.com> wrote: >> >> Suppose I have the following sort of dataframe, where each column name has a >> common structure: prefix, followed by a number (for this example, col1, >> col2, col3 and col4): >> >> d = data.frame( col1=runif(10), col2=runif(10), >> col3=runif(10),col4=runif(10)) >> >> What I haven't been able to suss out is how to efficiently >> 'extract/manipulate/play with' columns from the data frame, making use of >> this common structure. >> >> Suppose, for example, I want to 'work with' col2, col3, and col4. Now, I >> could subset the dataframe d in any number of ways -- for example >> >> piece <- d[,c("col2","col3","col4")] >> >> Works as expected, but for *big* problems (where I might have dozens -> >> hundreds of columns -- often the case with big design matrices output by >> some linear models program or another), having to write them all out using >> c("col2","col3",...."colXXXXX") takes a lot of time. What I'm wondering >> about is if there is a way to simply select over the "changing part" of the >> column name (you can do this relatively easily in a data step in SAS, for >> example). Heuristically, something like: >> >> piece <- df[,col2:col4] >> >> where the heuristic col2:col4 is interpreted as col2 -> col4 (parse the >> prefix 'col', and then simply select over the changing suffic -- i.e., >> column number). >> >> Now, if I use the "to" function in the lessR package, I can get there from >> here fairly easily: >> >> piece <- d[,to("col",4,from=2,same.size=FALSE)] >> >> But, is there a better way? Beyond 'efficiency' (ease of implementation), >> part of what constitutes 'better' might be something in base R, rather than >> relying on a package? > After staring at the code for the base function subset with a thought to > hacking it to do this I realized that should be already part of the > evaluation result from its current form: > > names(airquality) > #[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day" > > subset(airquality, > Temp > 90, # this is the row selection > select = Ozone:Solar.R) # and this selects columns > #-------- > Ozone Solar.R > 42 NA 259 > 43 NA 250 > 69 97 267 > 70 97 272 > 75 NA 291 > 102 NA 222 > 120 76 203 > 121 118 225 > 122 84 237 > 123 85 188 > 124 96 167 > 125 78 197 > 126 73 183 > 127 91 189 > > Bert's advice to work with the numbers is good, but conversion to numeric > designations of columns inside the `select`-expression is actually what is > occurring inside `subset`. >
[[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.