Re: [R] selecting dataframe columns based on substring of col name(s)

Evan Cooch Thu, 22 Jun 2017 05:47:49 -0700

Thanks to all for the good suggestions/solutions to the original problem.

On 6/21/2017 3:28 PM, David Winsemius wrote:
>> On Jun 21, 2017, at 9:11 AM, Evan Cooch <evan.co...@gmail.com> wrote:
>>
>> Suppose I have the following sort of dataframe, where each column name has a 
>> common structure: prefix, followed by a number (for this example, col1, 
>> col2, col3 and col4):
>>
>> d = data.frame( col1=runif(10), col2=runif(10), 
>> col3=runif(10),col4=runif(10))
>>
>> What I haven't been able to suss out is how to efficiently 
>> 'extract/manipulate/play with' columns from the data frame, making use of 
>> this common structure.
>>
>> Suppose, for example, I want to 'work with' col2, col3, and col4. Now, I 
>> could subset the dataframe d in any number of ways -- for example
>>
>> piece <- d[,c("col2","col3","col4")]
>>
>> Works as expected, but for *big* problems (where I might have dozens -> 
>> hundreds of columns -- often the case with big design matrices output by 
>> some linear models program or another), having to write them all out using 
>> c("col2","col3",...."colXXXXX") takes a lot of time. What I'm wondering 
>> about is if there is a way to simply select over the "changing part" of the 
>> column name (you can do this relatively easily in a data step in SAS, for 
>> example). Heuristically, something like:
>>
>> piece <- d[,col2:col4]
>>
>> where the heuristic col2:col4 is interpreted as col2 -> col4 (parse the 
>> prefix 'col', and then simply select over the changing suffix -- i.e., 
>> column number).
>>
>> Now, if I use the "to" function in the lessR package, I can get there from 
>> here fairly easily:
>>
>> piece <- d[,to("col",4,from=2,same.size=FALSE)]
>>
>> But, is there a better way? Beyond 'efficiency' (ease of implementation), 
>> part of what constitutes 'better' might be something in base R, rather than 
>> relying on a package?
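
One base-R route, sketched here only as an illustration (it is not taken from any reply in this thread): build the column names from the shared prefix with paste0(), or match them with an anchored regular expression via grep(). The set.seed() call and the object names wanted, piece1 and piece2 are assumptions made up for the example.

```r
# Illustrative sketch; set.seed() and the object names are assumptions.
set.seed(1)
d <- data.frame(col1 = runif(10), col2 = runif(10),
                col3 = runif(10), col4 = runif(10))

# Build the names from the common prefix and the numeric suffix.
wanted <- paste0("col", 2:4)
piece1 <- d[, wanted]

# Or match the column names with an anchored regular expression;
# grep() returns the positions of the matching names.
piece2 <- d[, grep("^col[2-4]$", names(d))]

names(piece1)
names(piece2)
```

The paste0() form scales to any numeric range (for example paste0("col", 2:400)), whereas the regular expression would need rewriting for multi-digit suffixes.
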
> After staring at the code for the base function subset with a thought to 
> hacking it to do this, I realized that this should already be possible 
> with the function in its current form:
>
>   names(airquality)
> #[1] "Ozone"   "Solar.R" "Wind"    "Temp"    "Month"   "Day"
>
> subset(airquality,
>            Temp > 90,             # this is the row selection
>            select = Ozone:Solar.R) # and this selects columns
> #--------
>      Ozone Solar.R
> 42     NA     259
> 43     NA     250
> 69     97     267
> 70     97     272
> 75     NA     291
> 102    NA     222
> 120    76     203
> 121   118     225
> 122    84     237
> 123    85     188
> 124    96     167
> 125    78     197
> 126    73     183
> 127    91     189
>
> Bert's advice to work with the numbers is good, but conversion to numeric 
> designations of columns inside the `select`-expression is actually what is 
> occurring inside `subset`.
>
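
Applied to the data frame from the original question, this suggests that subset(d, select = col2:col4) should do the job in base R. The sketch below also mimics, in simplified form, the name-to-position step that subset() performs on its select argument. The helper name select_range is hypothetical (it is not part of base R), and the real subset.data.frame code handles more cases than this.

```r
d <- data.frame(col1 = runif(10), col2 = runif(10),
                col3 = runif(10), col4 = runif(10))

# The select argument accepts a range of column names directly.
piece <- subset(d, select = col2:col4)

# Hypothetical helper illustrating the mechanism: each column name is
# bound to its position, so col2:col4 is evaluated as 2:4.
select_range <- function(x, expr) {
  nl <- as.list(seq_along(x))   # positions 1, 2, ..., ncol(x)
  names(nl) <- names(x)         # each column name now stands for its position
  idx <- eval(substitute(expr), nl, parent.frame())
  x[, idx, drop = FALSE]
}

piece2 <- select_range(d, col2:col4)

names(piece)
names(piece2)
```

Note that the help page for subset() describes it as a convenience function intended for interactive use; inside functions or scripts, indexing with [ (by name or by position) is the safer choice.
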




______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
