Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread avi.e.gross
John,

I am very familiar with the evolving tidyverse, and some messages a while back
came from people who wanted this forum to stick mainly to base R, so I leave out
such examples.

Indeed, the tidyverse is designed to make it easy to select columns on all
kinds of conditions, including regular expressions that allow more precision
(as grep does), so you can match "yr" followed by exactly one or two digits,
whereas some of the answers assumed that starting with "yr" was enough. It also
allows selecting on arbitrary considerations, such as whether a column contains
numeric data. You can do most of this in base R, but I find the tidyverse
approach easier most of the time, and with some care it can do some extremely
complicated things. One example is creating several new columns from a set of
columns by applying different functions such as mean, mode, and standard
deviation, and naming each new column after the column it was derived from,
with a suffix reflecting the transformation applied.
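
A minimal sketch of the selections described above, assuming dplyr 1.0 or later is installed. The data frame `dat` and its column names are invented for illustration; `matches()` takes a regular expression, `where()` takes a predicate, and `across()` with `.names` sets the suffix of each new column:

```r
library(dplyr)

# Invented data: "yr3" and "yr10" should match the stricter pattern, "yr100" should not
dat <- data.frame(
  year  = 2001:2005,
  yr3   = c(1, 2, 3, 4, 5),
  yr10  = c(2, 4, 6, 8, 10),
  yr100 = c(0, 0, 1, 1, 1),
  label = letters[1:5]
)

# "yr" followed by exactly one or two digits, and nothing after them
strict <- dat %>% select(matches("^yr[0-9]{1,2}$"))
names(strict)

# Only the numeric columns, whatever they are called
numeric_only <- dat %>% select(where(is.numeric))
names(numeric_only)

# One new column per (source column, function) pair, named <column>_<function>
summaries <- dat %>%
  summarise(across(matches("^yr[0-9]{1,2}$"),
                   list(mean = mean, sd = sd),
                   .names = "{.col}_{.fn}"))
summaries
```

Only mean and sd are shown because base R has no built-in function for the statistical mode (`mode()` reports an object's storage mode instead).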

One nice feature is the idea of data streaming through multiple steps, with
one or a few transformations in each step, so that the intermediate results you
do not want simply melt away. The same column selection and deselection helpers
can be used in many of the verbs.
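
As a hedged illustration of that streaming style (again assuming dplyr; the data and the rescaling step are made up), each verb below feeds the next, no intermediate object is left in the workspace, and the `starts_with()` helper is reused in both `mutate()` and `select()`:

```r
library(dplyr)

dat <- data.frame(id = 1:4, yr1 = c(10, 20, 30, 40), yr2 = c(1, 2, 3, 4),
                  note = "x")

result <- dat %>%
  select(!note) %>%                                      # deselect one column
  mutate(across(starts_with("yr"), ~ .x / max(.x))) %>%  # rescale every yr* column
  filter(yr1 > 0.5) %>%                                  # keep rows meeting a condition
  select(id, starts_with("yr"))                          # same helper, different verb
result
```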

From: John Kane
Sent: Saturday, January 14, 2023 4:07 PM
To: avi.e.gr...@gmail.com
Cc: R-help Mailing List
Subject: Re: [R] Removing variables from data frame with a wile card

You rang sir?

library(tidyverse)
xx = 1:10
yr1 = yr2 = yr3 = rnorm(10)
dat1 <- data.frame(xx, yr1, yr2, yr3)

# drop every column whose name starts with "yr"
dat1 %>% select(!starts_with("yr"))

or, for something a bit more exotic, as I have been trying to learn a bit about
the data.table package:

library(data.table)

xx = 1:10
yr1 = yr2 = yr3 = rnorm(10)

dat2 <- data.table(xx, yr1, yr2, yr3)

# %like% is a regex match on each name; with = FALSE makes j a column selection
dat2[, !names(dat2) %like% "yr", with = FALSE]

Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread avi.e.gross
Valentin,

You are correct that R does many things largely behind the scenes that make 
some operations fairly efficient.

From a programming point of view, though, many people might make a data.frame
and not think of it as a list of vectors of the same length that are kept that
way.

So if they made a copy of the original data with fewer columns, they might be
tempted to think the whole original object was copied, and that the original
either stays around or, if the identifier was reused, will be garbage collected.
As you note, the only things collected are the columns you chose not to include.

For some it seems cleaner to set a list item to NULL, which seems to remove it 
immediately. 

The real point I hoped to make is that, using base R, you can approach
removing (multiple) columns in two logical ways. One is to seemingly remove
them from the original object, even if, as you point out, R is really building
a modified copy. The other is to make a copy of just what you want and ignore
the rest, which may or may not be kept around.

If someone really wanted to get down to the basics, they could get a reference
to each column they want to keep, as in col1 <- mydata[["col1"]], and use those
to make a new data.frame, or use many other variants on these methods.
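
A minimal base R sketch of that idea, with an invented `mydata`. Under R's copy-on-modify semantics, extracting a column with `[[` should not duplicate its data until one side is modified; the second variant picks the kept columns programmatically instead of by hand:

```r
mydata <- data.frame(year = 2001:2003, weight = c(1.5, 2.0, 2.5),
                     yr3 = 0, yr4 = 1)

# References to the columns to keep
year   <- mydata[["year"]]
weight <- mydata[["weight"]]

# Build a new data frame from them
kept <- data.frame(year = year, weight = weight)

# A programmatic variant: keep every column whose name does not match the pattern
keep_names <- grep("^yr", names(mydata), value = TRUE, invert = TRUE)
kept2 <- mydata[keep_names]

names(kept)
all.equal(kept, kept2)
```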

Many programming languages (or rather their designers, programmers, and plain
purists) have qualms about when "pointers" of sorts are used, whether things
should be mutable, and so on, so I prefer to avoid those religious wars.

-Original Message-
From: Valentin Petzel  
Sent: Saturday, January 14, 2023 1:21 PM
To: avi.e.gr...@gmail.com
Cc: 'R-help Mailing List' 
Subject: Re: [R] Removing variables from data frame with a wile card

Hello Avi,

While something like d$something <- ... may look like you're directly modifying
the data, it does not actually do so. Most R objects try to be immutable, that
is, an object may not change after creation. This guarantees that if you have
another binding to the same object, the object won't change sneakily.

There is one data structure that is in fact mutable: environments. For
example, compare

L <- list()
local({L$a <- 3})
L$a   # NULL: local() modified its own copy of L

with

E <- new.env()
local({E$a <- 3})
E$a   # 3: both names refer to the same environment

The latter will in fact work, as the same environment is modified, while in the
first one a modified copy of the list is made inside local(), leaving the outer
L unchanged.

Under the hood there is a parser trick: if R sees something like

f(a) <- value

it will look for a function `f<-` and call

a <- `f<-`(a, value = value)

(this also happens, for example, when you do names(x) <- ...)

So in our case this is in fact equivalent to creating a copy with the removed
columns and rebinding the symbol in the current environment to the result.
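
A minimal base R sketch of that rewriting. `second<-` is a made-up replacement function used only for illustration; the last part shows that rebinding `d` leaves another name bound to the old data frame untouched:

```r
# A hypothetical replacement function: R finds it when it sees second(x) <- value
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

x <- c(1, 2, 3)
second(x) <- 20                  # roughly x <- `second<-`(x, value = 20)
x

y <- c(1, 2, 3)
y <- `second<-`(y, value = 20)   # the explicit form gives the same result
identical(x, y)

# The same mechanism applies to d$b <- NULL
d <- data.frame(a = 1:3, b = 4:6)
d_old <- d                       # a second binding to the same data frame
d$b <- NULL                      # builds a modified copy and rebinds d
names(d)                         # "a"
names(d_old)                     # "a" "b": the other binding is unchanged
```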

The data.table package breaks with this convention and uses C-based routines
that allow changing data without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data in place.
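
A slightly fuller sketch of that data.table idiom, assuming the data.table package is installed (the table and its column names are invented). Because the change happens by reference, a second name bound to the same table with plain `<-` sees the columns disappear too; `data.table::copy()` is the way to get an independent copy when that is not wanted:

```r
library(data.table)

d <- data.table(xx = 1:3, yr1 = rnorm(3), yr2 = rnorm(3), yr3 = rnorm(3))
d_alias <- d      # plain assignment: both names point to the same data.table

# Character vector of the columns to drop, chosen with a regular expression
cols_to_remove <- grep("^yr", names(d), value = TRUE)

# := NULL removes the columns by reference; the parentheses make data.table
# use the contents of cols_to_remove rather than a column of that name
d[, (cols_to_remove) := NULL]

names(d)          # "xx"
names(d_alias)    # also "xx": the shared object itself was changed
```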

Regards,
Valentin

Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread John Kane
You rang sir?

library(tidyverse)
xx = 1:10
yr1 = yr2 = yr3 = rnorm(10)
dat1 <- data.frame(xx, yr1, yr2, yr3)

# drop every column whose name starts with "yr"
dat1 %>% select(!starts_with("yr"))

or, for something a bit more exotic, as I have been trying to learn a bit
about the data.table package:

library(data.table)

xx = 1:10
yr1 = yr2 = yr3 = rnorm(10)

dat2 <- data.table(xx , yr1, yr2, yr3)

# %like% is a regex match on each name; with = FALSE makes j a column selection
dat2[, !names(dat2) %like% "yr", with = FALSE]

On Sat, 14 Jan 2023 at 12:28,  wrote:

> Steven,
>
> Just want to add a few things to what people wrote.
>
> In base R, the methods mentioned will let you make a copy of your original
> data frame without the columns that match your pattern.
>
> That is fine.
>
> For some purposes, you want to keep the original data.frame and remove a
> column within it. You can do that in several ways, but the simplest is to
> set the column to NULL, as in:
>
> mydata$NAME <- NULL
>
> Using the mydata["NAME"] notation, you can do that for all the columns your
> grep matched, with a loop or a functional programming method.
>
> R does have optimizations that make this distinction matter less: a partial
> copy of a data.frame shares the unchanged columns with the original until
> something changes.
>
> For those who like to use the tidyverse, it comes with lots of tools that
> let you select columns that start with or end with or contain some pattern
> and I find that way easier.

-- 
John Kane
Kingston ON Canada

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread avi.e.gross
Steven,

Just want to add a few things to what people wrote.

In base R, the methods mentioned will let you make a copy of your original
data frame without the columns that match your pattern.

That is fine.

For some purposes, you want to keep the original data.frame and remove a column
within it. You can do that in several ways, but the simplest is to set the
column to NULL, as in:

mydata$NAME <- NULL

Using the mydata["NAME"] notation, you can do that for all the columns your
grep matched, with a loop or a functional programming method.
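
A base R sketch of those options, with an invented `mydata`: a loop that sets each matching column to NULL, the same thing written functionally with `Reduce()`, and the one-step form, since assigning NULL to several columns at once through single-bracket indexing also removes them:

```r
mydata <- data.frame(year = 2001:2003, weight = c(1.5, 2.0, 2.5),
                     yr3 = 0, yr4 = 1, yr28 = 2)
to_drop <- grep("^yr", names(mydata), value = TRUE)

# 1. A loop that sets each matching column to NULL
m1 <- mydata
for (nm in to_drop) {
  m1[[nm]] <- NULL
}

# 2. The same thing in functional style: fold over the names, starting from mydata
m2 <- Reduce(function(df, nm) { df[[nm]] <- NULL; df }, to_drop, mydata)

# 3. Assigning NULL through [ ] drops all of them at once
m3 <- mydata
m3[to_drop] <- NULL

names(m1)
```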

R does have optimizations that make this distinction matter less: a partial
copy of a data.frame shares the unchanged columns with the original until
something changes.

For those who like to use the tidyverse, it comes with lots of tools that let 
you select columns that start with or end with or contain some pattern and I 
find that way easier.



-Original Message-
From: R-help  On Behalf Of Steven Yen
Sent: Saturday, January 14, 2023 7:49 AM
To: Andrew Simmons 
Cc: R-help Mailing List 
Subject: Re: [R] Removing variables from data frame with a wile card

Thanks to all. Very helpful.

Steven from iPhone

Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread Bill Dunlap
The -grep(pattern, colnames) as a subscript is a bit dangerous. If no
colname matches the pattern, then all columns will be omitted: grep() returns
integer(0), and negating an empty index vector still gives an empty one, which
selects no columns. !grepl(pattern, colnames) avoids this problem.

> mydata <- data.frame(A=1:3,B=11:13)
> mydata[, -grep("^yr", colnames(mydata))]
data frame with 0 columns and 3 rows
> mydata[, !grepl("^yr", colnames(mydata))]
  A  B
1 1 11
2 2 12
3 3 13
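
Two base R variations on the same point, sketched with the same toy data plus an invented `d1`: `grep()` with `invert = TRUE` is another way to avoid the empty-match trap, and `drop = FALSE` keeps the result a data frame when only one column is left:

```r
mydata <- data.frame(A = 1:3, B = 11:13)

# grep(..., invert = TRUE) returns the positions that do NOT match, so when
# nothing matches every column is kept
keep_all <- mydata[, grep("^yr", colnames(mydata), invert = TRUE), drop = FALSE]

# drop = FALSE matters when a single column survives: without it, [ , ]
# simplifies the result to a plain vector
d1 <- data.frame(A = 1:3, yr1 = 4:6)
as_vector <- d1[, !grepl("^yr", colnames(d1))]
as_frame  <- d1[, !grepl("^yr", colnames(d1)), drop = FALSE]

str(as_vector)
str(as_frame)
```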

-Bill

On Fri, Jan 13, 2023 at 11:07 PM Eric Berger  wrote:

> mydata[, -grep("^yr",colnames(mydata))]

Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread Steven Yen
Thanks to all. Very helpful.

Steven from iPhone

> On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
> 
> You'll want to use grep() or grepl(). By default, grep() uses extended
> regular expressions to find matches, but you can also use perl regular
> expressions and globbing (after converting to a regular expression).
> For example:
> 
> grepl("^yr", colnames(mydata))
> 
> will tell you which 'colnames' start with "yr". If you'd rather
> use globbing:
> 
> grepl(glob2rx("yr*"), colnames(mydata))
> 
> Then you might write something like this to remove the columns starting with 
> yr:
> 
> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
> 
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
>> 
>> I have a data frame containing variables "yr3",...,"yr28".
>> 
>> How do I remove them with a wild card, something similar to "del yr*"
>> in Windows/DOS? Thank you.
>> 
>>> colnames(mydata)
>>   [1] "year"     "weight"   "confeduc" "confothr" "college"
>>   [6] ...
>>  [41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
>>  [46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
>>  [51] "yr13"     "yr14"     "yr15"     "yr16"     "yr17"
>>  [56] "yr18"     "yr19"     "yr20"     "yr21"     "yr22"
>>  [61] "yr23"     "yr24"     "yr25"     "yr26"     "yr27"
>>  [66] "yr28"...
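
A small base R check of the grepl()/glob2rx() suggestions quoted above, on a made-up vector of names shaped like those in the question. `glob2rx()` (from the utils package) converts the glob to a regular expression, and a tighter pattern avoids also catching names that merely begin with "yr":

```r
nms <- c("year", "weight", "yr3", "yr10", "yr28", "yrs_total")

pattern <- glob2rx("yr*")
pattern                      # "^yr"
grepl(pattern, nms)          # TRUE also for "yrs_total"

# Only "yr" followed by digits up to the end of the name
strict <- grepl("^yr[0-9]+$", nms)
nms[!strict]                 # "year" "weight" "yrs_total"
```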