Re: [R] selecting columns from a data frame or data table by type, ie, numeric, integer
Thank you, Bill Dunlap. So simple that I never tried that approach. I tried dozens of others, though, and read manuals until I was getting headaches; of course the answer is simple once one is competent. Learning is a struggle, but I am slowly getting there.

Thanks again,
Carl Sutton CPA

On Friday, April 29, 2016 10:50 AM, William Dunlap wrote:

> dt1[ vapply(dt1, FUN=is.numeric, FUN.VALUE=NA) ]
    a   c
1   1 1.1
2   2 1.0
...
10 10 0.2

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Apr 29, 2016 at 9:19 AM, Carl Sutton via R-help wrote:

Good morning RGuru's

I have a data frame of 575 columns. I want to extract only those columns that are numeric (double) or integer to do some machine learning with. I have searched the web for a couple of days (off and on) and have not found anything that shows how to do this. There are lots of ways to extract rows, but not columns. I attempted to use the "(x == y)" index extraction method, but that threw an error saying that == was for atomic vectors and lists, and I was doing this on a data frame.

My test code is below

# a technique to get column classes
library(data.table)
a <- 1:10
b <- c("a","b","c","d","e","f","g","h","i","j")
c <- seq(1.1, .2, length = 10)
dt1 <- data.table(a,b,c)
str(dt1)
col.classes <- sapply(dt1, class)
head(col.classes)
dt2 <- subset(dt1, typeof = "double" | "numeric")
str(dt2)
dt2 # not subset
dt2 <- dt1[, list(typeof = "double")]
str(dt2)
class_data <- dt1[,sapply(dt1,is.integer) | sapply(dt1, is.numeric)]
class_data
sum(class_data)
typeof(class_data)
names(class_data)
str(class_data)

Any help is appreciated
Carl Sutton CPA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] selecting columns from a data frame or data table by type, ie, numeric, integer
Hi,

I was able to replicate the solution suggested by William for the data.frame class, but not for the data.table class. For data.table, I had to make some minor changes, as shown below.

library(data.table)
a <- 1:10
b <- c("a","b","c","d","e","f","g","h","i","j")
c <- seq(1.1, .2, length = 10)

# in case of data frame
dt1 <- data.frame(a,b,c)
dt1[vapply(dt1, FUN=is.numeric, FUN.VALUE=NA)]
    a   c
1   1 1.1
2   2 1.0
3   3 0.9
4   4 0.8
5   5 0.7
6   6 0.6
7   7 0.5
8   8 0.4
9   9 0.3
10 10 0.2

# in case of data table
dt1 <- data.table(a,b,c)
dt1[, vapply(dt1, FUN=is.numeric, FUN.VALUE=NA), with=FALSE]
    a   c
1   1 1.1
2   2 1.0
3   3 0.9
4   4 0.8
5   5 0.7
6   6 0.6
7   7 0.5
8   8 0.4
9   9 0.3
10 10 0.2

--
Best,
GG
Re: [R] selecting columns from a data frame or data table by type, ie, numeric, integer
> dt1[ vapply(dt1, FUN=is.numeric, FUN.VALUE=NA) ]
    a   c
1   1 1.1
2   2 1.0
...
10 10 0.2

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Apr 29, 2016 at 9:19 AM, Carl Sutton via R-help <r-help@r-project.org> wrote:

> Good morning RGuru's
>
> I have a data frame of 575 columns. I want to extract only those columns
> that are numeric(double) or integer to do some machine learning with. I
> have searched the web for a couple of days (off and on) and have not found
> anything that shows how to do this. Lots of ways to extract rows, but not
> columns. I have attempted to use "(x == y)" indices extraction method but
> that threw error that == was for atomic vectors and lists, and I was doing
> this on a data frame.
>
> My test code is below
>
> # a technique to get column classes
> library(data.table)
> a <- 1:10
> b <- c("a","b","c","d","e","f","g","h","i","j")
> c <- seq(1.1, .2, length = 10)
> dt1 <- data.table(a,b,c)
> str(dt1)
> col.classes <- sapply(dt1, class)
> head(col.classes)
> dt2 <- subset(dt1, typeof = "double" | "numeric")
> str(dt2)
> dt2 # not subset
> dt2 <- dt1[, list(typeof = "double")]
> str(dt2)
> class_data <- dt1[,sapply(dt1,is.integer) | sapply(dt1, is.numeric)]
> class_data
> sum(class_data)
> typeof(class_data)
> names(class_data)
> str(class_data)
>
> Any help is appreciated
> Carl Sutton CPA
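For readers skimming this thread, here is a minimal base-R sketch of the approach Bill Dunlap suggests, using toy columns like those in the original post. The names `df`, `keep` and `num_df` are assumptions chosen for illustration. Note that `is.numeric()` is already TRUE for integer columns, so a separate `is.integer()` test should not be needed.

```r
# Toy data modelled on the original post, as a plain data.frame
df <- data.frame(
  a = 1:10,
  b = letters[1:10],
  c = seq(1.1, 0.2, length.out = 10),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per column; is.numeric() covers both integer and double
keep <- vapply(df, is.numeric, logical(1))
keep

# Single-bracket, list-style indexing returns a data.frame of the kept columns
num_df <- df[keep]
str(num_df)

# Filter() is a base-R shorthand for the same selection
str(Filter(is.numeric, df))
```

As GG reports earlier in the thread, a data.table needs the logical vector in the `j` position together with `with=FALSE`; without it, the expression is evaluated and the logical vector itself comes back, which appears to be what happened to `class_data` in the original test code.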
Re: [R] selecting columns based on partial names
Thank you. That worked.

On 19-Jun-2014 3:24 PM, Uwe Ligges wrote:

> On 19.06.2014 23:50, Chris Dolanc wrote:
>> Hello,
>>
>> I have a data frame with > 5000 columns and I'd like to be able to make
>> subsets of that data frame made up of certain columns by using part of
>> the column names. I've had a surprisingly hard time finding something
>> that works by searching online.
>>
>> For example, let's say I have a data frame (df) of 2 obs. of 6 variables.
>> The 6 variables are called "1940_tmax", "1940_ppt", "1940_tmin",
>> "1941_tmax", "1941_ppt", "1941_tmin". I want to create a new data frame
>> with only the variables that have "ppt" in the variable (column) name,
>> so that it looks like this:
>>
>> plot name   1940_ppt   1941_ppt
>> 774-CL      231        344
>> 778-RW      228        313
>>
>> Thanks.
>
> df[ , grepl("_ppt$", names(df))]
>
> Best,
> Uwe Ligges

--
Christopher R. Dolanc
Post-doctoral Researcher
University of California, Davis & University of Montana
Re: [R] selecting columns based on partial names
On Thu, 19 Jun 2014 02:50:20 PM Chris Dolanc wrote:

> Hello,
>
> I have a data frame with > 5000 columns and I'd like to be able to make
> subsets of that data frame made up of certain columns by using part of
> the column names. I've had a surprisingly hard time finding something
> that works by searching online.
>
> For example, let's say I have a data frame (df) of 2 obs. of 6 variables.
> The 6 variables are called "1940_tmax", "1940_ppt", "1940_tmin",
> "1941_tmax", "1941_ppt", "1941_tmin". I want to create a new data frame
> with only the variables that have "ppt" in the variable (column) name,
> so that it looks like this:
>
> plot name   1940_ppt   1941_ppt
> 774-CL      231        344
> 778-RW      228        313

Hi Chris,

One way is to get the column indices:

grep("ppt",names(df))
[1] 2 5

so,

newdf<-df[grep("ppt",names(df))]

and then you apparently want to add a column with some other information, so probably:

newdf<-cbind(, df[grep("ppt",names(df))])

Jim
Re: [R] selecting columns based on partial names
On 19.06.2014 23:50, Chris Dolanc wrote:

> Hello,
>
> I have a data frame with > 5000 columns and I'd like to be able to make
> subsets of that data frame made up of certain columns by using part of
> the column names. I've had a surprisingly hard time finding something
> that works by searching online.
>
> For example, let's say I have a data frame (df) of 2 obs. of 6 variables.
> The 6 variables are called "1940_tmax", "1940_ppt", "1940_tmin",
> "1941_tmax", "1941_ppt", "1941_tmin". I want to create a new data frame
> with only the variables that have "ppt" in the variable (column) name,
> so that it looks like this:
>
> plot name   1940_ppt   1941_ppt
> 774-CL      231        344
> 778-RW      228        313
>
> Thanks.

df[ , grepl("_ppt$", names(df))]

Best,
Uwe Ligges
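A small self-contained sketch of the name-based selection discussed in this thread. The two-row data frame is hypothetical: the `ppt` values come from the question, while the `tmax`/`tmin` values and the identifier column name `plot` are assumptions made only so the example runs.

```r
# Hypothetical data frame; check.names = FALSE keeps names that start with a digit
df <- data.frame(
  plot        = c("774-CL", "778-RW"),
  `1940_tmax` = c(20.1, 19.4),   # made-up values
  `1940_ppt`  = c(231, 228),
  `1940_tmin` = c(5.2, 4.8),     # made-up values
  `1941_tmax` = c(21.0, 20.2),   # made-up values
  `1941_ppt`  = c(344, 313),
  `1941_tmin` = c(6.1, 5.5),     # made-up values
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Logical vector, one element per column name
ppt_cols <- grepl("_ppt$", names(df))

# drop = FALSE keeps a data.frame even if only one column matches
ppt_only <- df[, ppt_cols, drop = FALSE]

# Keep the identifier column alongside the matched columns
with_id <- df[, c("plot", names(df)[ppt_cols])]
with_id
```

Anchoring the pattern with `$` restricts the match to names that end in `_ppt`; a bare `"ppt"` would also match that string anywhere in a name.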
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
Hi Greg,

This is very helpful. Thanks for explaining it. I'm clearly going to need to improve my understanding of regular expressions. Currently busy trying to figure out Sweave and knitr though.

Paul

--- On Thu, 4/26/12, Greg Snow <538...@gmail.com> wrote:

> From: Greg Snow <538...@gmail.com>
> Subject: Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
> To: "Paul Miller"
> Cc: r-help@r-project.org
> Received: Thursday, April 26, 2012, 1:55 PM
>
> Sorry I took so long getting back to this, but the paying job needs to
> take priority.
>
> The regular expression "(?<!un)(?<!non)muta" looks for a string that
> matches "muta", then looks at the characters immediately before it to
> see if they match either "un" or "non", in which case it does not
> match. More specifically, the regular expression engine steps through
> the string and at each point tries the match. At a given point it
> will first see if "un" is before that point; if it is, then this point
> can't match and it moves the checking point. If it is not "un", then it
> moves to the next negative look behind and sees if "non" is just
> before the point. If neither "un" nor "non" is just before the point,
> then it starts matching characters after the point to see if they
> match "muta".
>
> So the next pattern is "(?!muta)non|un". The (?!muta) is a negative
> look ahead, which starts at the point and checks forward to see that
> the next characters are not "muta" (but does not include them in the
> match). In this case it is a no-op, because you are saying that you
> want to match at a point where the next characters are not "muta" but
> are "non", and since the next set of characters cannot be both, this is
> the same as just matching "non". You also need to be aware of the
> operator precedence: in that pattern the (?!muta) part only applies to
> the "non", not the "un".
>
> To match "nonmuta" or "unmuta" a simple pattern would just be
> "(non|un)muta" or "(no|u)nmuta". You could use the positive
> lookbehind (you would still need an "or"), but it would be overkill
> for a grep command. The difference in the positive look ahead/behind
> is more important for replacing, where the look ahead/behind is needed
> for the match to happen but is not captured as part of the match to
> be replaced.
>
> On Tue, Apr 24, 2012 at 7:40 AM, Paul Miller wrote:
> > Hi Greg,
> >
> > This is quite helpful. Not so good yet with regular expressions in
> > general or Perl-like regular expressions. Found the help page though,
> > and think I was able to determine how the code works as well as how I
> > would select only instances where "muta" is preceded by either "non"
> > or "un".
> >
> >> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
> > [1] "mutation" "nonmutated" "unmutated" "verymutated" "other"
> >
> >> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
> > [1] 1 4
> >
> >> grep("(?!muta)non|un", tmp, perl=TRUE)
> > [1] 2 3
> >
> > Did I get the second grep right?
> >
> > If so, do you have any sense of why it seems to fail when I apply it
> > to my data?
> >
> >> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl=TRUE)])
> >
> > Error in rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl = TRUE)]) :
> > 'x' must be numeric
> >
> > Thanks,
> >
> > Paul
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
> David Winsemius
>     on Mon, 23 Apr 2012 12:16:39 -0400 writes:

> On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:
>> Hello All,
>>
>> Started out awhile ago trying to select columns in a
>> dataframe whose names contain some variation of the word
>> "mutant" using code like:
>>
>> names(KRASyn)[grep("muta", names(KRASyn))]
>>
>> The idea then would be to add together the various
>> columns using code like:
>>
>> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta",
>> names(KRASyn))])
>>
>> What I discovered though, is that this selects columns
>> like "nonmutated" and "unmutated" as well as columns like
>> "mutated", "mutation", and "mutational".
>>
>> So I'd like to know how to select columns that have some
>> variation of the word "mutant" without the "non" or the
>> "un". I've been looking around for an example of how to
>> do that but haven't found anything yet.
>>
>> Can anyone show me how to select the columns I need?

> If you want only columns whose names _begin_ with "muta"
> then add the "^" character at the beginning of your
> pattern:

> names(KRASyn)[grep("^muta", names(KRASyn))]

> (This should be explained on the ?regex page.)

It *is*! Search for "beginning" and you're there.

Martin

> David Winsemius, MD West Hartford, CT
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
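Sorry I took so long getting back to this, but the paying job needs to take priority.

The regular expression "(?<!un)(?<!non)muta" looks for a string that matches "muta", then looks at the characters immediately before it to see if they match either "un" or "non", in which case it does not match. More specifically, the regular expression engine steps through the string and at each point tries the match. At a given point it will first see if "un" is before that point; if it is, then this point can't match and it moves the checking point. If it is not "un", then it moves to the next negative look behind and sees if "non" is just before the point. If neither "un" nor "non" is just before the point, then it starts matching characters after the point to see if they match "muta".

So the next pattern is "(?!muta)non|un". The (?!muta) is a negative look ahead, which starts at the point and checks forward to see that the next characters are not "muta" (but does not include them in the match). In this case it is a no-op, because you are saying that you want to match at a point where the next characters are not "muta" but are "non", and since the next set of characters cannot be both, this is the same as just matching "non". You also need to be aware of the operator precedence: in that pattern the (?!muta) part only applies to the "non", not the "un".

To match "nonmuta" or "unmuta" a simple pattern would just be "(non|un)muta" or "(no|u)nmuta". You could use the positive lookbehind (you would still need an "or"), but it would be overkill for a grep command. The difference in the positive look ahead/behind is more important for replacing, where the look ahead/behind is needed for the match to happen but is not captured as part of the match to be replaced.

On Tue, Apr 24, 2012 at 7:40 AM, Paul Miller wrote:

> Hi Greg,
>
> This is quite helpful. Not so good yet with regular expressions in general or
> Perl-like regular expressions. Found the help page though, and think I was
> able to determine how the code works as well as how I would select only
> instances where "muta" is preceded by either "non" or "un".
>
>> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
> [1] "mutation" "nonmutated" "unmutated" "verymutated" "other"
>
>> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
> [1] 1 4
>
>> grep("(?!muta)non|un", tmp, perl=TRUE)
> [1] 2 3
>
> Did I get the second grep right?
>
> If so, do you have any sense of why it seems to fail when I apply it to my
> data?
>
>> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un",
>> names(KRASyn), perl=TRUE)])
>
> Error in rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl = TRUE)]) :
> 'x' must be numeric
>
> Thanks,
>
> Paul

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com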
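The patterns Greg describes can be compared side by side in a short sketch. The lookbehind pattern is written here as `(?<!un)(?<!non)muta`; that spelling is an assumption reconstructed from his description (two negative lookbehinds, "un" checked first, then "non"). The extra name `"count"` is hypothetical and is added only to illustrate the precedence issue.

```r
tmp <- c("mutation", "nonmutated", "unmutated", "verymutated", "other")

# Two negative lookbehinds: match "muta" unless "un" or "non" sits right before it
not_prefixed <- grep("(?<!un)(?<!non)muta", tmp, perl = TRUE, value = TRUE)
not_prefixed

# Alternation binds loosest, so this reads as "(?!muta)non" OR "un"
loose <- grep("(?!muta)non|un", tmp, perl = TRUE, value = TRUE)
loose

# The simpler pattern for the prefixed names needs no lookaround at all
prefixed <- grep("(non|un)muta", tmp, value = TRUE)
prefixed

# Because the second branch is a bare "un", unrelated names can match too
loose_extra <- grep("(?!muta)non|un", c(tmp, "count"), perl = TRUE, value = TRUE)
loose_extra
```

Lookbehind and lookahead assertions require `perl = TRUE`; the plain alternation `(non|un)muta` works with the default regex engine.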
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
On Apr 24, 2012, at 19:15 , Rui Barradas wrote:

> Has anyone realized that both 'non' and 'un' end with the same letter? The
> only one we really need to check?
>
> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
>
> i1 <- grepl("muta", tmp)
> i2 <- grepl("nmuta", tmp)
>
> tmp[i1 & !i2]

Yes, I was wondering why people were avoiding the obvious use of grepl(). I'm not too happy about the "nmuta" technique though: What about "deletionmutation" and such? Might as well do the safe(r) thing:

i2 <- grepl("unmuta", tmp) | grepl("nonmuta", tmp)

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
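Peter's objection can be checked directly. In this sketch the extra name "deletionmutation" is his hypothetical example appended to the thread's test vector; `has_muta`, `shortcut` and `safer` are illustrative names.

```r
tmp <- c("mutation", "nonmutated", "unmutated", "verymutated", "other",
         "deletionmutation")   # last entry: Peter's hypothetical name

has_muta <- grepl("muta", tmp)

# Shortcut: drops anything with an "n" right before "muta"
shortcut <- tmp[has_muta & !grepl("nmuta", tmp)]
shortcut

# Safer: drops only the two prefixes that were actually specified
safer <- tmp[has_muta & !(grepl("unmuta", tmp) | grepl("nonmuta", tmp))]
safer
```

The shortcut loses "deletionmutation" because it ends in "n" before "muta", while the explicit version keeps it.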
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
Hello Dr. Winsemius,

There was a non-numeric column. Thanks for helping me to see the obvious.

Paul
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
Hello,

Greg Snow wrote

> Here is a method that uses negative look behind:
>
>> tmp <- c('mutation','nonmutated','unmutated','verymutated','other')
>> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
> [1] 1 4
>
> it looks for muta that is not immediately preceded by un or non (but
> it would match "unusually mutated" since the un is not immediately
> before the muta).
>
> Hope this helps,
>
> On Mon, Apr 23, 2012 at 10:10 AM, Paul Miller wrote:
>> Hello All,
>>
>> Started out awhile ago trying to select columns in a dataframe whose
>> names contain some variation of the word "mutant" using code like:
>>
>> names(KRASyn)[grep("muta", names(KRASyn))]
>>
>> The idea then would be to add together the various columns using code
>> like:
>>
>> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>>
>> What I discovered though, is that this selects columns like "nonmutated"
>> and "unmutated" as well as columns like "mutated", "mutation", and
>> "mutational".
>>
>> So I'd like to know how to select columns that have some variation of the
>> word "mutant" without the "non" or the "un". I've been looking around for
>> an example of how to do that but haven't found anything yet.
>>
>> Can anyone show me how to select the columns I need?
>>
>> Thanks,
>>
>> Paul
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538280@

Has anyone realized that both 'non' and 'un' end with the same letter? The only one we really need to check?

(tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))

i1 <- grepl("muta", tmp)
i2 <- grepl("nmuta", tmp)

tmp[i1 & !i2]

Now, not an answer to Greg's post, just convoluted.

(tmp <- c(tmp, 'permutation', 'commutation'))

cols <- list()
cols[[1]] <- grep("muta", tmp)
cols[[2]] <- grep("nmuta", tmp)
cols[[3]] <- grep("(per)|(com)muta", tmp)

Reduce(setdiff, cols)

Rui Barradas

--
View this message in context: http://r.789695.n4.nabble.com/Selecting-columns-whose-names-contain-mutated-except-when-they-also-contain-non-or-un-tp4580914p4584219.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
On Apr 24, 2012, at 9:40 AM, Paul Miller wrote:

> Hi Greg,
>
> This is quite helpful. Not so good yet with regular expressions in general or
> Perl-like regular expressions. Found the help page though, and think I was
> able to determine how the code works as well as how I would select only
> instances where "muta" is preceded by either "non" or "un".
>
>> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
> [1] "mutation" "nonmutated" "unmutated" "verymutated" "other"
>
>> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
> [1] 1 4
>
>> grep("(?!muta)non|un", tmp, perl=TRUE)
> [1] 2 3
>
> Did I get the second grep right?
>
> If so, do you have any sense of why it seems to fail when I apply it to my
> data?
>
>> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl=TRUE)])
>
> Error in rowSums() : 'x' must be numeric

The error message strongly suggests at least one non-numeric column. What does this return:

lapply( KRASyn[grep("(?!muta)non|un", names(KRASyn), perl=TRUE)], is.numeric)

--
David Winsemius, MD
West Hartford, CT
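David's diagnostic can be sketched on a made-up stand-in for `KRASyn`. The data frame below is entirely hypothetical (the real columns are not shown in the thread), and summing only the numeric columns is one possible follow-up, not something the thread prescribes.

```r
# Hypothetical stand-in for KRASyn: one matched column is stored as text
KRASyn <- data.frame(
  nonmutated = c(1, 0, 2),
  unmutated  = c("0", "1", "1"),   # character, so rowSums() would reject it
  mutated    = c(3, 1, 0),
  stringsAsFactors = FALSE
)

# Columns whose names carry the "non"/"un" prefix before "muta"
sel <- grep("(non|un)muta", names(KRASyn))

# One TRUE/FALSE per selected column
is_num <- vapply(KRASyn[sel], is.numeric, logical(1))
offenders <- names(is_num)[!is_num]
offenders                              # columns to inspect or convert

# Sum only the numeric columns (or convert the offenders first, if appropriate)
KRASyn$NonMutant_comb <- rowSums(KRASyn[sel][is_num])
KRASyn$NonMutant_comb
```

`vapply(..., logical(1))` returns a named logical vector, which is a little easier to scan than the list produced by `lapply()` when there are many columns.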
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
Hi Greg,

This is quite helpful. Not so good yet with regular expressions in general or Perl-like regular expressions. Found the help page though, and think I was able to determine how the code works as well as how I would select only instances where "muta" is preceded by either "non" or "un".

> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
[1] "mutation" "nonmutated" "unmutated" "verymutated" "other"

> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
[1] 1 4

> grep("(?!muta)non|un", tmp, perl=TRUE)
[1] 2 3

Did I get the second grep right?

If so, do you have any sense of why it seems to fail when I apply it to my data?

> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl=TRUE)])

Error in rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl = TRUE)]) :
'x' must be numeric

Thanks,

Paul
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
Here is a method that uses negative look behind:

> tmp <- c('mutation','nonmutated','unmutated','verymutated','other')
> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
[1] 1 4

It looks for muta that is not immediately preceded by un or non (but it would match "unusually mutated" since the un is not immediately before the muta).

Hope this helps,

On Mon, Apr 23, 2012 at 10:10 AM, Paul Miller wrote:

> Hello All,
>
> Started out awhile ago trying to select columns in a dataframe whose names
> contain some variation of the word "mutant" using code like:
>
> names(KRASyn)[grep("muta", names(KRASyn))]
>
> The idea then would be to add together the various columns using code like:
>
> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>
> What I discovered though, is that this selects columns like "nonmutated" and
> "unmutated" as well as columns like "mutated", "mutation", and "mutational".
>
> So I'd like to know how to select columns that have some variation of the
> word "mutant" without the "non" or the "un". I've been looking around for an
> example of how to do that but haven't found anything yet.
>
> Can anyone show me how to select the columns I need?
>
> Thanks,
>
> Paul

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
Hi Bert,

Yes, code like:

x <- names(yourdataframe)
grepl("muta",x) & !grepl("nonmuta|unmuta",x)

works perfectly. Thanks very much for your help.

Paul

--- On Mon, 4/23/12, Bert Gunter wrote:

> From: Bert Gunter
> Subject: Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
> To: "Paul Miller"
> Cc: "David Winsemius", r-help@r-project.org
> Received: Monday, April 23, 2012, 12:15 PM
>
> But maybe ... (see below)
> -- Bert
>
> On Mon, Apr 23, 2012 at 9:25 AM, Paul Miller wrote:
> > Hello Dr. Winsemius,
> >
> > Unfortunately, I also have terms like "krasmutated". So simply
> > selecting words that start with "muta" won't work in this case.
> >
> > Thanks,
> >
> > Paul
> >
> > --- On Mon, 4/23/12, David Winsemius wrote:
> >
> >> From: David Winsemius
> >> Subject: Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
> >> To: "Paul Miller"
> >> Cc: r-help@r-project.org
> >> Received: Monday, April 23, 2012, 11:16 AM
> >>
> >> On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:
> >>
> >> > Hello All,
> >> >
> >> > Started out awhile ago trying to select columns in a dataframe
> >> > whose names contain some variation of the word "mutant" using
> >> > code like:
> >> >
> >> > names(KRASyn)[grep("muta", names(KRASyn))]
> >> >
> >> > The idea then would be to add together the various columns using
> >> > code like:
> >> >
> >> > KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
> >> >
> >> > What I discovered though, is that this selects columns like
> >> > "nonmutated" and "unmutated" as well as columns like "mutated",
> >> > "mutation", and "mutational".
> >> >
> >> > So I'd like to know how to select columns that have some variation
> >> > of the word "mutant" without the "non" or the "un". I've been
> >> > looking around for an example of how to do that but haven't found
> >> > anything yet.
>
> If this **is** a complete specification then wouldn't simply:
>
> x <- names(yourdataframe)
> grepl("muta",x) & !grepl("nonmuta|unmuta",x)
>
> do it?
>
> e.g.
> > x <- c("nonmutated","unmutated","mutation","mutated","krasmutated")
> > grepl("muta",x) & !grepl("nonmuta|unmuta",x)
> [1] FALSE FALSE TRUE TRUE TRUE
>
> >> > Can anyone show me how to select the columns I need?
> >>
> >> If you want only columns whose names _begin_ with "muta" then add
> >> the "^" character at the beginning of your pattern:
> >>
> >> names(KRASyn)[grep("^muta", names(KRASyn))]
> >>
> >> (This should be explained on the ?regex page.)
> >>
> >> --
> >> David Winsemius, MD
> >> West Hartford, CT
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
But maybe ... (see below)
-- Bert

On Mon, Apr 23, 2012 at 9:25 AM, Paul Miller wrote:
> Hello Dr. Winsemius,
>
> Unfortunately, I also have terms like "krasmutated". So simply selecting
> words that start with "muta" won't work in this case.
>
> Thanks,
>
> Paul
>
> --- On Mon, 4/23/12, David Winsemius wrote:
>
>> From: David Winsemius
>> Subject: Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
>> To: "Paul Miller"
>> Cc: r-help@r-project.org
>> Received: Monday, April 23, 2012, 11:16 AM
>>
>> On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:
>>
>> > Hello All,
>> >
>> > Started out awhile ago trying to select columns in a dataframe whose
>> > names contain some variation of the word "mutant" using code like:
>> >
>> > names(KRASyn)[grep("muta", names(KRASyn))]
>> >
>> > The idea then would be to add together the various columns using
>> > code like:
>> >
>> > KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>> >
>> > What I discovered though, is that this selects columns like
>> > "nonmutated" and "unmutated" as well as columns like "mutated",
>> > "mutation", and "mutational".
>> >
>> > So I'd like to know how to select columns that have some variation
>> > of the word "mutant" without the "non" or the "un". I've been looking
>> > around for an example of how to do that but haven't found anything yet.

If this **is** a complete specification then wouldn't simply:

x <- names(yourdataframe)
grepl("muta",x) & !grepl("nonmuta|unmuta",x)

do it?

e.g.
> x <- c("nonmutated","unmutated","mutation","mutated","krasmutated")
> grepl("muta",x) & !grepl("nonmuta|unmuta",x)
[1] FALSE FALSE TRUE TRUE TRUE

>> > Can anyone show me how to select the columns I need?
>>
>> If you want only columns whose names _begin_ with "muta" then add the
>> "^" character at the beginning of your pattern:
>>
>> names(KRASyn)[grep("^muta", names(KRASyn))]
>>
>> (This should be explained on the ?regex page.)
>>
>> --
>> David Winsemius, MD
>> West Hartford, CT

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
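Bert's two-step test can be carried through to the row sums the original question asks for. The data frame below is hypothetical; only its column names are taken from the thread, and `keep` is an illustrative variable name.

```r
# Hypothetical data frame; the column names follow Bert's example vector
yourdataframe <- data.frame(
  nonmutated  = 1:3,
  unmutated   = 4:6,
  mutation    = 7:9,
  mutated     = 10:12,
  krasmutated = 13:15
)

x <- names(yourdataframe)

# Contains "muta" but not "nonmuta" or "unmuta"
keep <- grepl("muta", x) & !grepl("nonmuta|unmuta", x)
x[keep]

# Row-wise total over the selected columns
yourdataframe$Mutant_comb <- rowSums(yourdataframe[keep])
yourdataframe$Mutant_comb
```

Computing `keep` before adding the new column means the result column cannot accidentally be picked up by the pattern.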
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
On Apr 23, 2012, at 12:25 PM, Paul Miller wrote:

> Hello Dr. Winsemius,
>
> Unfortunately, I also have terms like "krasmutated". So simply selecting
> words that start with "muta" won't work in this case.

You are aware that negative indexing can be used with grep, aren't you?

--
David.

> Thanks,
>
> Paul
>
> --- On Mon, 4/23/12, David Winsemius wrote:
>
>> From: David Winsemius
>> Subject: Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
>> To: "Paul Miller"
>> Cc: r-help@r-project.org
>> Received: Monday, April 23, 2012, 11:16 AM
>>
>> On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:
>>
>>> Hello All,
>>>
>>> Started out awhile ago trying to select columns in a dataframe whose
>>> names contain some variation of the word "mutant" using code like:
>>>
>>> names(KRASyn)[grep("muta", names(KRASyn))]
>>>
>>> The idea then would be to add together the various columns using
>>> code like:
>>>
>>> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>>>
>>> What I discovered though, is that this selects columns like
>>> "nonmutated" and "unmutated" as well as columns like "mutated",
>>> "mutation", and "mutational".
>>>
>>> So I'd like to know how to select columns that have some variation of
>>> the word "mutant" without the "non" or the "un". I've been looking
>>> around for an example of how to do that but haven't found anything yet.
>>>
>>> Can anyone show me how to select the columns I need?
>>
>> If you want only columns whose names _begin_ with "muta" then add the
>> "^" character at the beginning of your pattern:
>>
>> names(KRASyn)[grep("^muta", names(KRASyn))]
>>
>> (This should be explained on the ?regex page.)
>>
>> --
>> David Winsemius, MD
>> West Hartford, CT

David Winsemius, MD
West Hartford, CT
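A sketch of the negative-indexing idea David mentions, together with one caveat. The vector `only_good` is a hypothetical case in which nothing needs to be excluded; `invert = TRUE` is offered as an alternative, not as what David had in mind.

```r
x <- c("nonmutated", "unmutated", "mutation", "mutated", "krasmutated")

# Step 1: everything containing "muta"; step 2: drop the unwanted prefixes
muta <- grep("muta", x, value = TRUE)
wanted <- muta[-grep("nonmuta|unmuta", muta)]
wanted

# Caveat: when nothing matches, grep() returns integer(0) and
# v[-integer(0)] selects nothing rather than everything
only_good <- c("mutation", "mutated")
empty <- only_good[-grep("nonmuta|unmuta", only_good)]
empty

# invert = TRUE avoids the problem by returning the non-matching elements directly
safe <- grep("nonmuta|unmuta", only_good, value = TRUE, invert = TRUE)
safe
```

The logical form with `grepl()` used elsewhere in the thread does not have this edge case, since negating an all-FALSE vector keeps every element.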
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
Below.

-- Bert

On Mon, Apr 23, 2012 at 9:10 AM, Paul Miller wrote:
> Hello All,
>
> Started out awhile ago trying to select columns in a dataframe whose names contain some variation of the word "mutant" using code like:
>
> names(KRASyn)[grep("muta", names(KRASyn))]
>
> The idea then would be to add together the various columns using code like:
>
> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>
> What I discovered though, is that this selects columns like "nonmutated" and "unmutated" as well as columns like "mutated", "mutation", and "mutational".
>
> So I'd like to know how to select columns that have some variation of the word "mutant" without the "non" or the "un". I've been looking around for an example of how to do that but haven't found anything yet.

You can't, because you have not provided a full specification of what can be selected and what can't. Software can only do what you tell it to -- it cannot read minds. Once you have provided a complete and accurate specification of inclusion/exclusion criteria, it should be easy to write a regex procedure.

"The fault, dear Brutus, lies not in the stars but in ourselves."

-- Bert

> Can anyone show me how to select the columns I need?
>
> Thanks,
>
> Paul

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
Hello Dr. Winsemius,

Unfortunately, I also have terms like "krasmutated". So simply selecting words that start with "muta" won't work in this case.

Thanks,

Paul

--- On Mon, 4/23/12, David Winsemius wrote:

> From: David Winsemius
> Subject: Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
> To: "Paul Miller"
> Cc: r-help@r-project.org
> Received: Monday, April 23, 2012, 11:16 AM
>
> On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:
>
> > Hello All,
> >
> > Started out awhile ago trying to select columns in a dataframe whose names contain some variation of the word "mutant" using code like:
> >
> > names(KRASyn)[grep("muta", names(KRASyn))]
> >
> > The idea then would be to add together the various columns using code like:
> >
> > KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
> >
> > What I discovered though, is that this selects columns like "nonmutated" and "unmutated" as well as columns like "mutated", "mutation", and "mutational".
> >
> > So I'd like to know how to select columns that have some variation of the word "mutant" without the "non" or the "un". I've been looking around for an example of how to do that but haven't found anything yet.
> >
> > Can anyone show me how to select the columns I need?
>
> If you want only columns whose names _begin_ with "muta" then add the "^" character at the beginning of your pattern:
>
> names(KRASyn)[grep("^muta", names(KRASyn))]
>
> (This should be explained on the ?regex page.)
>
> --
> David Winsemius, MD
> West Hartford, CT
Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:

> Hello All,
>
> Started out awhile ago trying to select columns in a dataframe whose names contain some variation of the word "mutant" using code like:
>
> names(KRASyn)[grep("muta", names(KRASyn))]
>
> The idea then would be to add together the various columns using code like:
>
> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>
> What I discovered though, is that this selects columns like "nonmutated" and "unmutated" as well as columns like "mutated", "mutation", and "mutational".
>
> So I'd like to know how to select columns that have some variation of the word "mutant" without the "non" or the "un". I've been looking around for an example of how to do that but haven't found anything yet.
>
> Can anyone show me how to select the columns I need?

If you want only columns whose names _begin_ with "muta" then add the "^" character at the beginning of your pattern:

names(KRASyn)[grep("^muta", names(KRASyn))]

(This should be explained on the ?regex page.)

--
David Winsemius, MD
West Hartford, CT
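To make the effect of the "^" anchor concrete, here is a small sketch on a toy data frame. The column names are the ones mentioned in the thread; the values are invented purely for illustration, and this `KRASyn` is a stand-in for the poster's real data, which the thread does not show.

```r
# Toy stand-in for the poster's data; values are made up.
KRASyn <- data.frame(
  mutated     = c(1, 0, 1),
  mutation    = c(0, 0, 1),
  nonmutated  = c(0, 1, 0),
  unmutated   = c(1, 1, 0),
  krasmutated = c(0, 1, 1)
)

# Unanchored: matches "muta" anywhere in the name.
unanchored <- grep("muta", names(KRASyn), value = TRUE)
# Anchored: matches only names that begin with "muta".
anchored   <- grep("^muta", names(KRASyn), value = TRUE)
print(unanchored)
print(anchored)

# Sum the anchored columns row by row, as in the original post.
KRASyn$Mutant_comb <- rowSums(KRASyn[anchored])
print(KRASyn$Mutant_comb)
```

The anchored pattern drops "nonmutated" and "unmutated" but also drops "krasmutated", so it fits only if every wanted name really does start with "muta".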
Re: [R] selecting columns
Clayton -

From your explanation, it sounds like you want to create a new file removing the "Location" variable, and all the variables that have the string "Ambient" or "Name" in their names. Suppose that your data frame is called mydata, and you wish to create a reduced csv file called "mydata.csv":

write.csv(mydata[,grep('Location|Ambient|Name',names(mydata),invert=TRUE)], file='mydata.csv')

should do what you want, but without a more concrete example, it's just a guess.

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu

On Tue, 15 Feb 2011, Clayton Dorrity wrote:

> I need help. I have very big .csv files with many unnecessary columns. From the original .csv files I would like to create a new .csv file with just the columns I need.
>
> For example, the original column headings are:
>
> Date, Time, Location, Sensor Name, Sensor Serial, Ambient Temp, IR Temp, Sensor Name.1, Sensor Serial.1, Ambient Temp.1, IR Temp.1, Sensor Name.2, Sensor Serial.2, Ambient Temp.2, ... Sensor Name.45
>
> I would like to create a new .csv file with only:
>
> Date, Time, Sensor Serial, IR Temp, Sensor Serial.1, IR Temp.1, Sensor Serial.2, IR Temp.2, ... Sensor Serial.45, IR Temp.45, etc.
>
> Any help on this matter would be greatly appreciated.
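The suggestion above can be tried end to end on a tiny example. The sketch below is only an illustration: the data frame, its values, and the temporary file are invented, and `check.names = FALSE` is used so the headings keep their spaces (with the default, `read.csv` would turn "Sensor Name" into "Sensor.Name", which the same pattern still matches).

```r
# Made-up one-row data frame using a few of the headings from the post.
mydata <- data.frame(
  "Date" = "2011-02-15", "Time" = "12:00", "Location" = "roof",
  "Sensor Name" = "s0", "Sensor Serial" = 101,
  "Ambient Temp" = 20.5, "IR Temp" = 21.0,
  "Sensor Name.1" = "s1", "Sensor Serial.1" = 102,
  "Ambient Temp.1" = 20.7, "IR Temp.1" = 21.3,
  check.names = FALSE
)

# invert = TRUE returns the positions of names that do NOT match.
keep    <- grep("Location|Ambient|Name", names(mydata), invert = TRUE)
# drop = FALSE keeps a data frame even if only one column survives.
reduced <- mydata[, keep, drop = FALSE]
print(names(reduced))

# Write to a temporary file (hypothetical path) and read it back.
out_file <- tempfile(fileext = ".csv")
write.csv(reduced, file = out_file, row.names = FALSE)
roundtrip <- read.csv(out_file, check.names = FALSE)
print(names(roundtrip))
```

Using `row.names = FALSE` is a choice made here, not part of the original reply; it stops `write.csv` from adding an extra column of row numbers to the new file.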
Re: [R] selecting columns based on values of two variables
probably you're looking for

subset(capdist, ida %in% c("DEN","SWD","FIN") & idb %in% c("DEN","SWD","FIN"))

I hope it helps.

Best,
Dimitris

Thomas Jensen wrote:

> Dear R-list,
>
> I am having troubles selecting rows from a very large data-set containing distances between capitals. The structure of the data-set looks like this:
>
>   numa ida numb idb kmdist midist
> 1    2 USA   20 CAN    731    456
> 2    2 USA   31 BHM   1623   1012
> 3    2 USA   40 CUB   1813   1130
>
> I want to select a subset of these dyads, and have tried the following code:
>
> subset(capdist,ida == c("DEN","SWD","FIN") & idb == c("DEN","SWD","FIN"))
>
> This should ideally give me the dyads involving only Denmark, Sweden and Finland, however i get the error message:
>
> [1] numa   ida    numb   idb    kmdist midist
> <0 rows> (or 0-length row.names)
> Warning messages:
> 1: In is.na(e1) | is.na(e2) : longer object length is not a multiple of shorter object length
> 2: In `==.default`(ida, c("DEN", "SWD", "FIN")) : longer object length is not a multiple of shorter object length
> 3: In is.na(e1) | is.na(e2) : longer object length is not a multiple of shorter object length
> 4: In `==.default`(idb, c("DEN", "SWD", "FIN")) : longer object length is not a multiple of shorter object length
>
> Any help would be greatly appreciated,
>
> Best,
> Thomas Jensen

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
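The difference between `==` and `%in%` here is that `==` compares the two vectors position by position, recycling the shorter one (hence the "longer object length" warnings), while `%in%` asks of each element whether it appears anywhere in the set. A minimal sketch on an invented five-row `capdist`, keeping only the two identifier columns, might look like this:

```r
# Made-up dyads; only the ida/idb columns from the post are reproduced.
capdist <- data.frame(
  ida = c("USA", "DEN", "DEN", "SWD", "FIN"),
  idb = c("CAN", "SWD", "USA", "FIN", "DEN"),
  stringsAsFactors = FALSE
)
nordic <- c("DEN", "SWD", "FIN")

# == lines ida up against DEN, SWD, FIN, DEN, SWD (recycled) and
# compares element by element; suppressWarnings hides the length warning.
elementwise <- suppressWarnings(capdist$ida == nordic)
print(elementwise)

# %in% tests set membership for every row independently.
nordic_dyads <- subset(capdist, ida %in% nordic & idb %in% nordic)
print(nordic_dyads)
```

Only the rows whose two capitals are both in the set survive the `%in%` filter; the DEN-USA row is dropped because `idb` is outside the set.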