[R] grep won't work finding one column
I'm having an issue with grep: I have numerous columns that end with .at... when I use grep like so:

df[,grep(".at",colnames(df))]

it works fine. When I have one column that ends with .at, it does not work. Why is that? As this is a loop with a varying number of columns ending in .at, I would like some code that would work with 1 to n columns. Is there something more optimal than grep?

Thanks!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] grep won't work finding one column
On Tue, Oct 14, 2014 at 9:23 AM, Kate Ignatius kate.ignat...@gmail.com wrote:
> I'm having an issue with grep: I have numerous columns that end with
> .at... when I use grep like so:
>
> df[,grep(".at",colnames(df))]
>
> it works fine. When I have one column that ends with .at, it does not
> work. Why is that? As this is loop with varying number of columns
> ending in .at I would like some code that would work with 1 to n
> number of columns. Is there something more optimal than grep?
>
> Thanks!

I can't answer your direct question. But do you realize that your code does not match your words? The grep shown does not _only_ match columns whose names end with the characters '.at'. It matches all column names which contain any character followed by the characters "at". To match only columns whose names end with the characters ".at", you need:

grep("\.at$",colnames(df))

You might want to post an example which fails. Just to be complete, be sure to use the dput() function so that it is easy for members of the group to cut'n'paste your data into our own R workspace.

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha!
John McKown
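A minimal sketch of the point about the unescaped period, assuming some made-up column names (the vector cols below is hypothetical and not taken from Kate's data). It only illustrates that an unanchored ".at" matches more than names ending in a literal ".at":

```r
# Hypothetical column names, invented purely for illustration.
cols <- c("sample1.at", "sample1.dp", "batch", "status.fg")

# An unescaped "." matches any single character and the pattern is not
# anchored, so "batch" (b-a-t) and "status.fg" (t-a-t) match as well.
loose <- grep(".at", cols, value = TRUE)
print(loose)

# fixed = TRUE makes grep treat the pattern as a literal string.
# It is still unanchored, so it finds ".at" anywhere in the name.
literal <- grep(".at", cols, value = TRUE, fixed = TRUE)
print(literal)
```

With fixed = TRUE no escaping is needed at all, at the cost of losing the end-of-name anchor.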
Re: [R] grep won't work finding one column
For example, DF will usually have numerous columns with

sample1.at sample1.dp sample1.fg sample2.at sample2.dp sample2.fg

and so on.

I'm running this code in R as part of a shell script which runs over several files of different sizes, so sometimes it will come across a file with only one sample in it, i.e. sample1. When the R code runs through this file, trying to grep out the sample1.at column does not work and the script halts.

Here is some sample data... say I want to get out the AT_ column only:

Sample_1  AT_1
A/A       RR
G/G       AA
T/T       AA
G/A       RA
G/G       RR
C/C       AA
C/C       AA
C/T       RA
A/A       AA
T/G       RA

It will have a problem grepping out this single column.
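John's earlier suggestion to use dput() could look roughly like this for the first three rows of the sample data above. This is only a sketch: the object name dat is an assumption, and the real file presumably has more rows and columns.

```r
# First three rows of the sample data, typed in by hand.
# The name `dat` is hypothetical; stringsAsFactors = FALSE keeps the
# columns as plain character vectors on any R version.
dat <- data.frame(
  Sample_1 = c("A/A", "G/G", "T/T"),
  AT_1     = c("RR", "AA", "AA"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; this text is what
# would be pasted into a message to the list.
dput(dat)

# Round trip: capture that text and evaluate it to rebuild the object.
txt  <- capture.output(dput(dat))
dat2 <- eval(parse(text = txt))
print(all.equal(dat, dat2))
```

Anyone on the list can then paste the dput() output into their own session and get the same column names and types.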
Re: [R] grep won't work finding one column
Your question is missing a reproducible example, and you don't say how it does not work, so we cannot tell what is going on. Two things do come to mind, though.

A) Data frame subsets with only one column by default return a vector, which is a different type of object than a single-column data frame. You would need to read ?"[.data.frame" about the drop argument if you wanted to consistently get a data frame from this expression.

B) The period is a wildcard in regular expressions. If you expect to limit your search to a literal ".at" at the end of the name, then you should use the search pattern "\\.at$" instead (the first backslash allows the second one to be stored by R in the string, and the second one is the only one seen by grep, which reads it as making the period not act like a wildcard).

You really should read about regular expressions before using them. There are many tutorials on the web about this topic.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
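Point B can be checked directly at the console. The sketch below assumes a few made-up column names (cols is hypothetical); it shows that the R string written as "\\.at$" holds a single backslash, and that the resulting pattern only matches names ending in a literal ".at":

```r
# Written with two backslashes in R source, stored with one.
pattern <- "\\.at$"
print(nchar(pattern))   # counts the characters actually stored
cat(pattern, "\n")      # cat() shows the string as grep will see it

# Hypothetical column names, invented for illustration.
cols <- c("sample1.at", "sample1.dp", "format", "x.attr")

# Only a literal ".at" at the very end of the name matches.
hits <- grep(pattern, cols, value = TRUE)
print(hits)
```

Here "format" is rejected because the character before "at" is not a period, and "x.attr" is rejected because its ".at" is not at the end of the name.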
Re: [R] grep won't work finding one column
Shouldn't it be grep("\\.at$",colnames(df)) with a double backslash?

Ivan

--
Ivan Calandra
University of Reims Champagne-Ardenne
GEGENA² - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra
Re: [R] grep won't work finding one column
AT and at are not the same. If you want a case-insensitive comparison for the characters "at", you need to add ignore.case=TRUE. E.g.:

df[,grep(".at",colnames(df),ignore.case=TRUE)]

That should match the column name you gave. Which does not match your initial description, which said ending with .at. That has an embedded AT. So I am still a bit confused about your needs.

John McKown
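One detail worth checking when trying this on the two column names from the sample data: the leading period in ".at" still requires some character before "at", so a name that starts with "AT" needs a pattern without it. A small sketch, using only the names Sample_1 and AT_1 from the thread (the variable names are assumptions):

```r
# The two column names from the sample data in the thread.
cols <- c("Sample_1", "AT_1")

# ".at" needs one arbitrary character followed by "at"; "AT_1" has
# nothing in front of "AT", so neither name matches, even ignoring case.
with_dot <- grepl(".at", cols, ignore.case = TRUE)
print(with_dot)

# Dropping the period matches the embedded "AT" regardless of case.
no_dot <- grep("at", cols, ignore.case = TRUE, value = TRUE)
print(no_dot)

# Anchoring at the start is stricter: only names that begin with "AT_".
anchored <- grep("^at_", cols, ignore.case = TRUE, value = TRUE)
print(anchored)
```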
Re: [R] grep won't work finding one column
You're right. I don't use regexps in R very much. In most other languages, a single \ is needed. The R parser is different and I forgot. Thanks for the heads up.

John McKown
Re: [R] grep won't work finding one column
In the sense - it does not work. It works when there are 50 samples in the file, but it does not work when there is one.

The usual headings are:

sample1.at sample1.dp sample1.fg sample2.at sample2.dp sample2.fg

and so on, to a max of

sample50.at sample50.dp sample50.fg

Using this greps out all the .at columns perfectly:

df[,grep(".at",colnames(df))]

When I come across a file where there is one sample:

sample1.at sample1.dp sample1.fg

using this:

df[,grep(".at",colnames(df))]

returns nothing.

Oh - AT/at was just an example... that's not my problem...
Re: [R] grep won't work finding one column
On 15/10/14 04:09, Kate Ignatius wrote:
> In the sense - it does not work. it works when there are 50 samples
> in the file, but it does not work when there is one.
>
> The usual headings are:
>
> sample1.at sample1.dp sample1.fg sample2.at sample2.dp sample2.fg
>
> and so on to a max of sample50.at sample50.dp sample50.fg
>
> using this greps out all the .at columns perfectly:
>
> df[,grep(".at",colnames(df))]
>
> When I come across a file when there is one sample:
>
> sample1.at sample1.dp sample1.fg
>
> Using this:
>
> df[,grep(".at",colnames(df))]
>
> returns nothing.
>
> Oh - AT/at was just an example... thats not my problem...

You are being (deliberately?) obtuse. It's *all* your problem. You have to be precise when working with computers and when providing examples. Don't build examples with confusing red herrings.

Your assertion that df[,grep(".at",colnames(df))] returns nothing is simply ***INCORRECT***. It works just fine. See the (tidy, completely reproducible) example in the attached file kate.txt.

Note that, with a single .at column in your data frame, what is returned is ***NOT*** a data frame but rather a vector. If you want a (one-column) data frame you need to use drop=FALSE in your subscripting call.

You need to study up on R and learn how it works (read the "Introduction to R") and stop going off half-cocked.

cheers,

Rolf Turner

P.S. It is a ***bad*** idea to use "df" as the name of a data frame. The string "df" is the name of a *function* in base R (it is the probability density function for the F distribution). Although R is clever enough to distinguish functions from data objects in *most* circumstances, at the very least confusion could arise.

R. T.

--
Rolf Turner
Technical Editor ANZJS

#
# Check it out.
#

# Data frame with one .at column.
d1 <- as.data.frame(matrix(1,ncol=3,nrow=10))
n1 <- c("sample1.at","sample1.dp","sample1.g")
names(d1) <- n1

# Data frame with many .at columns.
d2 <- as.data.frame(matrix(1,ncol=50,nrow=10))
set.seed(42)
n2 <- paste("sample",1:50,sample(c(".at",".dp",".fg"),50,TRUE),sep="")
names(d2) <- n2

# Extract the .at columns.
print(d1[,grep(".at",colnames(d1))])
print(d2[,grep(".at",colnames(d2))])
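The drop=FALSE advice can be sketched as follows, combined with the escaped and anchored pattern discussed earlier in the thread. The data frame mirrors the d1 example; the other object names are assumptions made for illustration. This is one way to get code that behaves the same for 0, 1 or n matching columns, not the only one.

```r
# One-sample data frame, same shape as d1 in the example above.
d1 <- as.data.frame(matrix(1, ncol = 3, nrow = 10))
names(d1) <- c("sample1.at", "sample1.dp", "sample1.fg")

idx <- grep("\\.at$", colnames(d1))

# Default behaviour: a single selected column collapses to a vector.
as_vector <- d1[, idx]
print(is.data.frame(as_vector))

# drop = FALSE keeps a data frame however many columns match.
as_frame <- d1[, idx, drop = FALSE]
print(dim(as_frame))

# With no matches at all the result is a data frame with zero columns.
none <- d1[, grep("\\.zz$", colnames(d1)), drop = FALSE]
print(dim(none))

# Alternative without regular expressions: endsWith() (base R >= 3.3.0)
# returns a logical vector that can be used as the column index.
alt <- d1[, endsWith(colnames(d1), ".at"), drop = FALSE]
print(names(alt))
```

Downstream code in the loop can then rely on ncol() and names() in every case, rather than special-casing files that contain a single sample.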