Re: [R] how to extract strings in any column and in any row that start with
Hi Rui,

thank you so much, that is exactly what I needed!

Cheers,
Ana

On Fri, May 15, 2020 at 5:12 PM Rui Barradas wrote:
>
> Hello,
>
> I have tried several options and with large dataframes this one was the
> fastest (in my tests, of the ones I have tried).
>
> s1 <- sapply(tot, function(x) grep('^E10', x, value = TRUE))
>
> Then unlist(s1).
> A close second (15% slower) was
>
> s2 <- tot[sapply(tot, function(x) grepl('^E10', x))]
>
> grep/unlist was 3.7 times slower:
>
> grep("^E10", unlist(tot), value = TRUE)
>
> Hope this helps,
>
> Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to extract strings in any column and in any row that start with
Hello,

I have tried several options, and with large data frames this one was the
fastest (in my tests, of the ones I have tried).

s1 <- sapply(tot, function(x) grep('^E10', x, value = TRUE))

Then unlist(s1).
A close second (15% slower) was

s2 <- tot[sapply(tot, function(x) grepl('^E10', x))]

grep/unlist was 3.7 times slower:

grep("^E10", unlist(tot), value = TRUE)

Hope this helps,

Rui Barradas

At 20:24 on 15/05/20, Ana Marija wrote:
> Hello,
>
> this command was running for more than 2 hours
> grep("E10",tot,value=T)
> and no output
>
> and this command
> df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
>
> gave me a subset (a data frame) of tot where ^E10 matches
>
> what I need is just a vector of all values in tot which start with E10.
>
> Thanks
> Ana
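For anyone who wants to check that the three approaches above return the same values, here is a minimal sketch on a small made-up data frame. The name `toy` and its contents are assumptions for illustration only (not the poster's `tot`), and `lapply` is used in place of `sapply` simply so that the intermediate result is always a list. The relative timings quoted above are Rui's and are not reproduced here.

```r
# Hypothetical toy data; stringsAsFactors = FALSE keeps the columns as character
toy <- data.frame(
  a = c("E101", "B20", "E102"),
  b = c("xE10", "E109", "C3"),
  stringsAsFactors = FALSE
)

# 1. grep() per column, then flatten the list of matches
s1 <- unlist(lapply(toy, function(x) grep("^E10", x, value = TRUE)),
             use.names = FALSE)

# 2. logical matrix built column by column with grepl(), used as an index
s2 <- toy[sapply(toy, function(x) grepl("^E10", x))]

# 3. flatten the data frame first, then call grep() once
s3 <- grep("^E10", unlist(toy, use.names = FALSE), value = TRUE)

print(s1)
print(s2)
print(s3)
```

Which variant is fastest probably depends on the size and column types of the real data, so it is worth timing them on a subset first.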
Re: [R] how to extract strings in any column and in any row that start with
> How would I extract from it all strings that start with E10?

Hi Ana,

Here's a simple solution:

x <- c ("P24601", "E101", "E102", "3.141593", "E101", "xE101", "e103", " E104 ")
x [substring (x, 1, 3) == "E10"]

You will need to replace x with another *character vector*.
(As touched on earlier, a data.frame may cause some problems).

Here are some variations:

unique (x [substring (x, 1, 3) == "E10"])

y <- toupper (x)
y [substring (y, 1, 3) == "E10"]

y <- trimws (x)
y [substring (y, 1, 3) == "E10"]
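As a possible alternative to the `substring` comparison above, base R (3.3.0 and later) also has `startsWith()`, which tests for a literal prefix without any regular expression. The sketch below reuses the example vector from the message; the names `hits`, `clean` and `distinct_hits` are made up for illustration. Like the `substring` version, it expects a character vector, not a data frame.

```r
x <- c("P24601", "E101", "E102", "3.141593", "E101", "xE101", "e103", " E104 ")

# startsWith() does a literal prefix test on a character vector
hits <- x[startsWith(x, "E10")]

# Combine the variations: trim whitespace, upper-case, then keep distinct matches
clean <- toupper(trimws(x))
distinct_hits <- unique(clean[startsWith(clean, "E10")])

print(hits)
print(distinct_hits)
```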
Re: [R] how to extract strings in any column and in any row that start with
This is almost certainly not the most efficient way:

tot <- data.frame(v1 = paste0(LETTERS[seq(1:5)], seq(1:10)),
                  v2 = paste0(LETTERS[seq(1:5)], seq(from = 101, to = 110, by = 1)),
                  v3 = paste0(LETTERS[seq(1:5)], seq(from = 111, to = 120, by = 1)),
                  v4 = paste0(LETTERS[seq(1:5)], seq(from = 121, to = 130, by = 1)),
                  v5 = paste0(LETTERS[seq(1:5)], seq(from = 131, to = 140, by = 1)),
                  v6 = paste0(LETTERS[seq(1:5)], seq(from = 101, to = 110, by = 1))
)

# set a variable to hold the result
myResult <- NULL
# iterate through each variable
for (v in 1:length(tot[1,])) {
  thisResult <- as.character(tot[grepl('^E10', tot[,v]), v])
  myResult <- c(myResult, thisResult)
}
myResult <- unique(myResult)

===

Indeed as I wrote this Jeff has popped along with unlist!

Using my example above:

unique(as.character(unlist(tot))[grepl('^E10', as.character(unlist(tot)))])

does what you wanted (you may not need the as.character calls if you are on
R 4.0, or if your df has chars rather than factors).

On 2020-05-15 21:34, Jeff Newmiller wrote:
> If you want to treat your data frame as if it were a vector, then convert
> it to a vector before you give it to grep.
>
> unlist(tot)
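The remark about `as.character` and factors can be illustrated with a small sketch. It assumes a made-up data frame `toy` built with `stringsAsFactors = TRUE` (the default before R 4.0); the variable names are hypothetical. `unlist()` on a data frame whose columns are all factors returns a single factor, so converting to character first yields a plain character vector of matches.

```r
# Hypothetical data frame with factor columns (the pre-R 4.0 default)
toy <- data.frame(
  v1 = c("E101", "B2", "E105"),
  v2 = c("E105", "E110", "C7"),
  stringsAsFactors = TRUE
)

# Flattening factor columns gives one factor with the levels merged
flat <- unlist(toy, use.names = FALSE)
print(class(flat))

# Convert to character once, then filter and de-duplicate
vals <- as.character(flat)
res <- unique(vals[grepl("^E10", vals)])
print(res)
```

Storing `as.character(unlist(toy))` in a variable also avoids flattening the data frame twice, which may matter on a data frame as large as the one in the question.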
Re: [R] how to extract strings in any column and in any row that start with
If you want to treat your data frame as if it were a vector, then convert it
to a vector before you give it to grep.

unlist(tot)

On May 15, 2020 12:24:17 PM PDT, Ana Marija wrote:
>Hello,
>
>this command was running for more than 2 hours
>grep("E10",tot,value=T)
>and no output
>
>and this command
>df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
>
>gave me a subset (a data frame) of tot where ^E10 matches
>
>what I need is just a vector of all values in tot which start with E10.
>
>Thanks
>Ana

--
Sent from my phone. Please excuse my brevity.
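A likely reason the call on the whole data frame behaved so badly: when `x` is not a character vector, `grep()` coerces it with `as.character()`, and for a data frame (a list of columns) that produces one long deparsed string per column instead of one string per cell. The sketch below shows this on a made-up two-column data frame; `toy` and the other names are assumptions for illustration, and the explanation of the two-hour run time is an inference, not something confirmed in the thread.

```r
# Hypothetical toy data with character columns
toy <- data.frame(
  a = c("E101", "B20"),
  b = c("C3", "D4"),
  stringsAsFactors = FALSE
)

# as.character() on a data frame gives one deparsed string per column
as_text <- as.character(toy)
print(length(as_text))

# So grep() on the data frame can only report whole columns that contain "E10"
whole_columns <- grep("E10", toy, value = TRUE)
print(whole_columns)

# Flattening first lets grep() see the individual cells
cells <- grep("^E10", unlist(toy, use.names = FALSE), value = TRUE)
print(cells)
```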
Re: [R] how to extract strings in any column and in any row that start with
Hello,

this command was running for more than 2 hours
grep("E10",tot,value=T)
and no output

and this command
df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))

gave me a subset (a data frame) of tot where ^E10 matches

what I need is just a vector of all values in tot which start with E10.

Thanks
Ana

On Fri, May 15, 2020 at 12:13 PM Jeff Newmiller wrote:
>
> Read about regular expressions... they are extremely useful.
>
> df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
>
> It is bad form not to put spaces around the <- assignment.
>
> --
> Sent from my phone. Please excuse my brevity.
Re: [R] how to extract strings in any column and in any row that start with
Read about regular expressions... they are extremely useful.

df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))

It is bad form not to put spaces around the <- assignment.

On May 15, 2020 10:00:04 AM PDT, Ana Marija wrote:
>Hello,
>
>I have a data frame:
>
>> dim(tot)
>[1] 502536 1093
>
>How would I extract from it all strings that start with E10?
>
>I know how to extract all rows that contain E10
>df0<-tot %>% filter_all(any_vars(. %in% c('E10')))
>> dim(df0)
>[1] 5105 1093
>
>but I just need a vector of strings that start with E10...
>it would look something like this:
>
>[1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107"
>
>Thanks
>Ana

--
Sent from my phone. Please excuse my brevity.
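To make the point about regular expressions concrete, here is a small sketch contrasting exact matching with `%in%`, an unanchored pattern, and the anchored pattern `'^E10'` used above. The vector `x` and the result names are made up for illustration.

```r
x <- c("E10", "E101", "xE10", "AE105", "E11")

exact    <- x %in% "E10"       # TRUE only when the whole string equals "E10"
anywhere <- grepl("E10", x)    # TRUE when "E10" occurs anywhere in the string
at_start <- grepl("^E10", x)   # TRUE only when the string begins with "E10"

print(x[exact])
print(x[anywhere])
print(x[at_start])
```

The `^` anchor is what restricts the match to the start of each string; without it, values such as "xE10" would also be returned.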
[R] how to extract strings in any column and in any row that start with
Hello,

I have a data frame:

> dim(tot)
[1] 502536 1093

How would I extract from it all strings that start with E10?

I know how to extract all rows that contain E10:

df0<-tot %>% filter_all(any_vars(. %in% c('E10')))
> dim(df0)
[1] 5105 1093

but I just need a vector of strings that start with E10...
it would look something like this:

[1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107"

Thanks
Ana