Re: [R] how to extract strings in any column and in any row that start with

2020-05-15 Thread Ana Marija
Hi Rui,

thank you so much that is exactly what I needed!

Cheers,
Ana

On Fri, May 15, 2020 at 5:12 PM Rui Barradas wrote:
>
> Hello,
>
> I have tried several options and with large dataframes this one was the
> fastest (in my tests, of the ones I have tried).
>
>
> s1 <- sapply(tot, function(x) grep('^E10', x, value = TRUE))
>
>
> Then unlist(s1).
> A close second (15% slower) was
>
>
> s2 <- tot[sapply(tot, function(x) grepl('^E10', x))]
>
>
> grep/unlist was 3.7 times slower:
>
>
> grep("^E10", unlist(tot), value = TRUE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> At 20:24 on 15/05/20, Ana Marija wrote:
> > Hello,
> >
> > this command was running for more than 2 hours
> > grep("E10",tot,value=T)
> > and no output
> >
> > and this command
> > df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
> >
> > gave me a subset (a data frame) of tot with the rows where some value matches ^E10
> >
> > what I need is just a vector of all values in tot which start with E10.
> >
> > Thanks
> > Ana
> >
> > On Fri, May 15, 2020 at 12:13 PM Jeff Newmiller wrote:
> >>
> >> Read about regular expressions... they are extremely useful.
> >>
> >> df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
> >>
> >> It is bad form not to put spaces around the <- assignment.
> >>
> >>
> >> On May 15, 2020 10:00:04 AM PDT, Ana Marija wrote:
> >>> Hello,
> >>>
> >>> I have a data frame:
> >>>
> >>>> dim(tot)
> >>> [1] 502536   1093
> >>>
> >>> How would I extract from it all strings that start with E10?
> >>>
> >>> I know how to extract all rows that contain E10
> >>> df0<-tot %>% filter_all(any_vars(. %in% c('E10')))
> >>>> dim(df0)
> >>> [1] 5105 1093
> >>>
> >>> but I just need a vector of strings that start with E10...
> >>> it would look something like this:
> >>>
> >>> [1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107"
> >>>
> >>> Thanks
> >>> Ana
> >>>
> >>> __
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >> --
> >> Sent from my phone. Please excuse my brevity.


Re: [R] how to extract strings in any column and in any row that start with

2020-05-15 Thread Rui Barradas

Hello,

I have tried several options and with large dataframes this one was the 
fastest (in my tests, of the ones I have tried).



s1 <- sapply(tot, function(x) grep('^E10', x, value = TRUE))


Then unlist(s1).
A close second (15% slower) was


s2 <- tot[sapply(tot, function(x) grepl('^E10', x))]


grep/unlist was 3.7 times slower:


grep("^E10", unlist(tot), value = TRUE)
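For anyone who wants to check that the three variants above agree before timing them on real data, here is a minimal sketch. The data frame tot_small and its contents are made up for illustration, and lapply() is used in place of sapply() in the first variant so that the result is always a list before it is flattened. Timings depend on the machine and the data, so none are shown.

```r
# Hypothetical small stand-in for the real 'tot' (character columns only)
set.seed(1)
codes <- c("E101", "E102", "A200", "B300", "xE10", NA)
tot_small <- as.data.frame(
  matrix(sample(codes, 2000, replace = TRUE), ncol = 20),
  stringsAsFactors = FALSE
)

# Variant 1: grep column by column, then flatten the list of matches
r1 <- unlist(
  lapply(tot_small, function(x) grep("^E10", x, value = TRUE)),
  use.names = FALSE
)

# Variant 2: build a logical matrix with grepl, then index the data frame
r2 <- tot_small[sapply(tot_small, function(x) grepl("^E10", x))]

# Variant 3: flatten the whole data frame first, then grep once
r3 <- grep("^E10", unlist(tot_small, use.names = FALSE), value = TRUE)

# All three should return the same values in the same (column-wise) order
identical(r1, r2)
identical(r1, r3)
```

Wrapping each variant in system.time() on a larger sample is one way to repeat the comparison on your own data.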


Hope this helps,

Rui Barradas




Re: [R] how to extract strings in any column and in any row that start with

2020-05-15 Thread Abby Spurdle
> How would I extract from it all strings that start with E10?

Hi Ana,

Here's a simple solution:

x <- c ("P24601", "E101", "E102", "3.141593",
"E101", "xE101", "e103", " E104 ")

x [substring (x, 1, 3) == "E10"]

You will need to replace x with another *character vector*.
(As touched on earlier, a data.frame may cause some problems).

Here are some variations:

unique (x [substring (x, 1, 3) == "E10"])

y <- toupper (x)
y [substring (y, 1, 3) == "E10"]

y <- trimws (x)
y [substring (y, 1, 3) == "E10"]
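As a possible alternative to substring(), base R also has startsWith() (available since R 3.3.0), which tests a literal prefix without any regular expression. The sketch below reuses the x from above; the vector xs with an added NA is a made-up case to show how missing values behave.

```r
x <- c("P24601", "E101", "E102", "3.141593",
       "E101", "xE101", "e103", " E104 ")

# Literal prefix test, no regular expression involved
hits <- x[startsWith(x, "E10")]
hits
# [1] "E101" "E102" "E101"

# With missing values, a logical index containing NA yields NA elements;
# which() keeps only the positions that are TRUE
xs <- c(x, NA)
hits_with_na <- xs[startsWith(xs, "E10")]
hits_na_safe <- xs[which(startsWith(xs, "E10"))]
hits_na_safe
# [1] "E101" "E102" "E101"
```

If the real data contains NA cells, the which() form avoids carrying NA values into the result.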



Re: [R] how to extract strings in any column and in any row that start with

2020-05-15 Thread cpolwart

This is almost certainly not the most efficient way:

tot <- data.frame(v1 = paste0(LETTERS[1:5], 1:10),
                  v2 = paste0(LETTERS[1:5], 101:110),
                  v3 = paste0(LETTERS[1:5], 111:120),
                  v4 = paste0(LETTERS[1:5], 121:130),
                  v5 = paste0(LETTERS[1:5], 131:140),
                  v6 = paste0(LETTERS[1:5], 101:110))

# set a variable to hold the result
myResult <- NULL

# iterate through each column
for (v in seq_along(tot)) {
  thisResult <- as.character(tot[grepl('^E10', tot[, v]), v])
  myResult <- c(myResult, thisResult)
}

myResult <- unique( myResult )


===

Indeed as I wrote this Jeff has popped along with unlist!

Using my example above:

unique(as.character(unlist(tot))[grepl('^E10', as.character(unlist(tot)))])


does what you wanted (you may not need the as.character() calls if you are on
R 4.0 or later, or if your data frame has character columns rather than factors).
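To make the factor point concrete, here is a small sketch that converts any factor columns to character once, up front, so that the later matching works on plain strings. The data frame tot_f is invented for illustration and is built with stringsAsFactors = TRUE to imitate the pre-4.0 default.

```r
# Hypothetical data frame with factor columns (the default before R 4.0.0)
tot_f <- data.frame(v1 = c("E101", "A1"),
                    v2 = c("B2", "E102"),
                    stringsAsFactors = TRUE)

# Find the factor columns and convert only those to character
is_fac <- vapply(tot_f, is.factor, logical(1))
tot_f[is_fac] <- lapply(tot_f[is_fac], as.character)

# Flatten to one character vector and keep the values starting with E10
vals <- unlist(tot_f, use.names = FALSE)
result <- unique(vals[startsWith(vals, "E10")])
result
# [1] "E101" "E102"
```

Converting once avoids calling as.character() on the flattened data twice, which may matter on a data frame as large as the one in the question.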


On 2020-05-15 21:34, Jeff Newmiller wrote:

If you want to treat your data frame as if it were a vector, then
convert it to a vector before you give it to grep.

unlist(tot)




Re: [R] how to extract strings in any column and in any row that start with

2020-05-15 Thread Jeff Newmiller
If you want to treat your data frame as if it were a vector, then convert it to 
a vector before you give it to grep.

unlist(tot)
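A short sketch of why the conversion matters: when grep() is handed a data frame, it coerces it with as.character(), which produces one deparsed string per column rather than one string per cell. That is possibly also part of why the direct call was so slow on a large data frame. The data frame df below is made up for illustration.

```r
df <- data.frame(a = c("E101", "X1"),
                 b = c("Y2", "Z3"),
                 stringsAsFactors = FALSE)

# One string per column, each looking like the text of a c(...) call
as_text <- as.character(df)
length(as_text)

# So grep() reports matching columns, not matching cells
col_hits <- grep("E10", df)

# and an anchored pattern finds nothing, because each string starts with 'c('
anchored <- grep("^E10", df, value = TRUE)

# Flattening first gives one element per cell, which is what was wanted
cell_hits <- grep("^E10", unlist(df, use.names = FALSE), value = TRUE)
cell_hits
# [1] "E101"
```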


-- 
Sent from my phone. Please excuse my brevity.



Re: [R] how to extract strings in any column and in any row that start with

2020-05-15 Thread Ana Marija
Hello,

this command was running for more than 2 hours
grep("E10",tot,value=T)
and no output

and this command
df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))

gave me a subset (a data frame) of tot with the rows where some value matches ^E10

what I need is just a vector of all values in tot which start with E10.
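The difference between the two results can be sketched in base R with a single logical matrix: used by row it gives the subset of rows, used cell by cell it gives the vector of values. The data frame tot_demo is a made-up example, not the real data.

```r
tot_demo <- data.frame(a = c("E101", "X1", "Y2"),
                       b = c("Z3", "E102", "W4"),
                       stringsAsFactors = FALSE)

# TRUE wherever a cell starts with E10 (one matrix column per data frame column)
hit <- sapply(tot_demo, function(col) grepl("^E10", col))

# Rows with at least one matching cell: still a data frame
rows <- tot_demo[rowSums(hit) > 0, ]

# The matching cells themselves: a plain character vector
vals <- as.matrix(tot_demo)[hit]
vals
# [1] "E101" "E102"
```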

Thanks
Ana




Re: [R] how to extract strings in any column and in any row that start with

2020-05-15 Thread Jeff Newmiller
Read about regular expressions... they are extremely useful.

df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))

It is bad form not to put spaces around the <- assignment.
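As a small illustration of what the ^ anchor changes, the sketch below compares an exact match with %in%, an unanchored pattern, and anchored patterns on a made-up vector x.

```r
x <- c("E10", "E101", "AE10", "E1")

exact     <- x %in% "E10"       # whole value must equal E10
anywhere  <- grepl("E10", x)    # E10 may appear anywhere in the value
at_start  <- grepl("^E10", x)   # value must start with E10
whole     <- grepl("^E10$", x)  # value must be exactly E10

data.frame(x, exact, anywhere, at_start, whole)
#      x exact anywhere at_start whole
# 1  E10  TRUE     TRUE     TRUE  TRUE
# 2 E101 FALSE     TRUE     TRUE FALSE
# 3 AE10 FALSE     TRUE    FALSE FALSE
# 4   E1 FALSE    FALSE    FALSE FALSE
```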



-- 
Sent from my phone. Please excuse my brevity.



[R] how to extract strings in any column and in any row that start with

2020-05-15 Thread Ana Marija
Hello,

I have a data frame:

> dim(tot)
[1] 502536   1093

How would I extract from it all strings that start with E10?

I know how to extract all rows that contain E10
df0<-tot %>% filter_all(any_vars(. %in% c('E10')))
> dim(df0)
[1] 5105 1093

but I just need a vector of strings that start with E10...
it would look something like this:

[1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107"

Thanks
Ana
