Re: [R] selecting columns from a data frame or data table by type, ie, numeric, integer

2016-04-29 Thread Carl Sutton via R-help
Thank you Bill Dunlap.  So simple I never tried that approach. I tried dozens of 
others though, and read manuals till I was getting headaches, and of course the 
answer was simple for someone competent.  Learning: it's a struggle, but I am 
slowly getting there.
Thanks again
 Carl Sutton CPA
 

On Friday, April 29, 2016 10:50 AM, William Dunlap  
wrote:
 
 

 > dt1[ vapply(dt1, FUN=is.numeric, FUN.VALUE=NA) ]
     a   c
 1   1 1.1
 2   2 1.0
 ...
 10 10 0.2


Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Fri, Apr 29, 2016 at 9:19 AM, Carl Sutton via R-help  
wrote:

Good morning RGuru's
I have a data frame of 575 columns.  I want to extract only those columns that 
are numeric(double) or integer to do some machine learning with.  I have 
searched the web for a couple of days (off and on) and have not found anything 
that shows how to do this.   Lots of ways to extract rows, but not columns.  I 
have attempted to use "(x == y)" indices extraction method but that threw error 
that == was for atomic vectors and lists, and I was doing this on a data frame.

My test code is below

#  a technique to get column classes
library(data.table)
a <- 1:10
b <- c("a","b","c","d","e","f","g","h","i","j")
c <- seq(1.1, .2, length = 10)
dt1 <- data.table(a,b,c)
str(dt1)
col.classes <- sapply(dt1, class)
head(col.classes)
dt2 <- subset(dt1, typeof = "double" | "numeric")
str(dt2)
dt2   #  not subset
dt2 <- dt1[, list(typeof = "double")]
str(dt2)
class_data <- dt1[,sapply(dt1,is.integer) | sapply(dt1, is.numeric)]
class_data
sum(class_data)
typeof(class_data)
names(class_data)
str(class_data)
 Any help is appreciated
Carl Sutton CPA

        [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




  

Re: [R] selecting columns from a data frame or data table by type, ie, numeric, integer

2016-04-29 Thread Giorgio Garziano
Hi,

I was able to replicate the solution as suggested by William in case of
data.frame class, not in case of data.table class.
In case of data.table, I had to do some minor changes as shown below.


library(data.table)
a <- 1:10
b <- c("a","b","c","d","e","f","g","h","i","j")
c <- seq(1.1, .2, length = 10)

# in case of data frame
dt1 <- data.frame(a,b,c)
dt1[vapply(dt1, FUN=is.numeric, FUN.VALUE=NA)]

    a   c
1   1 1.1
2   2 1.0
3   3 0.9
4   4 0.8
5   5 0.7
6   6 0.6
7   7 0.5
8   8 0.4
9   9 0.3
10 10 0.2

# in case of data table
dt1 <- data.table(a,b,c)
dt1[, vapply(dt1, FUN=is.numeric, FUN.VALUE=NA), with=FALSE]

    a   c
1   1 1.1
2   2 1.0
3   3 0.9
4   4 0.8
5   5 0.7
6   6 0.6
7   7 0.5
8   8 0.4
9   9 0.3
10 10 0.2
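
A short sketch of why the two calls above differ, for readers of the archive. With a data frame, a single index inside `[` selects columns; with a data.table, a single unnamed argument is treated as a row index, so the column selector has to go in the second position. The column data below mirror the thread's example; the variable names `df1`, `is_num` and `num_names` are illustrative only, and the data.table part runs only if that package happens to be installed.

```r
# Same three columns as in the thread: integer, character, double.
df1 <- data.frame(a = 1:10,
                  b = letters[1:10],
                  c = seq(1.1, 0.2, length.out = 10),
                  stringsAsFactors = FALSE)

# One logical per column: TRUE where the column is integer or double.
is_num <- vapply(df1, is.numeric, FUN.VALUE = NA)
num_names <- names(df1)[is_num]

# Data frame: a single index selects columns.
df_num <- df1[num_names]

# data.table: put the selector in the j position and pass with = FALSE so
# that it is evaluated as a vector of column names.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt1 <- data.table::as.data.table(df1)
  dt_num <- dt1[, num_names, with = FALSE]
}
```

Selecting by name rather than by logical vector keeps the same `num_names` usable for both classes.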


--

Best,

GG






Re: [R] selecting columns from a data frame or data table by type, ie, numeric, integer

2016-04-29 Thread William Dunlap via R-help
> dt1[ vapply(dt1, FUN=is.numeric, FUN.VALUE=NA) ]
    a   c
1   1 1.1
2   2 1.0
...
10 10 0.2
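
A note on the one-liner above, offered as a sketch rather than as part of the original reply. `FUN.VALUE=NA` tells `vapply()` that every call must return exactly one logical value, and `is.numeric()` is TRUE for both integer and double columns, so there is no need to test `is.integer()` separately. The small data frame `df` here is made up for illustration.

```r
df <- data.frame(a = 1:3,
                 b = c("x", "y", "z"),
                 c = c(1.5, 2.5, 3.5),
                 stringsAsFactors = FALSE)

# NA is a length-one logical template: vapply() raises an error if any
# call returns something other than a single logical value.
is_num <- vapply(df, FUN = is.numeric, FUN.VALUE = NA)

# Storage types, to show that "integer" and "double" both count as numeric.
col_types <- vapply(df, typeof, FUN.VALUE = character(1))

# Keep only the numeric columns.
numeric_df <- df[is_num]

# Filter() is a base R shorthand that gives the same columns.
numeric_df2 <- Filter(is.numeric, df)
```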



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Apr 29, 2016 at 9:19 AM, Carl Sutton via R-help <
r-help@r-project.org> wrote:

> Good morning RGuru's
> I have a data frame of 575 columns.  I want to extract only those columns
> that are numeric(double) or integer to do some machine learning with.  I
> have searched the web for a couple of days (off and on) and have not found
> anything that shows how to do this.   Lots of ways to extract rows, but not
> columns.  I have attempted to use "(x == y)" indices extraction method but
> that threw error that == was for atomic vectors and lists, and I was doing
> this on a data frame.
>
> My test code is below
>
> #  a technique to get column classes
> library(data.table)
> a <- 1:10
> b <- c("a","b","c","d","e","f","g","h","i","j")
> c <- seq(1.1, .2, length = 10)
> dt1 <- data.table(a,b,c)
> str(dt1)
> col.classes <- sapply(dt1, class)
> head(col.classes)
> dt2 <- subset(dt1, typeof = "double" | "numeric")
> str(dt2)
> dt2   #  not subset
> dt2 <- dt1[, list(typeof = "double")]
> str(dt2)
> class_data <- dt1[,sapply(dt1,is.integer) | sapply(dt1, is.numeric)]
> class_data
> sum(class_data)
> typeof(class_data)
> names(class_data)
> str(class_data)
>  Any help is appreciated
> Carl Sutton CPA


Re: [R] selecting columns based on partial names

2014-06-19 Thread Chris Dolanc

Thank you. That worked.

On 19-Jun-2014 3:24 PM, Uwe Ligges wrote:



On 19.06.2014 23:50, Chris Dolanc wrote:

Hello,

I have a data frame with > 5000 columns and I'd like to be able to make
subsets of that data frame made up of certain columns by using part of
the column names. I've had a surprisingly hard time finding something
that works by searching online.

For example, lets say I have a data frame (df) of 2 obs. of 6 variables.
The 6 variables are called "1940_tmax", "1940_ppt", "1940_tmin",
"1941_tmax", "1941_ppt", "1941_tmin". I want to create a new data frame
with only the variables that have "ppt" in the variable (column) name,
so that it looks like this:

plot name   1940_ppt   1941_ppt
774-CL      231        344
778-RW      228        313

Thanks.



df[ , grepl("_ppt$", names(df))]

Best,
Uwe Ligges



--
Christopher R. Dolanc
Post-doctoral Researcher
University of California, Davis &
University of Montana



Re: [R] selecting columns based on partial names

2014-06-19 Thread Jim Lemon
On Thu, 19 Jun 2014 02:50:20 PM Chris Dolanc wrote:
> Hello,
> 
> I have a data frame with > 5000 columns and I'd like to be able to make
> subsets of that data frame made up of certain columns by using part of
> the column names. I've had a surprisingly hard time finding something
> that works by searching online.
> 
> For example, lets say I have a data frame (df) of 2 obs. of 6 variables.
> The 6 variables are called "1940_tmax", "1940_ppt", "1940_tmin",
> "1941_tmax", "1941_ppt", "1941_tmin". I want to create a new data frame
> with only the variables that have "ppt" in the variable (column) name,
> so that it looks like this:
> 
> plot name   1940_ppt   1941_ppt
> 774-CL      231        344
> 778-RW      228        313
> 
Hi Chris,
One way is to get the column indices:

grep("ppt",names(df))
[1] 2 5

so,

newdf<-df[grep("ppt",names(df))]

and then you apparently want to add a column with some other 
information, so probably:

newdf<-cbind(,
 df[grep("ppt",names(df))])
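
A runnable sketch of the two steps above. The first argument to `cbind()` did not survive in the archive, so the identifier column used here, `plot_name`, is purely an assumption for illustration. The precipitation values are the ones from the question; the temperature columns are left as NA because the thread gives no values for them.

```r
df <- data.frame(
  plot_name   = c("774-CL", "778-RW"),   # hypothetical identifier column
  `1940_tmax` = NA_real_,
  `1940_ppt`  = c(231, 228),
  `1940_tmin` = NA_real_,
  `1941_tmax` = NA_real_,
  `1941_ppt`  = c(344, 313),
  `1941_tmin` = NA_real_,
  check.names = FALSE,                   # keep names that start with a digit
  stringsAsFactors = FALSE
)

# Integer positions of the columns whose names contain "ppt".
ppt_idx <- grep("ppt", names(df))

# Identifier column first, then the matching columns.
newdf <- cbind(df["plot_name"], df[ppt_idx])
```

Indexing with `df["plot_name"]` rather than `df$plot_name` keeps the column name in the result.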

Jim



Re: [R] selecting columns based on partial names

2014-06-19 Thread Uwe Ligges



On 19.06.2014 23:50, Chris Dolanc wrote:

Hello,

I have a data frame with > 5000 columns and I'd like to be able to make
subsets of that data frame made up of certain columns by using part of
the column names. I've had a surprisingly hard time finding something
that works by searching online.

For example, lets say I have a data frame (df) of 2 obs. of 6 variables.
The 6 variables are called "1940_tmax", "1940_ppt", "1940_tmin",
"1941_tmax", "1941_ppt", "1941_tmin". I want to create a new data frame
with only the variables that have "ppt" in the variable (column) name,
so that it looks like this:

plot name   1940_ppt   1941_ppt
774-CL      231        344
778-RW      228        313

Thanks.



df[ , grepl("_ppt$", names(df))]
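
Two details of the line above may be worth spelling out; this is a sketch with made-up extra data, not part of the original answer. The `$` anchors the match at the end of the name, so a column such as the hypothetical `ppt_notes` is not picked up, and `drop = FALSE` keeps the result a data frame when only one column matches.

```r
df <- data.frame(`1940_ppt` = c(231, 228),
                 `1941_ppt` = c(344, 313),
                 ppt_notes  = c("", ""),   # hypothetical column, for contrast
                 check.names = FALSE,
                 stringsAsFactors = FALSE)

# "_ppt$" must match at the end of the name; plain "ppt" matches anywhere.
keep_end <- grepl("_ppt$", names(df))
keep_any <- grepl("ppt", names(df))

ppt_df <- df[, keep_end]

# With a single matching column, [ , j] drops to a plain vector unless
# drop = FALSE is given.
one_vec <- df[, grepl("^1940_ppt$", names(df))]
one_df  <- df[, grepl("^1940_ppt$", names(df)), drop = FALSE]
```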

Best,
Uwe Ligges



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-27 Thread Paul Miller
Hi Greg,

This is very helpful. Thanks for explaining it. I'm clearly going to need to 
improve my understanding of regular expressions. Currently busy trying to 
figure out Sweave and knitr though.

Paul

--- On Thu, 4/26/12, Greg Snow <538...@gmail.com> wrote:

> From: Greg Snow <538...@gmail.com>
> Subject: Re: [R] Selecting columns whose names contain "mutated" except when 
> they also contain "non" or "un"
> To: "Paul Miller" 
> Cc: r-help@r-project.org
> Received: Thursday, April 26, 2012, 1:55 PM
> Sorry I took so long getting back to this, but the paying job needs to
> take priority.
> 
> The regular expression "(?<!un)(?<!non)muta" looks for a string that
> matches "muta" then looks at the characters immediately before it to
> see if they match either "un" or "non" in which case it makes it a not
> match.  More specifically the regular expression engine steps through
> the string and at each point tries the match, so at a given point it
> will first see if "un" is before that point, if it is then this point
> can't match and it moves the checking point, if it is not "un" then it
> moves to the next negative look behind and sees if "non" is just
> before the point.  If neither "un" nor "non" is just before the point
> then it starts matching characters after the point to see if they
> match "muta".
> 
> So the next pattern is "(?!muta)non|un", the (?!muta) is a negative
> look ahead which starts at the point and checks forward to see that
> the next characters are not "muta" (but does not include them in the
> match), in this case it is a no-op because you are saying that you
> want to match at a point where the next characters are not "muta" but
> are "non" and since the next set of characters cannot be both this is
> the same as just matching "non", also you need to be aware of the
> operator precedence, in that pattern the (?!muta) part only applies to
> the "non", not the "un".
> 
> To match "nonmuta" or "unmuta" a simple pattern would just be
> "(non|un)muta" or "(no|u)nmuta".  You could use the positive
> lookbehind (you would still need an "or"), but it would be overkill
> for a grep command.  The difference in the positive look ahead/behind
> is more important for replacing where the look ahead/behind is needed
> for the match to happen, but is not captured as part of the match to
> be replaced.
> 
> 
> 
> On Tue, Apr 24, 2012 at 7:40 AM, Paul Miller 
> wrote:
> > Hi Greg,
> >
> > This is quite helpful. Not so good yet with regular expressions in
> > general or Perl-like regular expressions. Found the help page though,
> > and think I was able to determine how the code works as well as how I
> > would select only instances where "muta" is preceded by either "non"
> > or "un".
> >
> >> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
> > [1] "mutation"    "nonmutated"  "unmutated"   "verymutated" "other"
> >
> >> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
> > [1] 1 4
> >
> >> grep("(?!muta)non|un", tmp, perl=TRUE)
> > [1] 2 3
> >
> > Did I get the second grep right?
> >
> > If so, do you have any sense of why it seems to fail when I apply it
> > to my data?
> >
> >> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un",
> >> names(KRASyn), perl=TRUE)])
> >
> > Error in rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl = TRUE)]) :
> >  'x' must be numeric
> >
> > Thanks,
> >
> > Paul
> >
> 
> 
> 
> -- 
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com
>
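
The patterns discussed in the quoted explanation can be tried directly. This is a sketch under one stated assumption: the first pattern is garbled in the archive, and it is taken here to be two negative lookbehinds, `(?<!un)(?<!non)muta`, which is what the explanation describes and which reproduces the `[1] 1 4` result reported in the thread. The extra name `unrelated` is made up to show the precedence issue.

```r
tmp <- c("mutation", "nonmutated", "unmutated", "verymutated", "other")

# Assumed reconstruction: "muta" not directly preceded by "un" or by "non".
# Lookarounds need perl = TRUE.
not_negated <- grep("(?<!un)(?<!non)muta", tmp, perl = TRUE)

# Alternation binds loosest, so this reads as "(?!muta)non" OR "un".
# The lookahead can never fail in front of "non", so it does nothing.
negated_loose <- grep("(?!muta)non|un", tmp, perl = TRUE)

# The grouped pattern states the intent directly and needs no lookaround.
negated_grouped <- grep("(non|un)muta", tmp)

# The difference shows up on a name that contains "un" but not "muta".
extra <- c("unrelated", "nonmutated")
loose_on_extra   <- grepl("(?!muta)non|un", extra, perl = TRUE)
grouped_on_extra <- grepl("(non|un)muta", extra)
```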



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-27 Thread Martin Maechler
> David Winsemius 
> on Mon, 23 Apr 2012 12:16:39 -0400 writes:

> On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:

>> Hello All,
>> 
>> Started out awhile ago trying to select columns in a
>> dataframe whose names contain some variation of the word
>> "mutant" using code like:
>> 
>> names(KRASyn)[grep("muta", names(KRASyn))]
>> 
>> The idea then would be to add together the various
>> columns using code like:
>> 
>> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta",
>> names(KRASyn))])
>> 
>> What I discovered though, is that this selects columns
>> like "nonmutated" and "unmutated" as well as columns like
>> "mutated", "mutation", and "mutational".
>> 
>> So I'd like to know how to select columns that have some
>> variation of the word "mutant" without the "non" or the
>> "un". I've been looking around for an example of how to
>> do that but haven't found anything yet.
>> 
>> Can anyone show me how to select the columns I need?

> If you want only columns whose names _begin_ with "muta"
> then add the "^" character at the beginning of your
> pattern:

> names(KRASyn)[grep("^muta", names(KRASyn))]

> (This should be explained on the ?regex page.)

It *is*! Search for "beginning" and you're there.
Martin

> David Winsemius, MD West Hartford, CT



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-26 Thread Greg Snow
Sorry I took so long getting back to this, but the paying job needs to
take priority.

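The regular expression "(?<!un)(?<!non)muta" looks for a string that
matches "muta" then looks at the characters immediately before it to
see if they match either "un" or "non" in which case it makes it a not
match.  More specifically the regular expression engine steps through
the string and at each point tries the match, so at a given point it
will first see if "un" is before that point, if it is then this point
can't match and it moves the checking point, if it is not "un" then it
moves to the next negative look behind and sees if "non" is just
before the point.  If neither "un" nor "non" is just before the point
then it starts matching characters after the point to see if they
match "muta".

So the next pattern is "(?!muta)non|un", the (?!muta) is a negative
look ahead which starts at the point and checks forward to see that
the next characters are not "muta" (but does not include them in the
match), in this case it is a no-op because you are saying that you
want to match at a point where the next characters are not "muta" but
are "non" and since the next set of characters cannot be both this is
the same as just matching "non", also you need to be aware of the
operator precedence, in that pattern the (?!muta) part only applies to
the "non", not the "un".

To match "nonmuta" or "unmuta" a simple pattern would just be
"(non|un)muta" or "(no|u)nmuta".  You could use the positive
lookbehind (you would still need an "or"), but it would be overkill
for a grep command.  The difference in the positive look ahead/behind
is more important for replacing where the look ahead/behind is needed
for the match to happen, but is not captured as part of the match to
be replaced.

On Tue, Apr 24, 2012 at 7:40 AM, Paul Miller wrote: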
> Hi Greg,
>
> This is quite helpful. Not so good yet with regular expressions in general or 
> Perl-like regular expressions. Found the help page though, and think I was 
> able to determine how the code works as well as how I would select only 
> instances where "muta" is preceded by either "non" or "un".
>
>> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
> [1] "mutation"    "nonmutated"  "unmutated"   "verymutated" "other"
>
>> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
> [1] 1 4
>
>> grep("(?!muta)non|un", tmp, perl=TRUE)
> [1] 2 3
>
> Did I get the second grep right?
>
> If so, do you have any sense of why it seems to fail when I apply it to my 
> data?
>
>> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un", 
>> names(KRASyn), perl=TRUE)])
>
> Error in rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl = TRUE)]) :
>  'x' must be numeric
>
> Thanks,
>
> Paul
>



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-25 Thread peter dalgaard

On Apr 24, 2012, at 19:15 , Rui Barradas wrote:
> 
> Has anyone realized that both 'non' and 'un' end with the same letter? The
> only one we really need to check?
> 
> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other')) 
> 
> i1 <- grepl("muta", tmp)
> i2 <- grepl("nmuta", tmp)
> 
> tmp[i1 & !i2]
> 


Yes, I was wondering why people were avoiding the obvious use of grepl(). I'm 
not too happy about the "nmuta" technique though: What about "deletionmutation" 
and such? Might as well do the safe(r) thing:

i2 <- grepl("unmuta", tmp) | grepl("nonmuta", tmp) 
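
The difference between the two exclusion rules shows up as soon as a name like "deletionmutation" is in the list. A small sketch, using the thread's `tmp` plus that one extra name:

```r
tmp <- c("mutation", "nonmutated", "unmutated", "verymutated", "other",
         "deletionmutation")

has_muta <- grepl("muta", tmp)

# Shortcut: exclude anything with "n" directly before "muta".
loose  <- grepl("nmuta", tmp)

# Safer: exclude only the two prefixes that are actually meant.
strict <- grepl("unmuta", tmp) | grepl("nonmuta", tmp)

kept_loose  <- tmp[has_muta & !loose]    # also loses "deletionmutation"
kept_strict <- tmp[has_muta & !strict]   # keeps it
```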

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-25 Thread Paul Miller
Hello Dr. Winsemius,

There was a non-numeric column. Thanks for helping me to see the obvious.

Paul



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-24 Thread Rui Barradas
Hello,


Greg Snow wrote
> 
> Here is a method that uses negative look behind:
> 
>> tmp <- c('mutation','nonmutated','unmutated','verymutated','other')
>> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
> [1] 1 4
> 
> it looks for muta that is not immediately preceded by un or non (but
> it would match "unusually mutated" since the un is not immediately
> before the muta).
> 
> Hope this helps,
> 
> On Mon, Apr 23, 2012 at 10:10 AM, Paul Miller  wrote:
>> Hello All,
>>
>> Started out awhile ago trying to select columns in a dataframe whose
>> names contain some variation of the word "mutant" using code like:
>>
>> names(KRASyn)[grep("muta", names(KRASyn))]
>>
>> The idea then would be to add together the various columns using code
>> like:
>>
>> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>>
>> What I discovered though, is that this selects columns like "nonmutated"
>> and "unmutated" as well as columns like "mutated", "mutation", and
>> "mutational".
>>
>> So I'd like to know how to select columns that have some variation of the
>> word "mutant" without the "non" or the "un". I've been looking around for
>> an example of how to do that but haven't found anything yet.
>>
>> Can anyone show me how to select the columns I need?
>>
>> Thanks,
>>
>> Paul
>>
> 
> 
> 
> -- 
> Gregory (Greg) L. Snow Ph.D.
> 538280@
> 
> 


Has anyone realized that both 'non' and 'un' end with the same letter? The
only one we really need to check?

(tmp <- c('mutation','nonmutated','unmutated','verymutated','other')) 

i1 <- grepl("muta", tmp)
i2 <- grepl("nmuta", tmp)

tmp[i1 & !i2]


Now, not an answer to Greg's post, just convoluted.


(tmp <- c(tmp, 'permutation', 'commutation'))

cols <- list()
cols[[1]] <- grep("muta", tmp)
cols[[2]] <- grep("nmuta", tmp)
cols[[3]] <- grep("(per|com)muta", tmp)

Reduce(setdiff, cols)
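
For anyone following along, `Reduce(setdiff, cols)` starts from the first index set and removes each later set in turn. The sketch below is self-contained and groups the alternation as `(per|com)muta`; written without the shared group, as `(per)|(com)muta`, the pattern would match "per" anywhere in a name, as the made-up name "percent" shows.

```r
tmp <- c("mutation", "nonmutated", "unmutated", "verymutated", "other",
         "permutation", "commutation")

cols <- list(
  grep("muta", tmp),            # everything containing "muta"
  grep("nmuta", tmp),           # minus the "non"/"un" variants
  grep("(per|com)muta", tmp)    # minus "permuta..." and "commuta..."
)

# setdiff(setdiff(cols[[1]], cols[[2]]), cols[[3]])
keep <- Reduce(setdiff, cols)
kept_names <- tmp[keep]

# Why the grouping matters: alternation has the lowest precedence.
ungrouped_hit <- grepl("(per)|(com)muta", "percent")
grouped_hit   <- grepl("(per|com)muta", "percent")
```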

Rui Barradas




Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-24 Thread David Winsemius


On Apr 24, 2012, at 9:40 AM, Paul Miller wrote:


> Hi Greg,
>
> This is quite helpful. Not so good yet with regular expressions in
> general or Perl-like regular expressions. Found the help page
> though, and think I was able to determine how the code works as well
> as how I would select only instances where "muta" is preceded by
> either "non" or "un".
>
>> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
> [1] "mutation"    "nonmutated"  "unmutated"   "verymutated" "other"
>
>> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
> [1] 1 4
>
>> grep("(?!muta)non|un", tmp, perl=TRUE)
> [1] 2 3
>
> Did I get the second grep right?
>
> If so, do you have any sense of why it seems to fail when I apply it
> to my data?
>
>> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un",
>> names(KRASyn), perl=TRUE)])
>
> Error in rowSums() :
>  'x' must be numeric


The error message strongly suggests at least one non-numeric column.  
What does this return:


lapply( KRASyn[grep("(?!muta)non|un", names(KRASyn), perl=TRUE)],
  is.numeric)
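
A sketch of the same diagnostic on a stand-in data frame, since `KRASyn` itself is not available in the thread. Everything in `KRAS_demo` is hypothetical, including the character column `un_notes`, which is there only because its name also matches the pattern.

```r
KRAS_demo <- data.frame(
  nonmutated = c(1, 0),
  unmutated  = c(0, 1),
  un_notes   = c("a", "b"),   # hypothetical non-numeric column
  stringsAsFactors = FALSE
)

sel <- grep("(?!muta)non|un", names(KRAS_demo), perl = TRUE)

# One logical per selected column; vapply() returns a named vector, which
# is easier to scan than the list that lapply() gives.
is_num <- vapply(KRAS_demo[sel], is.numeric, FUN.VALUE = NA)
offenders <- names(is_num)[!is_num]

# rowSums() fails while the character column is included ...
fails <- inherits(try(rowSums(KRAS_demo[sel]), silent = TRUE), "try-error")

# ... and works once the selection is restricted to numeric columns.
totals <- rowSums(KRAS_demo[sel][is_num])
```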

--

David Winsemius, MD
West Hartford, CT



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-24 Thread Paul Miller
Hi Greg,

This is quite helpful. Not so good yet with regular expressions in general or 
Perl-like regular expressions. Found the help page though, and think I was able 
to determine how the code works as well as how I would select only instances 
where "muta" is preceded by either "non" or "un".

> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
[1] "mutation"    "nonmutated"  "unmutated"   "verymutated" "other"

> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
[1] 1 4

> grep("(?!muta)non|un", tmp, perl=TRUE)
[1] 2 3

Did I get the second grep right?

If so, do you have any sense of why it seems to fail when I apply it to my data?

> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), 
> perl=TRUE)])

Error in rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl = TRUE)]) : 
  'x' must be numeric

Thanks,

Paul



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-23 Thread Greg Snow
Here is a method that uses negative look behind:

> tmp <- c('mutation','nonmutated','unmutated','verymutated','other')
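> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
[1] 1 4

it looks for muta that is not immediately preceded by un or non (but
it would match "unusually mutated" since the un is not immediately
before the muta).

Hope this helps,

On Mon, Apr 23, 2012 at 10:10 AM, Paul Miller wrote: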
> Hello All,
>
> Started out awhile ago trying to select columns in a dataframe whose names 
> contain some variation of the word "mutant" using code like:
>
> names(KRASyn)[grep("muta", names(KRASyn))]
>
> The idea then would be to add together the various columns using code like:
>
> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>
> What I discovered though, is that this selects columns like "nonmutated" and 
> "unmutated" as well as columns like "mutated", "mutation", and "mutational".
>
> So I'd like to know how to select columns that have some variation of the 
> word "mutant" without the "non" or the "un". I've been looking around for an 
> example of how to do that but haven't found anything yet.
>
> Can anyone show me how to select the columns I need?
>
> Thanks,
>
> Paul
>



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-23 Thread Paul Miller
Hi Bert,

Yes, code like:

x <- names(yourdataframe)
grepl("muta",x) & !grepl("nonmuta|unmuta",x)

works perfectly.

Thanks very much for your help.

Paul




--- On Mon, 4/23/12, Bert Gunter  wrote:

> From: Bert Gunter 
> Subject: Re: [R] Selecting columns whose names contain "mutated" except when 
> they also contain "non" or "un"
> To: "Paul Miller" 
> Cc: "David Winsemius" , r-help@r-project.org
> Received: Monday, April 23, 2012, 12:15 PM
> But maybe ... (see below)
> -- Bert
> 
> On Mon, Apr 23, 2012 at 9:25 AM, Paul Miller wrote:
> > Hello Dr. Winsemius,
> >
> > Unfortunately, I also have terms like "krasmutated". So simply
> > selecting words that start with "muta" won't work in this case.
> >
> > Thanks,
> >
> > Paul
> >
> >
> > --- On Mon, 4/23/12, David Winsemius wrote:
> >
> >> From: David Winsemius
> >> Subject: Re: [R] Selecting columns whose names contain "mutated"
> >> except when they also contain "non" or "un"
> >> To: "Paul Miller"
> >> Cc: r-help@r-project.org
> >> Received: Monday, April 23, 2012, 11:16 AM
> >>
> >> On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:
> >>
> >> > Hello All,
> >> >
> >> > Started out awhile ago trying to select columns in a dataframe
> >> > whose names contain some variation of the word "mutant" using
> >> > code like:
> >> >
> >> > names(KRASyn)[grep("muta", names(KRASyn))]
> >> >
> >> > The idea then would be to add together the various columns using
> >> > code like:
> >> >
> >> > KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
> >> >
> >> > What I discovered though, is that this selects columns like
> >> > "nonmutated" and "unmutated" as well as columns like "mutated",
> >> > "mutation", and "mutational".
> >> >
> >> > So I'd like to know how to select columns that have some
> >> > variation of the word "mutant" without the "non" or the "un".
> >> > I've been looking around for an example of how to do that but
> >> > haven't found anything yet.
> 
> If this **is** a complete specification then wouldn't simply:
> 
> x <- names(yourdataframe)
> grepl("muta",x) & !grepl("nonmuta|unmuta",x)
> 
> do it?
> 
> e.g.
> > x <- c("nonmutated","unmutated","mutation","mutated","krasmutated")
> > grepl("muta",x) & !grepl("nonmuta|unmuta",x)
> [1] FALSE FALSE  TRUE  TRUE  TRUE
> 
> >> >
> >> > Can anyone show me how to select the columns I need?
> >>
> >> If you want only columns whose names _begin_ with "muta" then add
> >> the "^" character at the beginning of your pattern:
> >>
> >> names(KRASyn)[grep("^muta", names(KRASyn))]
> >>
> >> (This should be explained on the ?regex page.)
> >>
> >> --
> >> David Winsemius, MD
> >> West Hartford, CT
> >>
> >>
> >
> 
> 
> 
> -- 
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-23 Thread Bert Gunter
But maybe ... (see below)
-- Bert

On Mon, Apr 23, 2012 at 9:25 AM, Paul Miller  wrote:
> Hello Dr. Winsemius,
>
> Unfortunately, I also have terms like "krasmutated". So simply selecting 
> words that start with "muta" won't work in this case.
>
> Thanks,
>
> Paul
>
>
> --- On Mon, 4/23/12, David Winsemius  wrote:
>
>> From: David Winsemius 
>> Subject: Re: [R] Selecting columns whose names contain "mutated" except when 
>> they also contain "non" or "un"
>> To: "Paul Miller" 
>> Cc: r-help@r-project.org
>> Received: Monday, April 23, 2012, 11:16 AM
>>
>> On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:
>>
>> > Hello All,
>> >
>> > Started out awhile ago trying to select columns in a dataframe
>> > whose names contain some variation of the word "mutant" using
>> > code like:
>> >
>> > names(KRASyn)[grep("muta", names(KRASyn))]
>> >
>> > The idea then would be to add together the various columns using
>> > code like:
>> >
>> > KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>> >
>> > What I discovered though, is that this selects columns like
>> > "nonmutated" and "unmutated" as well as columns like "mutated",
>> > "mutation", and "mutational".
>> >
>> > So I'd like to know how to select columns that have some
>> > variation of the word "mutant" without the "non" or the "un".
>> > I've been looking around for an example of how to do that but
>> > haven't found anything yet.

If this **is** a complete specification then wouldn't simply:

x <- names(yourdataframe)
 grepl("muta",x) & !grepl("nonmuta|unmuta",x)

do it?

e.g.
> x <- c("nonmutated","unmutated","mutation","mutated","krasmutated")
> grepl("muta",x) & !grepl("nonmuta|unmuta",x)
[1] FALSE FALSE  TRUE  TRUE  TRUE
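
To connect the logical vector above back to the original goal of summing the selected columns, here is a minimal sketch. The data frame `demo` is invented for illustration (every cell is 1) and only borrows the column names from the example.

```r
x <- c("nonmutated", "unmutated", "mutation", "mutated", "krasmutated")

# Hypothetical stand-in for the real data: two rows, all values 1.
demo <- as.data.frame(matrix(1, nrow = 2, ncol = length(x),
                             dimnames = list(NULL, x)))

# Contains "muta" but neither "nonmuta" nor "unmuta".
keep <- grepl("muta", x) & !grepl("nonmuta|unmuta", x)

# A logical vector of length ncol(demo) selects columns directly.
demo$Mutant_comb <- rowSums(demo[keep])
```

Note that `keep` is computed before the new column is added, so its length still matches the number of columns at the time it is used.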

>> >
>> > Can anyone show me how to select the columns I need?
>>
>> If you want only columns whose names _begin_ with "muta"
>> then add the "^" character at the beginning of your
>> pattern:
>>
>> names(KRASyn)[grep("^muta", names(KRASyn))]
>>
>> (This should be explained on the ?regex page.)
>>
>> --
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-23 Thread David Winsemius


On Apr 23, 2012, at 12:25 PM, Paul Miller wrote:


> Hello Dr. Winsemius,
>
> Unfortunately, I also have terms like "krasmutated". So simply
> selecting words that start with "muta" won't work in this case.


You are aware that negative indexing can be used with grep aren't you?
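
One way to read the hint above, sketched on the names used elsewhere in the thread: take everything that matches "muta", then drop the unwanted matches with a negative index. The guard on `length()` is there because indexing with `-integer(0)` returns nothing rather than everything. `grep()` also has an `invert` argument that avoids the problem altogether.

```r
nms <- c("nonmutated", "unmutated", "mutation", "mutated", "krasmutated")

# Step 1: all names containing "muta".
muta <- grep("muta", nms, value = TRUE)

# Step 2: positions, within that subset, of the names to remove.
unwanted <- grep("nonmuta|unmuta", muta)

# Negative indexing drops them; guard against an empty match.
result <- if (length(unwanted)) muta[-unwanted] else muta

# The pitfall the guard protects against.
empty_drop <- muta[-integer(0)]

# Alternative without negative indexing.
result2 <- grep("nonmuta|unmuta", muta, invert = TRUE, value = TRUE)
```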

--
David.


> Thanks,
>
> Paul
>
> --- On Mon, 4/23/12, David Winsemius wrote:
>
>> From: David Winsemius
>> Subject: Re: [R] Selecting columns whose names contain "mutated"
>> except when they also contain "non" or "un"
>> To: "Paul Miller"
>> Cc: r-help@r-project.org
>> Received: Monday, April 23, 2012, 11:16 AM
>>
>> On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:
>>
>>> Hello All,
>>>
>>> Started out awhile ago trying to select columns in a dataframe
>>> whose names contain some variation of the word "mutant" using
>>> code like:
>>>
>>> names(KRASyn)[grep("muta", names(KRASyn))]
>>>
>>> The idea then would be to add together the various columns using
>>> code like:
>>>
>>> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>>>
>>> What I discovered though, is that this selects columns like
>>> "nonmutated" and "unmutated" as well as columns like "mutated",
>>> "mutation", and "mutational".
>>>
>>> So I'd like to know how to select columns that have some
>>> variation of the word "mutant" without the "non" or the "un".
>>> I've been looking around for an example of how to do that but
>>> haven't found anything yet.
>>>
>>> Can anyone show me how to select the columns I need?
>>
>> If you want only columns whose names _begin_ with "muta" then add
>> the "^" character at the beginning of your pattern:
>>
>> names(KRASyn)[grep("^muta", names(KRASyn))]
>>
>> (This should be explained on the ?regex page.)
>>
>> --
>> David Winsemius, MD
>> West Hartford, CT




David Winsemius, MD
West Hartford, CT



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-23 Thread Bert Gunter
Below.

-- Bert

On Mon, Apr 23, 2012 at 9:10 AM, Paul Miller  wrote:
> Hello All,
>
> Started out awhile ago trying to select columns in a dataframe whose names 
> contain some variation of the word "mutant" using code like:
>
> names(KRASyn)[grep("muta", names(KRASyn))]
>
> The idea then would be to add together the various columns using code like:
>
> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>
> What I discovered though, is that this selects columns like "nonmutated" and 
> "unmutated" as well as columns like "mutated", "mutation", and "mutational".
>
> So I'd like to know how to select columns that have some variation of the 
> word "mutant" without the "non" or the "un". I've been looking around for an 
> example of how to do that but haven't found anything yet.

You can't, because you have not provided a full specification of what
can be selected and what can't. Software can only do what you tell it
to -- it cannot read minds. Once you have provided a complete and
accurate specification of inclusion/exclusion criteria, it should be
easy to write a regex procedure.
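
As an illustration only, suppose the specification were "names containing "muta", but not "nonmuta" or "unmuta"". That rule is an assumption, and the small KRASyn data frame below is a hypothetical stand-in with made-up values. Under those assumptions the selection and the row sums might be sketched as:

```r
# Hypothetical stand-in for the KRASyn data frame (values are made up)
KRASyn <- data.frame(
  mutated     = c(1, 0, 1),
  krasmutated = c(0, 1, 1),
  nonmutated  = c(5, 5, 5),
  unmutated   = c(7, 7, 7)
)

# Inclusion rule AND NOT exclusion rule, evaluated on the column names
nms  <- names(KRASyn)
keep <- grepl("muta", nms) & !grepl("nonmuta|unmuta", nms)

# Sum only the selected columns, row by row
KRASyn$Mutant_comb <- rowSums(KRASyn[keep])
KRASyn$Mutant_comb
```

A different specification would only change the two patterns; the indexing and rowSums() steps stay the same.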

"The fault, dear Brutus, lies not in the stars but in ourselves."

-- Bert





>
> Can anyone show me how to select the columns I need?
>
> Thanks,
>
> Paul
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-23 Thread Paul Miller
Hello Dr. Winsemius,

Unfortunately, I also have terms like "krasmutated". So simply selecting words 
that start with "muta" won't work in this case. 

Thanks,

Paul


--- On Mon, 4/23/12, David Winsemius  wrote:

> From: David Winsemius 
> Subject: Re: [R] Selecting columns whose names contain "mutated" except when 
> they also contain "non" or "un"
> To: "Paul Miller" 
> Cc: r-help@r-project.org
> Received: Monday, April 23, 2012, 11:16 AM
> 
> On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:
> 
> > Hello All,
> > 
> > Started out awhile ago trying to select columns in a dataframe whose
> > names contain some variation of the word "mutant" using code like:
> > 
> > names(KRASyn)[grep("muta", names(KRASyn))]
> > 
> > The idea then would be to add together the various columns using
> > code like:
> > 
> > KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
> > 
> > What I discovered though, is that this selects columns like
> > "nonmutated" and "unmutated" as well as columns like "mutated",
> > "mutation", and "mutational".
> > 
> > So I'd like to know how to select columns that have some variation
> > of the word "mutant" without the "non" or the "un". I've been
> > looking around for an example of how to do that but haven't found
> > anything yet.
> > 
> > Can anyone show me how to select the columns I need?
> 
> If you want only columns whose names _begin_ with "muta"
> then add the "^" character at the beginning of your
> pattern:
> 
> names(KRASyn)[grep("^muta", names(KRASyn))]
> 
> (This should be explained on the ?regex page.)
> 
> --
> David Winsemius, MD
> West Hartford, CT
> 
>



Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

2012-04-23 Thread David Winsemius


On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:


Hello All,

Started out awhile ago trying to select columns in a dataframe whose  
names contain some variation of the word "mutant" using code like:


names(KRASyn)[grep("muta", names(KRASyn))]

The idea then would be to add together the various columns using  
code like:


KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])

What I discovered though, is that this selects columns like  
"nonmutated" and "unmutated" as well as columns like "mutated",  
"mutation", and "mutational".


So I'd like to know how to select columns that have some variation  
of the word "mutant" without the "non" or the "un". I've been  
looking around for an example of how to do that but haven't found  
anything yet.


Can anyone show me how to select the columns I need?


If you want only columns whose names _begin_ with "muta" then add the  
"^" character at the beginning of your pattern:


names(KRASyn)[grep("^muta", names(KRASyn))]

(This should be explained on the ?regex page.)
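
A small sketch of the difference the "^" anchor makes. The vector `nms` is a hypothetical set of column names, not the actual names(KRASyn):

```r
# Hypothetical column names
nms <- c("mutated", "mutation", "nonmutated", "unmutated", "krasmutated")

# Unanchored: matches "muta" anywhere in the name
anywhere <- grep("muta", nms, value = TRUE)

# Anchored: matches only names that begin with "muta"
at_start <- grep("^muta", nms, value = TRUE)

anywhere
at_start
```

Note that the anchored pattern also leaves out a name such as "krasmutated", since it does not begin with "muta".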

--

David Winsemius, MD
West Hartford, CT



Re: [R] selecting columns

2011-02-15 Thread Phil Spector

Clayton -
   From your explanation, it sounds like you want to create 
a new file removing the "Location" variable, and all the 
variables that have the string "Ambient" or "Name" in their names.
Suppose that your data frame is called mydata, and you wish to 
create a reduced csv file called "mydata.csv":


write.csv(mydata[, grep('Location|Ambient|Name', names(mydata), invert = TRUE)],
          file = 'mydata.csv')

should do what you want, but without a more concrete example, it's
just a guess.
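
For a more concrete picture, here is a sketch on a hypothetical one-row miniature of such a file. The values are made up, check.names = FALSE is used only so the headings keep their spaces, and the output goes to a temporary file rather than "mydata.csv":

```r
# Hypothetical miniature of the sensor data (made-up values)
mydata <- data.frame(
  Date = "2011-02-15", Time = "12:00", Location = "A",
  `Sensor Name` = "s0", `Sensor Serial` = 101,
  `Ambient Temp` = 20.5, `IR Temp` = 21.0,
  check.names = FALSE
)

# Positions of the columns whose names do NOT match the pattern
keep <- grep("Location|Ambient|Name", names(mydata), invert = TRUE)

# drop = FALSE keeps a data frame even if only one column survives
reduced <- mydata[, keep, drop = FALSE]
names(reduced)

# Write the reduced data to a temporary csv file
out <- tempfile(fileext = ".csv")
write.csv(reduced, file = out, row.names = FALSE)
```

If the file is read with read.csv() and its default check.names = TRUE, the headings become "Sensor.Name", "Ambient.Temp" and so on, and the same pattern still matches them.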

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu

On Tue, 15 Feb 2011, Clayton Dorrity wrote:


I need help.

I have very big .csv files with many unnecessary columns.  From the original
.csv files I would like to create a new .csv file with just the columns I
need.

For example:

The original column headings are: Date, Time, Location, Sensor Name, Sensor
Serial, Ambient Temp, IR Temp, Sensor Name.1, Sensor Serial.1, Ambient
Temp.1, IR Temp.1, Sensor Name.2, Sensor Serial.2, Ambient
Temp.2, ... Sensor Name.45

I would like to create a new .csv file with only Date, Time, Sensor Serial,
IR Temp, Sensor Serial.1, IR Temp.1, Sensor Serial.2, IR Temp.2, ... Sensor
Serial.45, IR Temp.45, etc.


Any help on this matter would be greatly appreciated.



Re: [R] selecting columns based on values of two variables

2009-09-06 Thread Dimitris Rizopoulos

probably you're looking for

subset(capdist, ida %in% c("DEN","SWD","FIN") & idb %in% 
c("DEN","SWD","FIN"))
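
A short sketch of why %in% behaves differently from == in this situation. The capdist rows below are hypothetical placeholders (the distances are made up), not the real data:

```r
# Hypothetical rows in the same layout as capdist (made-up distances)
capdist <- data.frame(
  ida    = c("DEN", "DEN", "SWD", "USA", "FIN"),
  idb    = c("SWD", "FIN", "FIN", "CAN", "DEN"),
  kmdist = c(10, 20, 30, 40, 50)
)
nordic <- c("DEN", "SWD", "FIN")

# == recycles `nordic` along the column and compares position by position,
# which is what triggers the "longer object length" warnings
by_eq <- suppressWarnings(capdist$ida == nordic)

# %in% asks, for each element, whether it appears anywhere in `nordic`
by_in <- capdist$ida %in% nordic

# Dyads where both members are in the set
res <- subset(capdist, ida %in% nordic & idb %in% nordic)
res
```

With == a row is kept only when its value happens to line up with the recycled position in the comparison vector, so most matching dyads are missed.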



I hope it helps.

Best,
Dimitris


Thomas Jensen wrote:

Dear R-list,

I am having trouble selecting rows from a very large data-set  
containing distances between capitals.


The structure of the data-set looks like this:

  numa ida numb idb kmdist midist
1    2 USA   20 CAN    731    456
2    2 USA   31 BHM   1623   1012
3    2 USA   40 CUB   1813   1130


I want to select a subset of these dyads, and have tried the following  
code:


subset(capdist,ida == c("DEN","SWD","FIN") & idb ==  
c("DEN","SWD","FIN"))


This should ideally give me the dyads involving only Denmark, Sweden  
and Finland, however I get an empty result and the following warnings:


[1] numa   ida    numb   idb    kmdist midist
<0 rows> (or 0-length row.names)
Warning messages:
1: In is.na(e1) | is.na(e2) :
   longer object length is not a multiple of shorter object length
2: In `==.default`(ida, c("DEN", "SWD", "FIN")) :
   longer object length is not a multiple of shorter object length
3: In is.na(e1) | is.na(e2) :
   longer object length is not a multiple of shorter object length
4: In `==.default`(idb, c("DEN", "SWD", "FIN")) :
   longer object length is not a multiple of shorter object length

Any help would be greatly appreciated,

Best, Thomas Jensen



--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
