Re: [R] subset question

2007-08-27 Thread jim holtman
Here is one way of checking to see if a row contains a particular
value and setting the contents of a new column:

n <- 20
# create test data
x <- data.frame(sample(letters, n), sample(letters, n), sample(letters, n),
                sample(letters, n))
# add a column indicating if the row contains 'a', 'b' or 'c'
x$a <- apply(x[, 1:4], 1, function(.row) any(.row %in% c('a','b','c'))) + 0
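
For the layout in the original question (columns 9-67, code "12345"), a
vectorised variant of the same idea might look like the sketch below. The
data frame `dat`, its contents and the column name `flag` are made up for
illustration; only the column range and the code value come from the
question.

```r
# Hypothetical test data: 5 records, 67 character columns, all "00000"
dat <- as.data.frame(matrix("00000", nrow = 5, ncol = 67),
                     stringsAsFactors = FALSE)
dat[2, 10] <- "12345"   # inside columns 9-67
dat[4, 67] <- "12345"   # inside columns 9-67
dat[5, 3]  <- "12345"   # outside the range, must not count

# Comparing a data frame with a scalar gives a logical matrix;
# rowSums() then counts the matches per record.
dat$flag <- as.integer(rowSums(dat[, 9:67] == "12345") > 0)

# Same result with the apply() approach shown above
flag_apply <- apply(dat[, 9:67], 1, function(.row) any(.row %in% "12345")) + 0
```

If the columns can contain NA, `rowSums(..., na.rm = TRUE)` is probably
what is wanted; the `%in%` form never returns NA.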


On 8/27/07, Kirsten Beyer <[EMAIL PROTECTED]> wrote:
> I would like to code records in a dataset with a 1 if any of the
> columns 9-67 contain a particular code, and zero if they don't.  I've
> been working with "subset" and it seems that something like
> subset(data, data[9:67]--"12345") would work, but I have been
> unsuccessful so far.  It seems like a simple problem - any help is
> appreciated!
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] subset using noncontiguous variables by name (not index)

2007-08-27 Thread Muenchen, Robert A (Bob)
Thanks for helping me see why R doesn't have the "obvious"! -Bob

> -Original Message-
> From: Thomas Lumley [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 27, 2007 2:12 PM
> To: Muenchen, Robert A (Bob)
> Subject: RE: [R] subset using noncontiguous variables by name (not
> index)
> 
> On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:
> 
> > Thomas, that's a good point. I was thinking of anscombe[x1::y1]
> > making it clear which one, but you would then want just x1::y1 to
> > have unambiguous meaning on its own, which is impossible.
> >
> > As for x1:xN, it's unambiguous on its own.
> 
> 
> It actually isn't. We already have a meaning. Consider
>    x1 <- 4
>    xN <- 6
>    x1:xN
> It also breaks R's argument passing rules by treating x1 as a string
> rather than a name.
> 
> What would be unambiguous at the moment is "x1":"x4", provided there
> was a sufficiently precise set of rules on what was allowed. Consider
>   "x1":"x-1"          (negative?)
>   "x1":"x3.14"        (non-integer?)
>   "x3.12":"x3.14"     (is the prefix x or x3.?)
>   "x1":"X4"           (the prefix changes)
>   "01":"14"           (is the prefix empty or 0?)
>   "x09":"xA2"         (is this illegal decimal or legal hexadecimal?)
>   "IL23R1":"IL23R4"   (what is the prefix?)
>   "x1a":"x4a"         (infix numbering?)
> 
> 
> 
>   -thomas
> 
> Thomas Lumley Assoc. Professor, Biostatistics
> [EMAIL PROTECTED] University of Washington, Seattle
>
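
The two points in the quoted reply can be checked at the prompt. This is
only a small sketch: the values 4 and 6 are the ones from the quoted
example, and try() is there just so that a script keeps running after the
error.

```r
# x1:xN already has a meaning: a sequence between the values of x1 and xN
x1 <- 4
xN <- 6
s <- x1:xN          # the integers 4, 5, 6

# With strings, ":" tries to coerce to numbers; "x1" and "x4" become NA,
# so the call fails rather than producing "x1", "x2", "x3", "x4".
r <- suppressWarnings(try("x1":"x4", silent = TRUE))
failed <- inherits(r, "try-error")
```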



[R] subset question

2007-08-27 Thread Kirsten Beyer
I would like to code records in a dataset with a 1 if any of the
columns 9-67 contain a particular code, and zero if they don't.  I've
been working with "subset" and it seems that something like
subset(data, data[9:67]--"12345") would work, but I have been
unsuccessful so far.  It seems like a simple problem - any help is
appreciated!



Re: [R] subset using noncontiguous variables by name (not index)

2007-08-27 Thread Thomas Lumley
On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:

> Gabor, That works great!
>
> I think this would be a very helpful addition to the main R
> distribution. Perhaps with a single colon representing numerical order
> (exactly as you have written it) and two colons representing the order
> of the variables as they appear in the data frame (your first example).
> That's analogous to SAS' x1-xN, which you know gets those N variables,
> and a--z, which selects an unknown number of variables a through z. How
> many that is depends upon their order in the data frame. That would not
> only be very useful in general, but it would also make transitioning to
> R from SAS or SPSS less confusing.
>
> Is R still being extended in such basic ways, or does that muck up
> existing programs too much?
>

In principle base R can be extended like that, but a strong case is needed 
for non-standard evaluation rules and for depleting the restricted supply 
of short binary operator names.

The reason for subset() and its behaviour is that 'variables as they 
appear in the data frame' is typically ambiguous -- which data frame?  In 
SPSS you have only one and in SAS there is a default one, so there is no 
ambiguity in X1--Y2, but in R it needs another argument specifying the 
data frame, so it can't really be a binary operator.

The double colon :: and triple colon ::: are already used for namespaces, 
and a search of r-help reveals two previous, different, suggestions for 
%:%.
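
To illustrate why the data frame has to be an explicit argument, here is a
minimal sketch of an ordinary function that selects names by their position
in a given data frame. The name vars_between() is hypothetical, not
something in base R.

```r
# "::" is already taken: it looks up a name inside a package namespace
one <- stats::sd(1:3)

# Hypothetical helper: names from 'from' to 'to' in the column order of df.
# It needs df as a third piece of information, which a binary operator
# has no room for.
vars_between <- function(df, from, to) {
  nm <- names(df)
  i <- match(from, nm)
  j <- match(to, nm)
  stopifnot(!is.na(i), !is.na(j))
  nm[i:j]
}

sel <- vars_between(anscombe, "x3", "y2")
```

With the built-in anscombe data the column order is x1 ... x4, y1 ... y4,
so the range x3 to y2 crosses from the x columns into the y columns.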


-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] subset using noncontiguous variables by name (not index)

2007-08-27 Thread Muenchen, Robert A (Bob)
Gabor, That works great!

I think this would be a very helpful addition to the main R
distribution. Perhaps with a single colon representing numerical order
(exactly as you have written it) and two colons representing the order
of the variables as they appear in the data frame (your first example).
That's analogous to SAS' x1-xN, which you know gets those N variables,
and a--z, which selects an unknown number of variables a through z. How
many that is depends upon their order in the data frame. That would not
only be very useful in general, but it would also make transitioning to
R from SAS or SPSS less confusing.

Is R still being extended in such basic ways, or does that muck up
existing programs too much?

Thanks,
Bob

> -Original Message-
> From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
> Sent: Sunday, August 26, 2007 8:52 PM
> To: Muenchen, Robert A (Bob)
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] subset using noncontiguous variables by name (not
> index)
> 
> Try this:
> 
> > "%:%" <- function(x, y) {
> +   prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
> +   prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
> +   stopifnot(prex == prey)
> +   paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
> + }
> > "x2" %:% "x4"
> [1] "x2" "x3" "x4"
> 
> 

Re: [R] subset using noncontiguous variables by name (not index)

2007-08-26 Thread Gabor Grothendieck
Try this:

> "%:%" <- function(x, y) {
+   prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
+   prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
+   stopifnot(prex == prey)
+   paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
+ }
> "x2" %:% "x4"
[1] "x2" "x3" "x4"
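
As a script, the same operator can be combined with ordinary character
indexing. The sketch below repeats the definition without the console
prompts; the use with anscombe is an added illustration, not part of the
original post.

```r
"%:%" <- function(x, y) {
  prex  <- gsub("[0-9]", "", x)    # non-digit part of the first name, e.g. "x"
  postx <- gsub("[^0-9]", "", x)   # digit part of the first name, e.g. "3"
  prey  <- gsub("[0-9]", "", y)
  posty <- gsub("[^0-9]", "", y)
  stopifnot(prex == prey)          # both names must share the same prefix
  paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
}

# Mix single names and generated ranges, then index by name
sel <- c("x1", "x3" %:% "x4", "y2")
res <- anscombe[sel]
```

Note that this generates names by number, so it does not depend on the
order of the columns in the data frame.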


On 8/26/07, Muenchen, Robert A (Bob) <[EMAIL PROTECTED]> wrote:
> Thanks Bert & Gabor for two very interesting solutions!
>
> It would be very handy in R if string1:stringN generated
> "string1","string2"..."stringN"; it would make selections like this much
> more obvious. I know it's easy to do with the colon operator and the paste
> function, but that's quite a step up in complexity compared to SAS' x1
> x3-x4 y2 or SPSS' x1,x3 to x4, y2. And it's complexity that beginners
> face early in learning R.
>
> While on the subject of the colon operator, why doesn't anscombe[[1:4]]
> select the x variables in list form as anscombe[,1:4] or anscombe[1:4]
> do in data frame form?
>
> Thanks,
>
> Bob
>

Re: [R] subset using noncontiguous variables by name (not index)

2007-08-26 Thread Muenchen, Robert A (Bob)
Thanks Bert & Gabor for two very interesting solutions!

It would be very handy in R if string1:stringN generated
"string1","string2"..."stringN"; it would make selections like this much
more obvious. I know it's easy to do with the colon operator and the paste
function, but that's quite a step up in complexity compared to SAS' x1
x3-x4 y2 or SPSS' x1,x3 to x4, y2. And it's complexity that beginners
face early in learning R.

While on the subject of the colon operator, why doesn't anscombe[[1:4]]
select the x variables in list form as anscombe[,1:4] or anscombe[1:4]
do in data frame form?
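
On the [[ ]] question, a short sketch of what appears to happen, based on
the documented behaviour of [[ with a vector index (see ?Extract): the
index is applied recursively, one level per element, rather than selecting
several components at once.

```r
# A length-2 index means "column 1, then element 2 of that column"
v <- anscombe[[c(1, 2)]]

# 1:4 asks for four levels of nesting, which a data frame of plain
# numeric columns does not have, so the call fails
r <- try(anscombe[[1:4]], silent = TRUE)
failed <- inherits(r, "try-error")

# To get the x variables in list form, subset first and convert
xs <- as.list(anscombe[1:4])
```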

Thanks,

Bob

=
Bob Muenchen (pronounced Min'-chen), Manager 
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230 
FAX: (865) 974-4810
Email: [EMAIL PROTECTED]
Web: http://oit.utk.edu/scc, 
News: http://listserv.utk.edu/archives/statnews.html
=


> -Original Message-
> From: Bert Gunter [mailto:[EMAIL PROTECTED]
> Sent: Sunday, August 26, 2007 6:50 PM
> To: 'Gabor Grothendieck'; Muenchen, Robert A (Bob)
> Cc: r-help@stat.math.ethz.ch
> Subject: RE: [R] subset using noncontiguous variables by name (not
> index)
> 
> The problem is that "x3:x5" does not mean what you think it means. The
> only reason it does the right thing in subset() is because a clever
> trick is used there (read the code -- it's not hard to understand) to
> ensure that it does.
> Gabor has essentially mimicked that trick in his solution.
> 
> However, it is not necessary to do this. You can construct the call
> directly as you tried to do. Using the anscombe example, here's how:
> 
> chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
> do.call (subset, list( x = anscombe, select = parse(text = chooz)))
> 
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
> 
> "The business of the statistician is to catalyze the scientific
> learning
> process."  - George E. P. Box
> 
> 
> 

Re: [R] subset using noncontiguous variables by name (not index)

2007-08-26 Thread Bert Gunter
The problem is that "x3:x5" does not mean what you think it means. The only
reason it does the right thing in subset() is because a clever trick is used
there (read the code -- it's not hard to understand) to ensure that it does.
Gabor has essentially mimicked that trick in his solution.

However, it is not necessary to do this. You can construct the call directly as
you tried to do. Using the anscombe example, here's how:

chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
do.call (subset, list( x = anscombe, select = parse(text = chooz)))
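
Spelled out as a script, the two lines above can be checked as follows. The
second half is an added alternative, not part of the suggestion: when the
names are known, a plain character vector avoids parsing a string
altogether.

```r
chooz <- "c(x1, x3:x4, y2)"   # the selection, stored as a string
res <- do.call(subset, list(x = anscombe, select = parse(text = chooz)))

# Alternative without parse(): keep the selection as column names
vars <- c("x1", "x3", "x4", "y2")
res2 <- anscombe[vars]
```

The parse() route keeps the x3:x4 range notation; the character vector
route is simpler but every name has to be listed.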

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Gabor 
> Grothendieck
> Sent: Sunday, August 26, 2007 2:10 PM
> To: Muenchen, Robert A (Bob)
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] subset using noncontiguous variables by name 
> (not index)
> 
> Using builtin data frame anscombe try this. First we set up a data
> frame anscombe.seq which has one row containing 1, 2, 3, ... .  Then
> select out from that data frame and unlist it to get the desired
> index vector.
> 
> > anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
> > idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
> > anscombe[idx]
>    x1 x3 x4   y2
> 1  10 10  8 9.14
> 2   8  8  8 8.14
> 3  13 13  8 8.74
> 4   9  9  8 8.77
> 5  11 11  8 9.26
> 6  14 14  8 8.10
> 7   6  6  8 6.13
> 8   4  4 19 3.10
> 9  12 12  8 9.13
> 10  7  7  8 7.26
> 11  5  5  8 4.74
> 
> 



Re: [R] subset using noncontiguous variables by name (not index)

2007-08-26 Thread François Pinard
[Muenchen, Robert A (Bob)]

>I'm using the subset function to select a list of variables, some of
>which are contiguous in the data frame, and others of which are not. It
>works fine when I use the form:

>subset(mydata,select=c(x1,x3:x5,x7))

>In reality, my list is far more complex. So I would like to store it in
>a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to
>work. That use of the c function seems to violate R rules, so I'm not
>sure how it works at all. A small simulation of the problem is below.  

>mydata <- data.frame(
>  x1=c(1,2,3,4,5),
>  x2=c(1,2,3,4,5),
>  x3=c(1,2,3,4,5),
>  x4=c(1,2,3,4,5),
>  x5=c(1,2,3,4,5),
>  x6=c(1,2,3,4,5),
>  x7=c(1,2,3,4,5)
>)
>mydata

># This does what I want.
>summary(subset(mydata, select=c(x1, x3:x5, x7)))

Maybe:

  variables <- expression(c(x1, x3:x5, x7))

and later:

  summary(subset(mydata, select=eval(variables)))

However, I do not know how one computes the expression piecemeal, that 
is, better than by building a string and parsing the result.
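
One possible way to build the expression piecemeal without going through a
string is to assemble a call object from names and sub-calls. This is a
sketch only; the variable names `pieces` and `variables` are illustrative,
and mydata is a cut-down copy of the example above.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Each piece is either a name or a call such as x3:x5
pieces <- list(as.name("x1"),
               call(":", as.name("x3"), as.name("x5")),
               as.name("x7"))

# Put the function name in front and turn the list into a call
variables <- as.call(c(as.name("c"), pieces))

# Splice the call into subset() with bquote() and evaluate the result
res <- eval(bquote(subset(mydata, select = .(variables))))
```

deparse(variables) shows the call that was built, which is handy for
checking it before use.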

-- 
François Pinard   http://pinard.progiciels-bpi.ca



Re: [R] subset using noncontiguous variables by name (not index)

2007-08-26 Thread Gabor Grothendieck
Using builtin data frame anscombe try this. First we set up a data frame
anscombe.seq which has one row containing 1, 2, 3, ... .  Then select
out from that data frame and unlist it to get the desired index vector.

> anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
> idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
> anscombe[idx]
   x1 x3 x4   y2
1  10 10  8 9.14
2   8  8  8 8.14
3  13 13  8 8.74
4   9  9  8 8.77
5  11 11  8 9.26
6  14 14  8 8.10
7   6  6  8 6.13
8   4  4 19 3.10
9  12 12  8 9.13
10  7  7  8 7.26
11  5  5  8 4.74
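
The reason c(x1, x3:x4, y2) works inside subset() can be sketched in a few
lines. This is a simplified illustration of the idea (column names are
evaluated as if they were column positions), not a copy of the subset()
source; check the code in your own R version for the details.

```r
# A list that maps each column name to its position: x1 = 1, x2 = 2, ...
nl <- as.list(seq_along(anscombe))
names(nl) <- names(anscombe)

# Evaluate the unevaluated selection with that list as the "environment";
# x3:x4 is then simply 3:4
idx <- eval(quote(c(x1, x3:x4, y2)), nl)

res <- anscombe[idx]
```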


On 8/26/07, Muenchen, Robert A (Bob) <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I'm using the subset function to select a list of variables, some of
> which are contiguous in the data frame, and others of which are not. It
> works fine when I use the form:
>
> subset(mydata,select=c(x1,x3:x5,x7) )
>
> In reality, my list is far more complex. So I would like to store it in
> a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to
> work. That use of the c function seems to violate R rules, so I'm not
> sure how it works at all. A small simulation of the problem is below.
>
> If the variable names & orders were really this simple, I could use
> indices like
>
> summary( mydata[ ,c(1,3:5,7) ] )
>
> but alas, they are not.
>
> How does the c function work this way in the first place, and how can I
> make this substitution?
>
> Thanks,
> Bob
>
> mydata <- data.frame(
>  x1=c(1,2,3,4,5),
>  x2=c(1,2,3,4,5),
>  x3=c(1,2,3,4,5),
>  x4=c(1,2,3,4,5),
>  x5=c(1,2,3,4,5),
>  x6=c(1,2,3,4,5),
>  x7=c(1,2,3,4,5)
> )
> mydata
>
> # This does what I want.
> summary(
>  subset(mydata,select=c(x1,x3:x5,x7) )
> )
>
> # Can I substitute myVars?
> attach(mydata)
> myVars1 <- c(x1,x3:x5,x7)
>
> # Not looking good!
> myVars1
>
> # This doesn't do the right thing.
> summary(
>  subset(mydata,select=myVars1 )
> )
>
> # Total desperation on this attempt:
> myVars2 <- "x1,x3:x5,x7"
> myVars2
>
> # This doesn't work either.
> summary(
>  subset(mydata,select=myVars2 )
> )
>
>
>
> =
> Bob Muenchen (pronounced Min'-chen), Manager
> Statistical Consulting Center
> U of TN Office of Information Technology
> 200 Stokely Management Center, Knoxville, TN 37996-0520
> Voice: (865) 974-5230
> FAX: (865) 974-4810
> Email: [EMAIL PROTECTED]
> Web: http://oit.utk.edu/scc,
> News: http://listserv.utk.edu/archives/statnews.html
>
>



[R] subset using noncontiguous variables by name (not index)

2007-08-26 Thread Muenchen, Robert A (Bob)
Hi All,

I'm using the subset function to select a list of variables, some of
which are contiguous in the data frame, and others of which are not. It
works fine when I use the form:

subset(mydata,select=c(x1,x3:x5,x7) )

In reality, my list is far more complex. So I would like to store it in
a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to
work. That use of the c function seems to violate R rules, so I'm not
sure how it works at all. A small simulation of the problem is below. 

If the variable names & orders were really this simple, I could use
indices like 

summary( mydata[ ,c(1,3:5,7) ] ) 

but alas, they are not. 

How does the c function work this way in the first place, and how can I
make this substitution?

Thanks,
Bob

mydata <- data.frame(
  x1=c(1,2,3,4,5),
  x2=c(1,2,3,4,5),
  x3=c(1,2,3,4,5),
  x4=c(1,2,3,4,5),
  x5=c(1,2,3,4,5),
  x6=c(1,2,3,4,5),
  x7=c(1,2,3,4,5)
)
mydata

# This does what I want.
summary( 
  subset(mydata,select=c(x1,x3:x5,x7) ) 
)

# Can I substitute myVars?
attach(mydata)
myVars1 <- c(x1,x3:x5,x7)

# Not looking good!
myVars1

# This doesn't do the right thing.
summary( 
  subset(mydata,select=myVars1 ) 
)

# Total desperation on this attempt:
myVars2 <- "x1,x3:x5,x7"
myVars2

# This doesn't work either.
summary( 
  subset(mydata,select=myVars2 )
)
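
A short sketch of why the first attempt fails and of one way to store the
selection in a variable. It uses with() instead of attach() so that nothing
is left on the search path; the name myVars is illustrative.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Outside subset(), x1, x3, ... are the data columns themselves, so c()
# combines data values (and x3:x5 triggers a warning), not column positions
vals <- suppressWarnings(with(mydata, c(x1, x3:x5, x7)))

# Storing the column names as strings works with ordinary indexing
myVars <- c("x1", paste0("x", 3:5), "x7")
res <- mydata[myVars]
s <- summary(res)
```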



=
Bob Muenchen (pronounced Min'-chen), Manager 
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230 
FAX: (865) 974-4810
Email: [EMAIL PROTECTED]
Web: http://oit.utk.edu/scc, 
News: http://listserv.utk.edu/archives/statnews.html



Re: [R] Subset and logical operator error

2007-06-12 Thread Ken Knoblauch


Sébastien <[EMAIL PROTECTED]> writes:

> 
> Can you please point to me my syntax mistake or indicate a method to get 
> this type of data.frame subset ?
> 
> Thank you in advance
> 
>   ID value
> 1  1   1.2
> 2  2   1.2
> 3  3   1.2
> 4  4   1.2
> 5  5     A
> 6  6     A
> 7  7     A
> 8  8     A
> subset(mdat,value!"A")
> 
> Error: syntax error, unexpected '!', expecting ',' in "subset(mdat,value!"
> 
Looks like you forgot the "=" as in

subset(mdat, value != "A")
  ID value
1  1   1.2
2  2   1.2
3  3   1.2
4  4   1.2
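
A self-contained version of the example, as a sketch. The construction of
mdat is a guess at the poster's data (a factor column with the two levels
shown); the %in% and droplevels() lines are additions that may be useful
when there are several values to exclude or when the unused level should
disappear.

```r
mdat <- data.frame(ID = 1:8,
                   value = factor(rep(c("1.2", "A"), each = 4)))

# "!" negates a logical value; "not equal" is the two-character operator !=
kept <- subset(mdat, value != "A")

# Equivalent form that extends to several excluded values
kept2 <- subset(mdat, !(value %in% c("A")))

# Subsetting keeps the unused factor level "A"; droplevels() removes it
kept3 <- droplevels(kept)
```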



[R] Subset and logical operator error

2007-06-12 Thread Sébastien
Hello,

It looks to me as if the ! logical operator cannot be called when 
subsetting a data.frame. In the example below, the value column has two 
factor levels (but my typical datasets have more), and what I am trying 
to do is to exclude all lines for which the "value" is different from 
"A". I get a syntax error message every time I try to use the 
subset() function. Unfortunately, the help on the subset function and 
the logical operators is not really specific on the way to implement 
this type of exclusion subset.

Can you please point out my syntax mistake or indicate a method to get 
this type of data.frame subset?

Thank you in advance

  ID value
1  1   1.2
2  2   1.2
3  3   1.2
4  4   1.2
5  5     A
6  6     A
7  7     A
8  8     A
subset(mdat,value!"A")

Error: syntax error, unexpected '!', expecting ',' in "subset(mdat,value!"

Sebastien



Re: [R] subset arg in (modified) evalq

2007-05-18 Thread Gabor Grothendieck
Try this:

e <- quote(summary(y + z))
all.vars(e)
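
Put together with the subsetting from earlier in the thread, this could be
used as in the sketch below. The test data (including the extra column w)
and the condition x > 0 are assumptions for illustration.

```r
set.seed(1)
n <- 100
data <- data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n), w = rnorm(n))

e <- quote(summary(y + z))
needed <- all.vars(e)        # variable names only; summary and + are not included

# Keep only the rows of interest and only the columns the expression uses
small <- data[data$x > 0, needed, drop = FALSE]
res <- eval(e, small)

# Reference result using the with(subset()) form
ref <- with(subset(data, x > 0), summary(y + z))
```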


On 5/18/07, Vadim Ogranovich <[EMAIL PROTECTED]> wrote:
> Sorry, I didn't explain myself clearly enough. I knew about the select arg in
> subset(). My question was, given the expression expression(summary(x+y)),
> how to extract all names that will be looked up during its evaluation.
>
> As to checking performance assumptions, you are right, in most cases the
> overhead is negligible, but sometimes I work with really big data sets.
>
> Thanks a lot for your help,
> Vadim
>
>
> - Original Message -
> From: "Gabor Grothendieck" <[EMAIL PROTECTED]>
> To: "Vadim Ogranovich" <[EMAIL PROTECTED]>
> Cc: r-help@stat.math.ethz.ch
> Sent: Friday, May 18, 2007 9:53:26 AM (GMT-0600) America/Chicago
> Subject: Re: [R] subset arg in (modified) evalq
>
> I would check your performance assumption with an actual test before
> concluding such but at any rate subset does have a select argument. See
> ?subset
>
> On 5/18/07, Vadim Ogranovich <[EMAIL PROTECTED]> wrote:
> > Thanks Gabor!  This does exactly what I wanted.
> >
> > One follow-up question, how to extract the var names, in this case y, z,
> > from the expression? The subset function creates a new object and this may
> > be expensive when the data has a lot of irrelevant columns. So I thought
> > that I could reduce this to the columns I actually need.
> >
> > Thanks,
> > Vadim
> >
> >
> >
> > - Original Message -
> > From: "Gabor Grothendieck" <[EMAIL PROTECTED]>
> > To: "Vadim Ogranovich" <[EMAIL PROTECTED]>
> > Cc: r-help@stat.math.ethz.ch
> > Sent: Friday, May 18, 2007 9:19:49 AM (GMT-0600) America/Chicago
> > Subject: Re: [R] subset arg in (modified) evalq
> >
> > Try this:
> >
> >with(subset(data, x > 0), summary(y + z))
> >
> >
> > On 5/18/07, Vadim Ogranovich <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > When using evalq to evaluate expressions within, say, a data.frame
> > > context I often wish there was a 'subset' argument, much like in lm() or
> > > any other advanced regression model. I would be grateful for a tip on how
> > > to do this.
> > >
> > > Here is an illustration of what I want:
> > >
> > > n <- 100
> > > data <- data.frame(x=rnorm(n), y=rnorm(n), z=rnorm(n))
> > >
> > > # this works
> > > evalq({ i <- 0
> > >
> > > # I want to do the above w/o explicit subscripting, e.g.
> > > myevalq(summary(y + z), subset=0
> > >
> > > Thanks,
> > > Vadim
> > >
> >
>



Re: [R] subset arg in (modified) evalq

2007-05-18 Thread Gabor Grothendieck
I would check your performance assumption with an actual test before
concluding such but at any rate subset does have a select argument. See
?subset
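
A short sketch of the select argument mentioned above, so that only the columns the expression needs are carried along. The extra column w and the seed are assumptions added for illustration:

```r
set.seed(42)
n <- 100
data <- data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n), w = rnorm(n))

# keep rows with x > 0 and only the columns y and z
small <- subset(data, x > 0, select = c(y, z))
res <- with(small, summary(y + z))
res
```

Whether this is actually faster than subsetting all columns is best checked with system.time() on the real data.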

On 5/18/07, Vadim Ogranovich <[EMAIL PROTECTED]> wrote:
> Thanks Gabor!  This does exactly what I wanted.
>
> One follow-up question, how to extract the var names, in this case y, z,
> from the expression? The subset function creates a new object and this may
> be expensive when the data has a lot of irrelevant columns. So I thought
> that I could reduce this to the columns I actually need.
>
> Thanks,
> Vadim
>
>
>
> - Original Message -
> From: "Gabor Grothendieck" <[EMAIL PROTECTED]>
> To: "Vadim Ogranovich" <[EMAIL PROTECTED]>
> Cc: r-help@stat.math.ethz.ch
> Sent: Friday, May 18, 2007 9:19:49 AM (GMT-0600) America/Chicago
> Subject: Re: [R] subset arg in (modified) evalq
>
> Try this:
>
>with(subset(data, x > 0), summary(y + z))
>
>
> On 5/18/07, Vadim Ogranovich <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > When using evalq to evaluate expressions within, say, a data.frame
> > context I often wish there was a 'subset' argument, much like in lm() or
> > any other advanced regression model. I would be grateful for a tip on how
> > to do this.
> >
> > Here is an illustration of what I want:
> >
> > n <- 100
> > data <- data.frame(x=rnorm(n), y=rnorm(n), z=rnorm(n))
> >
> > # this works
> > evalq({ i <- 0
> >
> > # I want to do the above w/o explicit subscripting, e.g.
> > myevalq(summary(y + z), subset=0
> >
> > Thanks,
> > Vadim
> >
> >
>



Re: [R] subset arg in (modified) evalq

2007-05-18 Thread Vadim Ogranovich
Sorry, I didn't explain myself clearly enough. I knew about the select arg in 
subset(). My question was, given the expression expression(summary(x+y)), how 
to extract all names that will be looked up during its evaluation. 

As to checking performance assumptions, you are right, in most cases the 
overhead is negligible, but sometimes I work with really big data sets. 

Thanks a lot for your help, 
Vadim 


- Original Message -
From: "Gabor Grothendieck" <[EMAIL PROTECTED]>
To: "Vadim Ogranovich" <[EMAIL PROTECTED]>
Cc: r-help@stat.math.ethz.ch
Sent: Friday, May 18, 2007 9:53:26 AM (GMT-0600) America/Chicago
Subject: Re: [R] subset arg in (modified) evalq

I would check your performance assumption with an actual test before
concluding such but at any rate subset does have a select argument. See
?subset

On 5/18/07, Vadim Ogranovich <[EMAIL PROTECTED]> wrote:
> Thanks Gabor! This does exactly what I wanted.
>
> One follow-up question, how to extract the var names, in this case y, z,
> from the expression? The subset function creates a new object and this may
> be expensive when the data has a lot of irrelevant columns. So I thought
> that I could reduce this to the columns I actually need.
>
> Thanks,
> Vadim
>
>
>
> - Original Message -
> From: "Gabor Grothendieck" <[EMAIL PROTECTED]>
> To: "Vadim Ogranovich" <[EMAIL PROTECTED]>
> Cc: r-help@stat.math.ethz.ch
> Sent: Friday, May 18, 2007 9:19:49 AM (GMT-0600) America/Chicago
> Subject: Re: [R] subset arg in (modified) evalq
>
> Try this:
>
> with(subset(data, x > 0), summary(y + z))
>
>
> On 5/18/07, Vadim Ogranovich <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > When using evalq to evaluate expressions within, say, a data.frame
> > context I often wish there was a 'subset' argument, much like in lm() or
> > any other advanced regression model. I would be grateful for a tip on how
> > to do this.
> >
> > Here is an illustration of what I want:
> >
> > n <- 100
> > data <- data.frame(x=rnorm(n), y=rnorm(n), z=rnorm(n))
> >
> > # this works
> > evalq({ i <- 0
> >
> > # I want to do the above w/o explicit subscripting, e.g.
> > myevalq(summary(y + z), subset=0
> >
> > Thanks,
> > Vadim
> >
>



Re: [R] subset arg in (modified) evalq

2007-05-18 Thread Vadim Ogranovich
Thanks Gabor! This does exactly what I wanted. 

One follow-up question: how to extract the var names, in this case y, z, from 
the expression? The subset function creates a new object and this may be 
expensive when the data has a lot of irrelevant columns. So I thought that I 
could reduce this to the columns I actually need. 

Thanks, 
Vadim 


- Original Message - 
From: "Gabor Grothendieck" <[EMAIL PROTECTED]> 
To: "Vadim Ogranovich" <[EMAIL PROTECTED]> 
Cc: r-help@stat.math.ethz.ch 
Sent: Friday, May 18, 2007 9:19:49 AM (GMT-0600) America/Chicago 
Subject: Re: [R] subset arg in (modified) evalq 

Try this: 

with(subset(data, x > 0), summary(y + z)) 


On 5/18/07, Vadim Ogranovich <[EMAIL PROTECTED]> wrote: 
> Hi, 
> 
> When using evalq to evaluate expressions within, say, a data.frame context I 
> often wish there was a 'subset' argument, much like in lm() or any other 
> advanced regression model. I would be grateful for a tip on how to do this. 
> 
> Here is an illustration of what I want: 
> 
> n <- 100 
> data <- data.frame(x=rnorm(n), y=rnorm(n), z=rnorm(n)) 
> 
> # this works 
> evalq({ i <- 0 
> # I want to do the above w/o explicit subscripting, e.g. 
> myevalq(summary(y + z), subset=0 
> Thanks, 
> Vadim 
> 
> 



Re: [R] subset arg in (modified) evalq

2007-05-18 Thread Gabor Grothendieck
Try this:

   with(subset(data, x > 0), summary(y + z))
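
If a function with a subset argument is still wanted, something along these lines might do. This is only a sketch: myevalq is the poster's hypothetical name, and the argument order and the NA handling are my assumptions, not an existing R function:

```r
# hypothetical helper: evaluate expr inside data, restricted to the rows
# for which the subset condition is TRUE (NA counts as not selected)
myevalq <- function(expr, data, subset) {
  e <- substitute(expr)
  keep <- eval(substitute(subset), data, parent.frame())
  keep <- keep & !is.na(keep)
  eval(e, data[keep, , drop = FALSE], parent.frame())
}

set.seed(1)
n <- 100
data <- data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n))

a <- myevalq(summary(y + z), data, subset = x > 0)
b <- with(subset(data, x > 0), summary(y + z))
```

Both calls should give the same summary; the helper merely hides the subset() step behind an argument, much as lm() does.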


On 5/18/07, Vadim Ogranovich <[EMAIL PROTECTED]> wrote:
> Hi,
>
> When using evalq to evaluate expressions within, say, a data.frame context I 
> often wish there was a 'subset' argument, much like in lm() or any other 
> advanced regression model. I would be grateful for a tip on how to do this.
>
> Here is an illustration of what I want:
>
> n <- 100
> data <- data.frame(x=rnorm(n), y=rnorm(n), z=rnorm(n))
>
> # this works
> evalq({ i <- 0
> # I want to do the above w/o explicit subscripting, e.g.
> myevalq(summary(y + z), subset=0
> Thanks,
> Vadim
>
>



[R] subset arg in (modified) evalq

2007-05-18 Thread Vadim Ogranovich
Hi, 

When using evalq to evaluate expressions within, say, a data.frame context I 
often wish there was a 'subset' argument, much like in lm() or any other 
advanced regression model. I would be grateful for a tip on how to do this. 

Here is an illustration of what I want: 

n <- 100 
data <- data.frame(x=rnorm(n), y=rnorm(n), z=rnorm(n)) 

# this works 
evalq({ i <- 0


Re: [R] subset

2007-05-04 Thread Martin Becker
Sorry,

of course it should read

 > subset(swiss, Agriculture > 60 & !(Examination %in% c(14,16)), select 
= c(Agriculture,Examination,Catholic))
             Agriculture Examination Catholic
Aigle               62.0          21     8.52
Avenches            60.7          19     4.43
Cossonay            69.3          22     2.82
Echallens           72.6          18    24.20
Lavaux              73.0          19     2.84
Oron                71.2          12     2.40
Paysd'enhaut        63.5           6     2.56
Conthey             85.9           3    99.71
Entremont           84.9           7    99.68
Herens              89.7           5   100.00
Martigwy            78.2          12    98.96
Monthey             64.9           7    98.22
St Maurice          75.9           9    99.06
Sierre              84.6           3    99.46
Sion                63.1          13    96.83
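
To see why the original `!=` attempt dropped nothing useful: a comparison against c(14,16) recycles the two values along the column and compares position by position, whereas %in% tests membership in the whole set. A tiny sketch with a made-up vector (exam is hypothetical, not the swiss column):

```r
exam <- c(16, 14, 21, 14)

# c(14, 16) is recycled to 14 16 14 16 and compared element by element,
# so a 16 sitting opposite a 14 counts as "different"
by_pos <- exam != c(14, 16)

# membership test against the whole set, then negated
by_set <- !(exam %in% c(14, 16))

by_pos
by_set
```

Only the second form marks the 14s and 16s for removal regardless of where they sit.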


Martin Becker wrote:
>
> [...]



Re: [R] subset

2007-05-04 Thread Martin Becker
elyakhlifi mustapha wrote:
> hello,
>
>   
>> subset(swiss, Agriculture > 60 & Examination != c(14,16), select = 
>> c(Agriculture,Examination,Catholic))
>> 

Try %in% :

subset(swiss, Agriculture > 60 & Examination %in% c(14,16), select = 
c(Agriculture,Examination,Catholic))

        Agriculture Examination Catholic
Broye          70.2          16    92.85
Glane          67.8          14    97.16
Veveyse        64.5          14    98.61
Aubonne        67.5          14     2.27
Rolle          60.8          16     7.72

Regards,

  Martin

>              Agriculture Examination Catholic
> Broye               70.2          16     3.30
> Glane               67.8          14     4.20
> Aigle               62.0          21     5.16
> Avenches            60.7          19     5.23
> Cossonay            69.3          22     5.62
> Echallens           72.6          18     6.10
> Lavaux              73.0          19     9.96
> Oron                71.2          12    16.92
> Paysd'enhaut        63.5           6    24.20
> Conthey             85.9           3    58.33
> Entremont           84.9           7    84.84
> Herens              89.7           5    90.57
> Martigwy            78.2          12    91.38
> Monthey             64.9           7    92.85
> St Maurice          75.9           9    93.40
> Sierre              84.6           3    96.83
> Sion                63.1          13    97.16
> Warning message:
> longer object length is not a multiple of shorter object length in:
> Examination != c(14, 16)
>
> As this example shows, I'd like to know whether it is possible to drop
> several rows at once, here for example the rows where Examination takes
> certain values.
> Thanks



[R] subset

2007-05-04 Thread elyakhlifi mustapha
hello,

> subset(swiss, Agriculture > 60 & Examination != c(14,16), select = 
> c(Agriculture,Examination,Catholic))
             Agriculture Examination Catholic
Broye               70.2          16     3.30
Glane               67.8          14     4.20
Aigle               62.0          21     5.16
Avenches            60.7          19     5.23
Cossonay            69.3          22     5.62
Echallens           72.6          18     6.10
Lavaux              73.0          19     9.96
Oron                71.2          12    16.92
Paysd'enhaut        63.5           6    24.20
Conthey             85.9           3    58.33
Entremont           84.9           7    84.84
Herens              89.7           5    90.57
Martigwy            78.2          12    91.38
Monthey             64.9           7    92.85
St Maurice          75.9           9    93.40
Sierre              84.6           3    96.83
Sion                63.1          13    97.16
Warning message:
longer object length is not a multiple of shorter object length in:
Examination != c(14, 16)

As this example shows, I'd like to know whether it is possible to drop
several rows at once, here for example the rows where Examination takes
certain values.
Thanks


  


Re: [R] subset

2007-04-23 Thread Stefan Grosse
What format does your date have? This is essential here. It must be
something like subset(yourdata, year %in% 2004), but how to extract the
year from your date you must find out yourself (it depends on the date
format).
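
As a rough sketch of what that could look like, assuming Date_O is stored as text in day/month/year form (the sample rows below are made up; only the names don, Annee_O and Date_O come from the question):

```r
# hypothetical data: Date_O held as character strings like "dd/mm/yyyy"
don <- data.frame(Annee_O = c(2003, 2004, 2004),
                  Date_O  = c("12/11/2003", "03/02/2004", "28/09/2004"),
                  stringsAsFactors = FALSE)

# option 1: pattern match on the text, the closest analogue of SQL's LIKE '%/2004'
by_text <- subset(don, grepl("/2004$", Date_O), select = c(Annee_O, Date_O))

# option 2: convert to Date first, then compare the extracted year
yr <- format(as.Date(don$Date_O, format = "%d/%m/%Y"), "%Y")
by_year <- don[yr == "2004", c("Annee_O", "Date_O")]
```

If the column is already of class Date, only the format(..., "%Y") step of option 2 is needed.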

ever considered reading an introductory text?
find some here:
http://cran.r-project.org/other-docs.html


Stefan

elyakhlifi mustapha wrote:
> hi,
> OK, I understand how to use the subset function, but sometimes I need to use it 
> to extract data by date, and with this date format it isn't so easy (see below).
> For example, as in SQL, I thought it would be possible to write 
> "%/2004", but it doesn't work. Can you please help me with this?
>
>  subset(don, Date_O in "%/2004", select = c(Annee_O, Date_O))
>
>
>



[R] subset

2007-04-23 Thread elyakhlifi mustapha
hi,
OK, I understand how to use the subset function, but sometimes I need to use it 
to extract data by date, and with this date format it isn't so easy (see below).
For example, as in SQL, I thought it would be possible to write "%/2004", 
but it doesn't work. Can you please help me with this?

 subset(don, Date_O in "%/2004", select = c(Annee_O, Date_O))


  


Re: [R] subset

2007-03-26 Thread Marc Schwartz
On Mon, 2007-03-26 at 15:43 -0400, Liaw, Andy wrote:
> From: Thomas Lumley
> > 
> > On Mon, 26 Mar 2007, Marc Schwartz wrote:
> > 
> > > Sergio,
> > >
> > > Please be sure to cc: the list (ie. Reply to All) with follow up 
> > > questions.
> > >
> > > In this case, you would use %in% with a negation:
> > >
> > > NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (!var3 %in% 2:3))
> > >
> > 
> > > Probably a typo: should be !(var3 %in% 2:3) rather than
> > > (!var3 %in% 2:3)
> 
> I used to think so, but found I didn't need the parens:
> 
> R> a <- 1:3; b <- c(1, 3, 5)
> R> ! a %in% b
> [1] FALSE  TRUE FALSE
> R> ! (a %in% b)
> [1] FALSE  TRUE FALSE
> 
> Andy

Thanks Andy, you beat me to it  :-)

Just to be explicit about the variation I used in my reply above:

> (!a %in% b)
[1] FALSE  TRUE FALSE


I suspect that Thomas may be fearing the following scenario:

> !a
[1] FALSE FALSE FALSE

> (!a) %in% b
[1] FALSE FALSE FALSE


If one looks at ?Syntax, the negation operator '!' is listed 5 rows
after '%any%' relative to precedence of operation, so our examples above
worked as documented.
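
With a <- 1:3 every element of !a is FALSE, so the example above cannot show a case where the grouping changes the answer. A small variation that does (the vector 0:2 is my own choice for illustration):

```r
a <- 0:2; b <- c(1, 3, 5)

# '%in%' binds tighter than unary '!', so these two are the same expression
r1 <- !a %in% b
r2 <- !(a %in% b)

# forcing the negation first changes the result: !a is TRUE FALSE FALSE,
# which is coerced to 1 0 0 when matched against b
r3 <- (!a) %in% b
```

Writing the parentheses as in r2 costs nothing and spares the reader a trip to ?Syntax.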

HTH,

Marc



Re: [R] subset [Broadcast]

2007-03-26 Thread Liaw, Andy
From: Thomas Lumley
> 
> On Mon, 26 Mar 2007, Marc Schwartz wrote:
> 
> > Sergio,
> >
> > Please be sure to cc: the list (ie. Reply to All) with follow up 
> > questions.
> >
> > In this case, you would use %in% with a negation:
> >
> > NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (!var3 %in% 2:3))
> >
> 
> Probably a typo: should be !(var3 %in% 2:3) rather than
> (!var3 %in% 2:3)

I used to think so, but found I didn't need the parens:

R> a <- 1:3; b <- c(1, 3, 5)
R> ! a %in% b
[1] FALSE  TRUE FALSE
R> ! (a %in% b)
[1] FALSE  TRUE FALSE

Andy

>   -thomas
> 
> > See ?"%in%" for more information.
> >
> > HTH,
> >
> > Marc
> >
> > On Mon, 2007-03-26 at 17:30 +0200, Sergio Della Franca wrote:
> >> Ok, this run correctly.
> >>
> >> Another question for you:
> >>
> >> I want to put more than one condition for var3, i.e.:
> >> I like to create a subset when:
> >>  - var1=0
> >>  - var2=0
> >>  - var3 is different from 2 and from 3.
> >>
> >> Like you suggested, i perform this code:
> >> NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 2) &
> >> (var3 != 3))
> >>
> >> Is there a way to combine (var3 != 2) & (var3 != 3) into one
> >> condition?
> >>
> >> Thank you.
> >>
> >> Sergio
> >>
> >>
> >>
> >> 2007/3/26, Marc Schwartz <[EMAIL PROTECTED]>:
> >> On Mon, 2007-03-26 at 17:02 +0200, Sergio Della Franca wrote:
> >>> Dear R-Helpers,
> >>>
> >>> I want to make a subset from my data set.
> >>>
> >>> I'd like to perform different condition for subset.
> >>>
> >>> I.e.:
> >>>
> >>> I like to create a subset when:
> >>> - var1=0
> >>> - var2=0
> >>> - var3 is different from 2.
> >>>
> >>> How can i develop a subset under this condition?
> >>>
> >>> Thank you in advance.
> >>>
> >>> Sergio Della Franca.
> >>
> >> See ?subset
> >>
> >> Something along the lines of the following should work:
> >>
> >> NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 2))
> >>
> >> HTH,
> >>
> >> Marc Schwartz
> >
> 
> Thomas Lumley Assoc. Professor, Biostatistics
> [EMAIL PROTECTED] University of Washington, Seattle
> 
> 
> 
> 




Re: [R] subset

2007-03-26 Thread Thomas Lumley
On Mon, 26 Mar 2007, Marc Schwartz wrote:

> Sergio,
>
> Please be sure to cc: the list (ie. Reply to All) with follow up
> questions.
>
> In this case, you would use %in% with a negation:
>
> NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (!var3 %in% 2:3))
>

Probably a typo: should be !(var3 %in% 2:3) rather than (!var3 %in% 2:3)

  -thomas

> See ?"%in%" for more information.
>
> HTH,
>
> Marc
>
> On Mon, 2007-03-26 at 17:30 +0200, Sergio Della Franca wrote:
>> Ok, this run correctly.
>>
>> Another question for you:
>>
>> I want to put more than one condition for var3, i.e.:
>> I like to create a subset when:
>>  - var1=0
>>  - var2=0
>>  - var3 is different from 2 and from 3.
>>
>> Like you suggested, i perform this code:
>> NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 2) &
>> (var3 != 3))
>>
>> Is there a way to combine (var3 != 2) & (var3 != 3) into one
>> condition?
>>
>> Thank you.
>>
>> Sergio
>>
>>
>>
>> 2007/3/26, Marc Schwartz <[EMAIL PROTECTED]>:
>> On Mon, 2007-03-26 at 17:02 +0200, Sergio Della Franca wrote:
>>> Dear R-Helpers,
>>>
>>> I want to make a subset from my data set.
>>>
>>> I'd like to perform different condition for subset.
>>>
>>> I.e.:
>>>
>>> I like to create a subset when:
>>> - var1=0
>>> - var2=0
>>> - var3 is different from 2.
>>>
>>> How can i develop a subset under this condition?
>>>
>>> Thank you in advance.
>>>
>>> Sergio Della Franca.
>>
>> See ?subset
>>
>> Something along the lines of the following should work:
>>
>> NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 2))
>>
>> HTH,
>>
>> Marc Schwartz
>
>

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] subset

2007-03-26 Thread Marc Schwartz
Sergio,

Please be sure to cc: the list (ie. Reply to All) with follow up
questions.

In this case, you would use %in% with a negation:

NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (!var3 %in% 2:3)) 

See ?"%in%" for more information.
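
A quick check on made-up data (DF below is invented purely to illustrate the filter) that the %in% form and the chained != form select the same rows:

```r
# invented data, only to illustrate the two equivalent conditions
DF <- data.frame(var1 = c(0, 0, 0, 1, 0),
                 var2 = c(0, 0, 0, 0, 1),
                 var3 = c(1, 2, 3, 4, 5))

NewDF  <- subset(DF, (var1 == 0) & (var2 == 0) & !(var3 %in% 2:3))
NewDF2 <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 2) & (var3 != 3))
NewDF
```

One difference to keep in mind: for a missing var3 the != comparisons give NA, and subset() drops such rows, while NA %in% 2:3 is FALSE, so the negated %in% form keeps them.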

HTH,

Marc

On Mon, 2007-03-26 at 17:30 +0200, Sergio Della Franca wrote:
> Ok, this run correctly.
>  
> Another question for you:
>  
> I want to put more than one condition for var3, i.e.:
> I like to create a subset when:
>  - var1=0
>  - var2=0
>  - var3 is different from 2 and from 3.
>  
> Like you suggested, i perform this code:
> NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 2) &
> (var3 != 3))
>  
> Is there a way to combine (var3 != 2) & (var3 != 3) into one
> condition?
>  
> Thank you.
>  
> Sergio
>  
> 
>  
> 2007/3/26, Marc Schwartz <[EMAIL PROTECTED]>: 
> On Mon, 2007-03-26 at 17:02 +0200, Sergio Della Franca wrote:
> > Dear R-Helpers,
> >
> > I want to make a subset from my data set. 
> >
> > I'd like to perform different condition for subset.
> >
> > I.e.:
> >
> > I like to create a subset when:
> > - var1=0
> > - var2=0
> > - var3 is different from 2.
> >
> > How can i develop a subset under this condition?
> >
> > Thank you in advance.
> >
> > Sergio Della Franca.
> 
> See ?subset
> 
> Something along the lines of the following should work:
> 
> NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 2)) 
> 
> HTH,
> 
> Marc Schwartz



Re: [R] subset

2007-03-26 Thread Marc Schwartz
On Mon, 2007-03-26 at 17:02 +0200, Sergio Della Franca wrote:
> Dear R-Helpers,
> 
> I want to make a subset from my data set.
> 
> I'd like to perform different condition for subset.
> 
> I.e.:
> 
> I like to create a subset when:
> - var1=0
> - var2=0
> - var3 is different from 2.
> 
> How can i develop a subset under this condition?
> 
> Thank you in advance.
> 
> Sergio Della Franca.

See ?subset

Something along the lines of the following should work:

  NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 2))

HTH,

Marc Schwartz



[R] subset

2007-03-26 Thread Sergio Della Franca
Dear R-Helpers,

I want to make a subset from my data set.

I'd like to apply several conditions when subsetting.

I.e.:

I would like to create a subset where:
- var1=0
- var2=0
- var3 is different from 2.

How can I create a subset under these conditions?

Thank you in advance.

Sergio Della Franca.



Re: [R] Subset

2007-03-23 Thread Petr Klasterecky
You want to obtain a subset of your data, so what about using subset()?
You didn't even do a basic search for the solution, as the 
posting guide asks you to...

Petr

Sergio Della Franca wrote:
> Dear R-Helpers,
> 
> I have this dataset:
> 
> YEAR PRODUCTS cluster
>    1       10       2
>    2       42       3
>    3       25       2
>    4       42       3
>    5       40       3
>    6       45       1
>    7       44       1
>    8       47       1
>    9       42       1
> 
> 
> I want to create a subset (when cluster=1),
> 
> YEAR PRODUCTS cluster
>    6       45       1
>    7       44       1
>    8       47       1
>    9       42       1
> 
> 
> How can i perform this?
> 
> 
> Thank you in advance.
> 
> 
> Sergio Della Franca
> 
> 

-- 
Petr Klasterecky
Dept. of Probability and Statistics
Charles University in Prague
Czech Republic



Re: [R] Subset

2007-03-23 Thread Uwe Ligges
See ?subset !!!
And please do read the posting guide.

Uwe Ligges


Sergio Della Franca wrote:
> Dear R-Helpers,
> 
> I have this dataset:
> 
> YEAR PRODUCTS cluster
>    1       10       2
>    2       42       3
>    3       25       2
>    4       42       3
>    5       40       3
>    6       45       1
>    7       44       1
>    8       47       1
>    9       42       1
> 
> 
> I want to create a subset (when cluster=1),
> 
> YEAR PRODUCTS cluster
>    6       45       1
>    7       44       1
>    8       47       1
>    9       42       1
> 
> 
> How can i perform this?
> 
> 
> Thank you in advance.
> 
> 
> Sergio Della Franca
> 



Re: [R] Subset

2007-03-23 Thread Mark Wardle
Sergio Della Franca wrote:
> Dear R-Helpers,
> 
> I have this dataset:
> 
> YEAR PRODUCTS cluster
>    1       10       2
>    2       42       3
>    3       25       2
>    4       42       3
>    5       40       3
>    6       45       1
>    7       44       1
>    8       47       1
>    9       42       1
> 
> 
> I want to create a subset (when cluster=1),
> 
> YEAR PRODUCTS cluster
>    6       45       1
>    7       44       1
>    8       47       1
>    9       42       1
> 
> 
> How can i perform this?
> 
> 

You've answered your own question!

?subset


e.g., subset(my.data, cluster==1)

You should also notice that many R functions support a subset=
parameter, which is very useful!
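
Spelled out on the posted data, assuming the three columns are YEAR, PRODUCTS and cluster (the header in the post is run together, so that split is a guess):

```r
# data retyped from the question; column split assumed as YEAR / PRODUCTS / cluster
my.data <- data.frame(YEAR     = 1:9,
                      PRODUCTS = c(10, 42, 25, 42, 40, 45, 44, 47, 42),
                      cluster  = c(2, 3, 2, 3, 3, 1, 1, 1, 1))

cl1 <- subset(my.data, cluster == 1)

# the same selection with plain indexing
cl1_idx <- my.data[my.data$cluster == 1, ]
cl1
```

Note the double == for comparison; a single = inside subset() would be read as a named argument.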

Mark

-- 
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK



[R] Subset

2007-03-23 Thread Sergio Della Franca
Dear R-Helpers,

I have this dataset:

YEAR PRODUCTS cluster
   1       10       2
   2       42       3
   3       25       2
   4       42       3
   5       40       3
   6       45       1
   7       44       1
   8       47       1
   9       42       1


I want to create a subset (when cluster=1),

YEAR PRODUCTS cluster
   6       45       1
   7       44       1
   8       47       1
   9       42       1


How can I do this?


Thank you in advance.


Sergio Della Franca



Re: [R] subset by multiple columns satisfying the same condition

2007-03-18 Thread Peter Dalgaard
Frank Duan wrote:
> Hi All,
>
> I have a very simple question. Suppose I had a data frame with 100 columns,
> now I wanted to select rows with the values of  some columns satisfying the
> same condition, like all equal to "Tom". I know I can use the 'and' operator
> "&", but it's painful if there were many columns.
>
> Can anyone give me some advice? Thanks in advance,
>   
Here's one way:
 
rowSums(myframe != "Tom") == 0

The following approach might generalize more easily, though

 do.call("pmin", lapply(myframe, "==", "Tom"))

(notice that pmin on logical vectors is TRUE, if all are TRUE, else 
FALSE or NA).
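
A small sketch of the row-wise test restricted to a chosen set of columns. The frame, the column names and the vector cols are made up, and Reduce with & is my substitution for the pmin call, used here because it returns a plain logical vector:

```r
# made-up frame; the columns to test are listed in 'cols' (an assumed name)
myframe <- data.frame(a  = c("Tom", "Tom", "Ann"),
                      b  = c("Tom", "Bob", "Tom"),
                      c  = c("Tom", "Tom", "Tom"),
                      id = 1:3,
                      stringsAsFactors = FALSE)
cols <- c("a", "b", "c")

# a row qualifies when none of the chosen columns differs from "Tom"
all_tom  <- rowSums(myframe[cols] != "Tom") == 0

# the same test built column by column
all_tom2 <- Reduce(`&`, lapply(myframe[cols], `==`, "Tom"))

myframe[all_tom, ]
```

With missing values in the tested columns, rowSums() returns NA for the affected rows unless na.rm = TRUE is given, so those rows need a separate decision.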



Re: [R] subset by multiple columns satisfying the same condition

2007-03-18 Thread Peter McMahan
Say x is your data frame and y is the vector of column indices you
want to match to a condition:

x[apply(x[y]=="Tom",1,all),]

still, i feel like there's probably a better way...

On Mar 18, 2007, at 1:36 PM, Frank Duan wrote:

> Hi All,
>
> I have a very simple question. Suppose I had a data frame with 100  
> columns,
> now I wanted to select rows with the values of  some columns  
> satisfying the
> same condition, like all equal to "Tom". I know I can use the 'and'  
> operator
> "&", but it's painful if there were many columns.
>
> Can anyone give me some advice? Thanks in advance,
>
> FD
>



[R] subset by multiple columns satisfying the same condition

2007-03-18 Thread Frank Duan
Hi All,

I have a very simple question. Suppose I had a data frame with 100 columns,
now I wanted to select rows with the values of  some columns satisfying the
same condition, like all equal to "Tom". I know I can use the 'and' operator
"&", but it's painful if there were many columns.

Can anyone give me some advice? Thanks in advance,

FD



Re: [R] subset function

2007-02-09 Thread Petr Pikal
Hi

works for me

> zeta
  tepl tio2 al2o3  iep
1   60    1   3.5 5.65
2   60    1   2.0 5.00
3   60    0   3.5 5.30
4   60    0   2.0 4.65
5   40    1   3.5 5.20
6   40    1   2.0 4.85
7   40    0   3.5 5.70
8   40    0   2.0 5.25
> fit<-lm(iep~al2o3, data=zeta)
> fit<-lm(iep~al2o3, data=zeta, subset=tepl==60)

so you should check what you get from just subsetting your data, e.g.

subset(in.mi01, C_X01=="Berlin")
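
The check suggested above can be sketched as follows. The post does not say what actually went wrong, so this is only one possible cause: the data frame `d` is a made-up stand-in for in.mi01, and the trailing blank in "Berlin " is a deliberate assumption to show how an exact match can silently select nothing. `trimws()` needs a reasonably recent R.

```r
# Hypothetical stand-in for in.mi01; note the padded level "Berlin "
d <- data.frame(RENT  = c(10, 12, 9, 14),
                AGE1  = c(5, 3, 8, 1),
                C_X01 = factor(c("Berlin ", "Berlin ", "Hamburg", "Hamburg")))

# Step 1: how many rows does the condition actually select?
n_before <- sum(d$C_X01 == "Berlin")

# Step 2: look at the levels exactly as they are stored
stored_levels <- levels(d$C_X01)

# Step 3 (only if stray whitespace turns out to be the culprit): trim it
d$C_X01 <- factor(trimws(as.character(d$C_X01)))
n_after <- sum(d$C_X01 == "Berlin")

# The subset argument now finds cases to fit on
fit <- lm(RENT ~ AGE1, data = d, subset = C_X01 == "Berlin")
```

If the count in step 1 is already what you expect, the problem lies elsewhere, for example in missing values of the response or predictor within that subset.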

HTH
Petr



On 9 Feb 2007 at 7:21, Simon P. Kempf wrote:

From:   "Simon P. Kempf" <[EMAIL PROTECTED]>
To: 
Date sent:  Fri, 9 Feb 2007 07:21:00 +0100
Subject:[R] subset function

> Hello R-Users,
> 
> 
> 
> I have the following problem with the subset function:
> 
> 
> 
> See the following simple linear model. Here everything works fine:
> 
> 
> 
> >germany<-lm(RENT~AGE1, in.mi01)
> 
> 
> 
> However, if I use the same regression equation and only specify a
> subset, I get an error message:
> 
> 
> 
> > berlin<-lm(RENT~AGE1, in.mi01, subset=C_X01=="Berlin")
> 
> 
> 
> Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...)
> : 
> 
> 0 (non-NA) cases
> 
> 
> 
>  The dataset contains no missing values, and for the city Berlin there
>  are 2200 observations.
> 
> 
> 
> > summary(in.mi01$C_X01)
> 
>            Berlin        Düsseldorf Frankfurt am Main           Hamburg              Köln
>              2200              1638              2943              2068               759
>           Leipzig            Munich            Others         Stuttgart
>               344              1514              7955               383
> 
> 
> 
> What am I doing wrong?
> 
> 
> 
> Thanks in advance for any help and suggestions,
> 
> 
> 
> Simon
> 
> 
> 
> 
> 
> 
>  [[alternative HTML version deleted]]
> 
> 

Petr Pikal
[EMAIL PROTECTED]



[R] subset function

2007-02-08 Thread Simon P. Kempf
Hello R-Users,

 

I have the following problem with the subset function:

 

See the following simple linear model. Here everything works fine:

 

>germany<-lm(RENT~AGE1, in.mi01)

 

However, if I use the same regression equation and only specify a subset, I
get an error message:

 

> berlin<-lm(RENT~AGE1, in.mi01, subset=C_X01=="Berlin")

 

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 

0 (non-NA) cases

 

The dataset contains no missing values, and for the city Berlin there are
2200 observations.

 

> summary(in.mi01$C_X01)

           Berlin        Düsseldorf Frankfurt am Main           Hamburg              Köln
             2200              1638              2943              2068               759
          Leipzig            Munich            Others         Stuttgart
              344              1514              7955               383

 

What am I doing wrong?

 

Thanks in advance for any help and suggestions,

 

Simon

 

 




Re: [R] Subset by using multiple values

2006-12-31 Thread Farrel Buchinsky
I found a solution to my problem. I thought I would post it here. That will 
help me in 3 months when I have forgotten it or some other poor soul who 
stumbles across the same problem.

RawSeqBig<-RawSeqBig[RawSeqBig$ASSAY_ID %in% rejectrs$rs==FALSE,]
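
A small sketch of why the `!=` attempts fail while `%in%` works. The miniature `RawSeqBig` and `rejectrs` below are invented, and the columns are plain character vectors here (an assumption; the post says rejectrs$rs is a factor, in which case `!=` may also stop with an error about differing level sets):

```r
# Hypothetical miniature versions of the objects in the post
RawSeqBig <- data.frame(ASSAY_ID = c("rs1", "rs2", "rs3", "rs4", "rs5", "rs6"),
                        value    = 1:6,
                        stringsAsFactors = FALSE)
rejectrs  <- data.frame(rs = c("rs2", "rs5"), stringsAsFactors = FALSE)

# `!=` compares element by element, recycling the shorter vector:
# rs1!=rs2, rs2!=rs5, rs3!=rs2, rs4!=rs5, rs5!=rs2, rs6!=rs5
# so no row lines up with its own reject value and nothing is excluded
elementwise <- RawSeqBig$ASSAY_ID != rejectrs$rs

# %in% asks, for every ASSAY_ID, whether it occurs anywhere in the reject list
kept <- RawSeqBig[!(RawSeqBig$ASSAY_ID %in% rejectrs$rs), ]
```

Writing `!(a %in% b)` gives the same result as `a %in% b == FALSE` and is arguably easier to read.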

"Farrel Buchinsky" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
>I have a vector containing about 20 unique values. It is called rejectrs$rs.
> It is a factor
> I have a data frame with about 10 rows.
> I want to exclude all rows where in variable rs the value is one of the 20
> on the exclude list. I thought this would work but none did.
>
> RawSeqBig<-subset(RawSeqBig,ASSAY_ID!=rejectrs$rs)
>
> RawSeqBig<-subset(RawSeqBig,ASSAY_ID!=list(rejectrs$rs))
>
>
> -- 
> Farrel Buchinsky
> Mobile: (412) 779-1073
>
> [[alternative HTML version deleted]]
>
>



[R] Subset by using multiple values

2006-12-31 Thread Farrel Buchinsky
I have a vector containing about 20 unique values. It is called rejectrs$rs.
It is a factor
I have a data frame with about 10 rows.
I want to exclude all rows where in variable rs the value is one of the 20
on the exclude list. I thought this would work but none did.

RawSeqBig<-subset(RawSeqBig,ASSAY_ID!=rejectrs$rs)

RawSeqBig<-subset(RawSeqBig,ASSAY_ID!=list(rejectrs$rs))


-- 
Farrel Buchinsky
Mobile: (412) 779-1073



Re: [R] subset question

2006-12-13 Thread Richard M. Heiberger
help("[.factor")


a <- factor(letters[1:5])
a
a[1:3]
a[1:3, drop=TRUE]
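
To spell the example out a little: subsetting a factor keeps the full set of levels unless you ask otherwise. A short sketch, where the data frame `df` is made up and `droplevels()` is only available in more recent versions of R:

```r
a <- factor(letters[1:5])

b  <- a[1:3]                # values a, b, c, but all 5 levels are retained
b1 <- a[1:3, drop = TRUE]   # drop unused levels while subsetting
b2 <- factor(b)             # or rebuild the factor afterwards

# For a whole data frame, droplevels() does the same for every factor column
df  <- data.frame(f = a, x = 1:5)
sub <- droplevels(df[1:3, ])
```

Keeping the unused levels is deliberate: it means tables and models built on a subset stay comparable with those built on the full data.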



[R] subset question

2006-12-13 Thread Aimin Yan
I have a data set p1982, its structure is the following

Then I take 20 observations from this dataset, and assign to pr.

In p1982, p has 1982 levels; in dataset pr, p should have 1 level.

But when I do str(pr), it shows that p still has 1982 levels.

also for these

 > pr$aa
  [1] ARG THR ASP CYS TYR ASN VAL ASN ARG ILE ASP THR THR ALA SER CYS LYS 
THR ALA LYS
Levels: ALA ARG ASN ASP CYS GLN GLU HIS ILE LEU LYS MET PHE PRO SER THR TRP 
TYR VAL

It seems pr$aa doesn't contain the level GLU, but it still lists this level.

I don't understand this. Is there some reason for it?

thanks,



 > str(p1982)
'data.frame':   465979 obs. of  6 variables:
  $ p  : Factor w/ 1982 levels "154l_aa","1A0P_aa",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ aa : Factor w/ 19 levels "ALA","ARG","ASN",..: 2 16 4 5 18 3 19 3 2 9 ...
  $ as : num  152.0  15.9  65.1  57.2  28.9 ...
  $ ms : num  108.8  28.3  59.2  49.9  31.8 ...
  $ cur: num  -0.1020  0.2564  0.0312 -0.0550  0.0526 ...
  $ sc : num   92.10 103.67   7.27  72.98  96.12 ...

 > pr<-p1982[1:20,]

 > str(pr)
'data.frame':   20 obs. of  6 variables:
  $ p  : Factor w/ 1982 levels "154l_aa","1A0P_aa",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ aa : Factor w/ 19 levels "ALA","ARG","ASN",..: 2 16 4 5 18 3 19 3 2 9 ...
  $ as : num  152.0  15.9  65.1  57.2  28.9 ...
  $ ms : num  108.8  28.3  59.2  49.9  31.8 ...
  $ cur: num  -0.1020  0.2564  0.0312 -0.0550  0.0526 ...
  $ sc : num   92.10 103.67   7.27  72.98  96.12 ...

 > pr
         p  aa     as     ms         cur        sc
1  154l_aa ARG 152.04 108.83 -0.10201400  92.10410
2  154l_aa THR  15.86  28.32  0.25635600 103.67100
3  154l_aa ASP  65.13  59.16  0.03121370   7.27311
4  154l_aa CYS  57.20  49.85 -0.05495890  72.97930
5  154l_aa TYR  28.87  31.75  0.05264570  96.11660
6  154l_aa ASN  31.14  31.09  0.06327110  55.65980
7  154l_aa VAL   0.00   0.00  0. 142.92100
8  154l_aa ASN  83.46  62.03 -0.10425800  78.38800
9  154l_aa ARG 156.02 111.52 -0.12303800  70.28280
10 154l_aa ILE   6.71  18.37  0.29933600 150.02100
11 154l_aa ASP  86.45  59.83 -0.15856600  73.52120
12 154l_aa THR  26.39  33.68  0.06101840 133.57200
13 154l_aa THR 107.61  70.48 -0.17145100  72.48660
14 154l_aa ALA   2.31   5.40  0.2400  90.67890
15 154l_aa SER  30.16  30.08 -0.00753989  96.24600
16 154l_aa CYS  60.11  46.86 -0.09648100  32.19480
17 154l_aa LYS 127.05  95.48 -0.11545500  81.00930
18 154l_aa THR   5.74  18.45  0.27963100 164.13100
19 154l_aa ALA   0.00   0.00  0.  68.85680
20 154l_aa LYS 113.58  81.72 -0.12914300  49.38620
 >



Re: [R] Subset and levels

2006-11-06 Thread Christos Hatzis
Hi Florent,

A simple example with the expected output would help.  If I understood
correctly what you want, perhaps the following would work:

X[ X$code %in% levels(Y$code), ]

assuming Y$code is a factor.  If not, you can use unique(Y$code) in its
place in the above.
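
One thing worth keeping in mind when choosing between the two: levels() lists every level the factor knows about, including ones that no longer occur, while unique() lists only the values actually present. A sketch with invented data frames `X`, `Y` and `Y2`:

```r
X <- data.frame(code = c("a", "b", "c", "d"), v = 1:4,
                stringsAsFactors = FALSE)
Y <- data.frame(code = factor(c("a", "b", "c")))

# After removing the "c" rows, "c" is unused but still listed as a level
Y2 <- Y[Y$code != "c", , drop = FALSE]

by_levels <- X[X$code %in% levels(Y2$code), ]                 # a, b and c
by_values <- X[X$code %in% unique(as.character(Y2$code)), ]   # a and b only
```

Which of the two is wanted depends on whether "levels of Y$code" means the codes Y could contain or the codes it does contain.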

-Christos 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Florent Bresson
Sent: Monday, November 06, 2006 12:15 PM
To: R-help
Subject: [R] Subset and levels

Hi, I've got a very simple problem but cannot find the solution. I'm using
two data frames (say X and Y) and I want to get a subset of one according to
the different levels of a variable "code" of the other data frame. I tried
something like
Z<-subset(X, code==levels(Y$code))    (1)

but it does not work. I do not want to do something like
Z<-subset(X,code==level1 | code==level2...)

because length(levels(Y$code)) is 130. So what's wrong with (1)?

Thanks



[R] Subset and levels

2006-11-06 Thread Florent Bresson
Hi, I've got a very simple problem but cannot find the solution. I'm using two 
data frames (say X and Y) and I want to get a subset of one according to the 
different levels of a variable "code" of the other data frame. I tried 
something like
Z<-subset(X, code==levels(Y$code))    (1)

but it does not work. I do not want to do something like
Z<-subset(X,code==level1 | code==level2...)

because length(levels(Y$code)) is 130. So what's wrong with (1)?

Thanks



Re: [R] subset by two variables

2006-08-29 Thread Marc Schwartz (via MN)
On Tue, 2006-08-29 at 13:05 -0700, Dean Sonneborn wrote:
> I'm using this syntax to create data subsets for plots:  
> subset=source=="Both". Now I would like to create a subset defined by 
> two different variables like this: subset=source=="Both"  
> subset=site=="home" but this syntax is not correct. The documents in the 
> manual for subset seem to be creating whole new data files not just 
> selecting rows based on the contents of the variables. How do I subset 
> based on two variables.

If your data frame is called 'DF':

  plot(y ~ x, data = DF, subset = (source == "Both") & (site == "home"))

See ?Logic and ?Syntax for more information on creating logical
expressions.
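
A self-contained sketch of the same idea, using an invented data frame `DF` and lm() in place of plot() so that nothing is drawn; the subset= argument is evaluated inside the data frame in the same way:

```r
# Hypothetical data with the two grouping variables from the question
DF <- data.frame(x = 1:6,
                 y = c(2, 4, 3, 8, 5, 9),
                 source = c("Both", "Both", "One", "Both", "One", "Both"),
                 site   = c("home", "work", "home", "home", "home", "work"),
                 stringsAsFactors = FALSE)

# Both conditions must hold for a row to be selected; `&` works elementwise
sel <- with(DF, (source == "Both") & (site == "home"))
picked <- DF[sel, ]

# The same logical expression passed through the subset= argument
fit <- lm(y ~ x, data = DF, subset = (source == "Both") & (site == "home"))
```

Note the single `&`: the doubled form `&&` is meant for single TRUE/FALSE values in if() conditions, not for selecting rows.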

HTH,

Marc Schwartz



[R] subset by two variables

2006-08-29 Thread Dean Sonneborn
I'm using this syntax to create data subsets for plots:  
subset=source=="Both". Now I would like to create a subset defined by 
two different variables like this: subset=source=="Both"  
subset=site=="home" but this syntax is not correct. The documents in the 
manual for subset seem to be creating whole new data files not just 
selecting rows based on the contents of the variables. How do I subset 
based on two variables.

-- 
Dean Sonneborn, MS
Programmer Analyst
Department of Public Health Sciences
University of California, Davis
(530) 754-9516




Re: [R] subset of rows from matrix

2006-07-28 Thread jim holtman
actually the other way around:

x<-subset(largematrix, largematrix$rownames %in% rownames$names )
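
The line above assumes largematrix is really a data frame with a column called rownames. If it is a genuine matrix, `$` is not available, and a sketch along the following lines may be closer to what is needed. The matrix, its row names and the data frame `wanted` (standing in for the poster's rownames object) are all invented:

```r
# Hypothetical matrix with row names, and a data frame of names to extract
largematrix <- matrix(1:12, nrow = 4,
                      dimnames = list(c("r1", "r2", "r3", "r4"),
                                      c("A", "B", "C")))
wanted <- data.frame(names = c("r4", "r2", "r9"), stringsAsFactors = FALSE)

# Keep rows whose name is in the wanted list (matrix order is preserved)
x1 <- largematrix[rownames(largematrix) %in% wanted$names, , drop = FALSE]

# Or index by name, in the order requested, after dropping unknown names
ok <- intersect(wanted$names, rownames(largematrix))
x2 <- largematrix[ok, , drop = FALSE]
```

`drop = FALSE` keeps the result a matrix even when only one row matches.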


On 7/28/06, jim holtman <[EMAIL PROTECTED]> wrote:
>
>  try:
>
> x<-subset(largematrix, rownames$names %in% largematrix$rownames)
>
>
>
>  On 7/28/06, Wade Wall <[EMAIL PROTECTED]> wrote:
> >
> > Hi all,
> >
> > I have a dataframe of rownames that I would like to extract from a
> > larger matrix to form a new matrix. I have tried to use subset, in
> > this manner
> >
> > x<-subset(largematrix, rownames$names=largematrix$rownames)
> >
> > where largematrix is the larger matrix and rownames$names is the
> > dataframe with the row names of the rows I want to extract from the
> > larger matrix.
> >
> > Of course, I get error messages.  Any suggestions how I should proceed?
> >
> > Thanks
> >
> > Wade
> >
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] subset of rows from matrix

2006-07-28 Thread jim holtman
try:

x<-subset(largematrix, rownames$names %in% largematrix$rownames)



On 7/28/06, Wade Wall <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> I have a dataframe of rownames that I would like to extract from a
> larger matrix to form a new matrix. I have tried to use subset, in
> this manner
>
> x<-subset(largematrix, rownames$names=largematrix$rownames)
>
> where largematrix is the larger matrix and rownames$names is the
> dataframe with the row names of the rows I want to extract from the
> larger matrix.
>
> Of course, I get error messages.  Any suggestions how I should proceed?
>
> Thanks
>
> Wade
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



[R] subset of rows from matrix

2006-07-28 Thread Wade Wall
Hi all,

I have a dataframe of rownames that I would like to extract from a
larger matrix to form a new matrix. I have tried to use subset, in
this manner

x<-subset(largematrix, rownames$names=largematrix$rownames)

where largematrix is the larger matrix and rownames$names is the
dataframe with the row names of the rows I want to extract from the
larger matrix.

Of course, I get error messages.  Any suggestions how I should proceed?

Thanks

Wade



Re: [R] Subset data in long format

2006-06-06 Thread Gabor Grothendieck
Try this:

subset(long, seq(id) - match(id,id) < 6)
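
How the one-liner works, as far as I can tell: seq(id) numbers the rows 1..n, match(id, id) gives the row where each id first appears, and the difference is a zero-based position within the id, provided the rows are sorted by id. A sketch on simplified test data (built directly rather than with reshape(); the ave() variant is an addition that does not depend on the sort order):

```r
set.seed(1)
# Hypothetical long-format data: 3 ids, 10 rows each, then made unbalanced
long <- data.frame(id       = rep(1:3, each = 10),
                   position = rep(1:10, times = 3),
                   item     = rnorm(30))
long <- long[c(-2, -13), ]

# The one-liner from the reply (relies on rows being grouped by id)
first6_a <- subset(long, seq(id) - match(id, id) < 6)

# Alternative: number the rows within each id explicitly
within_id <- ave(seq_along(long$id), long$id, FUN = seq_along)
first6_b  <- long[within_id <= 6, ]
```

Both are vectorised, so neither needs an explicit loop over the ids.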

On 6/6/06, Doran, Harold <[EMAIL PROTECTED]> wrote:
> I have data in a "long" format where each row is a student and each
> student occupies multiple rows with multiple observations. I need to
> subset these data based on a condition which I am having difficulty
> defining.
>
> The dataset I am working with is large, but here is a simple data
> structure to illustrate the issue
>
> tmp <- data.frame(id = 1:3, matrix(rnorm(30), ncol=10) )
> long <- reshape(tmp, idvar='id', varying=list(names(tmp)[2:11]),
> v.names=('item'),timevar='position' , direction='long')
> long <- long[order(long$id) , ]
> long <- long[c(-2,-13),]
>
> What I need to do is subset these data so I have the first 6 rows for
> each unique ID. The problem is that the data are unbalanced in that each
> ID has a different number of observations (which is why I removed obs 2
> and 13).
>
> If the data were balanced, the subset would be trivial and I could just
> do
>
> long <- subset(long, position < 7)
>
> However, the data are not balanced. Consequently, if I were to do this
> for the unbalanced data I would not have the first 6 obs for the first
> ID. I would only have the first 5. Theoretically, what I want for
> id1(and for each unique id) is this
>
> ID1 <- subset(long, id==1)
> ID1[1:6,]
>
> However, the goal is to subset the entire dataframe at once such that
> the subset returns a new dataframe with the first 6 rows for each unique
> id. Is there a feasible method for doing this subset that anyone can
> suggest? My actual dataset has more than 24,000 unique ids, so I am
> hoping to avoid looping through this if possible.
>
> Thanks,
> Harold
>
>
>[[alternative HTML version deleted]]
>
>



Re: [R] Subset data in long format

2006-06-06 Thread Doran, Harold
Apologies, but there were some word wrap issues in the prior email it
seems. So, here is code for the sample data to avoid confusion 


tmp <- data.frame(id = 1:3, matrix(rnorm(30), ncol=10) )

long <- reshape(tmp, idvar='id', varying=list(names(tmp)[2:11]),
v.names=('item'),timevar='position' , direction='long')

long <- long[order(long$id) , ]

long <- long[c(-2,-13),]

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Doran, Harold
> Sent: Tuesday, June 06, 2006 5:08 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Subset data in long format
> 
> I have data in a "long" format where each row is a student 
> and each student occupies multiple rows with multiple 
> observations. I need to subset these data based on a 
> condition which I am having difficulty defining. 
> 
> The dataset I am working with is large, but here is a simple 
> data structure to illustrate the issue
> 
> tmp <- data.frame(id = 1:3, matrix(rnorm(30), ncol=10) ) long 
> <- reshape(tmp, idvar='id', varying=list(names(tmp)[2:11]), 
> v.names=('item'),timevar='position' , direction='long') long 
> <- long[order(long$id) , ] long <- long[c(-2,-13),]
> 
> What I need to do is subset these data so I have the first 6 
> rows for each unique ID. The problem is that the data are 
> unbalanced in that each ID has a different number of 
> observations (which is why I removed obs 2 and 13).
> 
> If the data were balanced, the subset would be trivial and I 
> could just do
> 
> long <- subset(long, position < 7)
> 
> However, the data are not balanced. Consequently, if I were 
> to do this for the unbalanced data I would not have the first 
> 6 obs for the first ID. I would only have the first 5. 
> Theoretically, what I want for id1(and for each unique id) is this
> 
> ID1 <- subset(long, id==1)
> ID1[1:6,]
> 
> However, the goal is to subset the entire dataframe at once 
> such that the subset returns a new dataframe with the first 6 
> rows for each unique id. Is there a feasible method for doing 
> this subset that anyone can suggest? My actual dataset has 
> more than 24,000 unique ids, so I am hoping to avoid looping 
> through this if possible.
> 
> Thanks,
> Harold
> 
> 
>   [[alternative HTML version deleted]]
> 
>



[R] Subset data in long format

2006-06-06 Thread Doran, Harold
I have data in a "long" format where each row is a student and each
student occupies multiple rows with multiple observations. I need to
subset these data based on a condition which I am having difficulty
defining. 

The dataset I am working with is large, but here is a simple data
structure to illustrate the issue

tmp <- data.frame(id = 1:3, matrix(rnorm(30), ncol=10) )
long <- reshape(tmp, idvar='id', varying=list(names(tmp)[2:11]),
v.names=('item'),timevar='position' , direction='long')
long <- long[order(long$id) , ]
long <- long[c(-2,-13),]

What I need to do is subset these data so I have the first 6 rows for
each unique ID. The problem is that the data are unbalanced in that each
ID has a different number of observations (which is why I removed obs 2
and 13).

If the data were balanced, the subset would be trivial and I could just
do

long <- subset(long, position < 7)

However, the data are not balanced. Consequently, if I were to do this
for the unbalanced data I would not have the first 6 obs for the first
ID. I would only have the first 5. Theoretically, what I want for
id1(and for each unique id) is this

ID1 <- subset(long, id==1)
ID1[1:6,]

However, the goal is to subset the entire dataframe at once such that
the subset returns a new dataframe with the first 6 rows for each unique
id. Is there a feasible method for doing this subset that anyone can
suggest? My actual dataset has more than 24,000 unique ids, so I am
hoping to avoid looping through this if possible.

Thanks,
Harold




Re: [R] Subset a list

2006-05-22 Thread Chuck Cleland
Doran, Harold wrote:
> I have a data frame of ~200 columns and ~20,000 rows where each column
> consists of binary responses (0,1) and a 9 for missing data. I am
> interested in finding the columns for which there are fewer than 100
> individuals with responses of 0. 
> 
> I can use an apply function to generate a table for each column, but I'm
> not certain whether I can subset a list based on some criterion as
> subset() is designed for vectors, matrices or dataframes.
> 
> For example, I can use the following:
> tt <- apply(data, 2, table)
> 
> Which returns an object of class list. Here is some sample output from
> tt
> 
> $R0235940b
> 
> 0 1 9 
>  2004  1076 15361 
> 
> $R710a
> 
> 0 9 
> 2 18439 
> 
> $R710b
> 
> 0 1 9 
>    3941 11167 
> 
> tt$R710a meets my criteria and I would want to be able to easily
> find this instead of rolling through the entire output. Is there a way
> to subset this list to identify the columns which meet the criteria I
> note above?

How about this?

newdf <- mydf[,colSums(mydf == 0) < 100]
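
If you do want to work from the list of tables, as in the original question, something like the following sketch might do. The data frame `mydf` is invented, the threshold is 5 rather than 100 to suit the toy data, and a column with no zeros at all has no "0" entry in its table, which the helper has to allow for:

```r
# Hypothetical responses coded 0/1 with 9 for missing
mydf <- data.frame(q1 = rep(c(0, 1, 9), length.out = 50),
                   q2 = c(0, 0, rep(9, 48)),
                   q3 = rep(c(1, 9), 25))

tt <- lapply(mydf, table)   # one frequency table per column

# Count of zeros per table, treating an absent "0" entry as a count of 0
n_zero <- sapply(tt, function(tab) {
  if ("0" %in% names(tab)) as.integer(tab["0"]) else 0L
})

few <- tt[n_zero < 5]       # the tables (and so the columns) meeting the rule
```

The colSums() line above gets to the same columns without building the tables first.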

> Thanks,
> Harold
> 
> 
> 
> 
>   [[alternative HTML version deleted]]
> 
> 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] Subset a list

2006-05-22 Thread Marc Schwartz (via MN)
On Mon, 2006-05-22 at 17:55 -0400, Doran, Harold wrote:
> I have a data frame of ~200 columns and ~20,000 rows where each column
> consists of binary responses (0,1) and a 9 for missing data. I am
> interested in finding the columns for which there are fewer than 100
> individuals with responses of 0. 
> 
> I can use an apply function to generate a table for each column, but I'm
> not certain whether I can subset a list based on some criterion as
> subset() is designed for vectors, matrices or dataframes.
> 
> For example, I can use the following:
> tt <- apply(data, 2, table)
> 
> Which returns an object of class list. Here is some sample output from
> tt
> 
> $R0235940b
> 
> 0 1 9 
>  2004  1076 15361 
> 
> $R710a
> 
> 0 9 
> 2 18439 
> 
> $R710b
> 
> 0 1 9 
>    3941 11167 
> 
> tt$R710a meets my criteria and I would want to be able to easily
> find this instead of rolling through the entire output. Is there a way
> to subset this list to identify the columns which meet the criteria I
> note above?
> 
> 
> Thanks,
> Harold

Harold,

How about this:

> DF
   V1 V2 V3 V4 V5
1   0  1  0  1  0
2   0  0  1  0  1
3   0  0  1  1  0
4   1  1  0  0  1
5   1  1  1  1  0
6   0  1  0  1  1
7   0  1  1  1  0
8   0  1  0  0  0
9   0  0  1  1  0
10  1  0  0  1  1

# Find the columns with <5 0's
> which(sapply(DF, function(x) sum(x == 0)) < 5)
V2 V4
 2  4


So in your case, just replace the DF with your data frame name and the 5
with 100.
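
For completeness, a sketch that rebuilds the small DF shown above and then uses the result of which() to pull out the columns themselves (the names `idx` and `small` are just for illustration):

```r
# The example data frame from above, entered column by column
DF <- data.frame(V1 = c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1),
                 V2 = c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0),
                 V3 = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0),
                 V4 = c(1, 0, 1, 0, 1, 1, 1, 0, 1, 1),
                 V5 = c(0, 1, 0, 1, 0, 1, 0, 0, 0, 1))

# Positions of the columns with fewer than 5 zeros
idx <- which(sapply(DF, function(x) sum(x == 0)) < 5)

# Use the positions to extract those columns as a data frame
small <- DF[, idx, drop = FALSE]
```

`drop = FALSE` keeps the result a data frame even if only one column qualifies.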

HTH,

Marc Schwartz



[R] Subset a list

2006-05-22 Thread Doran, Harold
I have a data frame of ~200 columns and ~20,000 rows where each column
consists of binary responses (0,1) and a 9 for missing data. I am
interested in finding the columns for which there are fewer than 100
individuals with responses of 0. 

I can use an apply function to generate a table for each column, but I'm
not certain whether I can subset a list based on some criterion as
subset() is designed for vectors, matrices or dataframes.

For example, I can use the following:
tt <- apply(data, 2, table)

Which returns an object of class list. Here is some sample output from
tt

$R0235940b

0 1 9 
 2004  1076 15361 

$R710a

0 9 
2 18439 

$R710b

0 1 9 
   3941 11167 

tt$R710a meets my criteria and I would want to be able to easily
find this instead of rolling through the entire output. Is there a way
to subset this list to identify the columns which meet the criteria I
note above?


Thanks,
Harold






Re: [R] subset

2006-05-16 Thread Marc Schwartz (via MN)
On Tue, 2006-05-16 at 14:54 -0400, Guenther, Cameron wrote:
> Marc, 
> I have tried unique but unique looks at the entire row.  

Right, as I noted in the last line of my reply.

> I have a data
> set with a variable TRIPID.  The dataset has 469,000 rows.  In most
> cases TRIPID is a unique value.  However, in some cases I have the same
> TRIPID value but different values for other variables.  What this
> amounts to is a data entry error.  I need to get rid of the repeated
> rows that have the same TRIPID but different co-variables.  
> Thanks for your help.
> Cam 

If I am reading correctly, rather than retaining all unique rows, you
actually want to remove all rows with duplicated TRIPID values,
presuming that you don't know which row is correct.

In other words, if there are two rows with the same TRIPID value, you
want both rows removed?

I think what you want is this, presuming that 'x' is the data frame that
contains the column "TRIPID":

  NewDF <- subset(x, !TRIPID %in% TRIPID[duplicated(TRIPID)])

What is being done is to identify the actual values of TRIPID that are
duplicated (TRIPID[duplicated(TRIPID)]) and then subsetting 'x' by only
retaining rows of 'x' where the values of TRIPID are _not_ in the
duplicated values.

Check me on that though.
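
A quick check on made-up data (the column `catch` is hypothetical). The second version, using the fromLast argument of duplicated() that newer versions of R provide, is an alternative way of flagging every member of a duplicated group:

```r
# Hypothetical trips: 102 appears twice and 104 three times
x <- data.frame(TRIPID = c(101, 102, 102, 103, 104, 104, 104),
                catch  = c(5, 7, 9, 2, 4, 4, 6))

# Drop every row whose TRIPID occurs more than once
NewDF <- subset(x, !TRIPID %in% TRIPID[duplicated(TRIPID)])

# Alternative: flag duplicates scanning from both ends, then drop them
dup_any <- duplicated(x$TRIPID) | duplicated(x$TRIPID, fromLast = TRUE)
NewDF2  <- x[!dup_any, ]
```

Only the trips that occur exactly once (101 and 103 here) remain in either result.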

HTH,

Marc Schwartz

> On Tue, 2006-05-16 at 14:37 -0400, Guenther, Cameron wrote:
> > Hello everyone,
> > 
> > I have a large dataset (x) with some rows that have duplicate 
> > variables that I would like to remove.  I find which rows are the 
> > duplicates with X1<-which(duplicated(x)).  That gives me the rows with
> 
> > duplicated variables.  Now, how can I remove just those rows from the 
> > original data frame.  I think I can create a new data frame without 
> > the duplicates using subset.  I have tried:
> > Subset(x,!x1) and subset(x,!x[x1,])
> > I can't seem to find the correct syntax.  Any advice.
> > Thanks in advance
> 
> Even easier would be to use unique():
> 
>   NewDF <- unique(x)
> 
> NewDF will contain rows from 'x' with duplicates removed.
> 
> See ?unique for more information.
> 
> unique(), which has a data.frame method, is basically:
> 
>   x[!duplicated(x), , drop = FALSE]
> 
> which covers the case where the result may contain a single row and
> which remains a data frame.
> 
> Note that the above presumes that you want to test all columns in 'x'
> for dups.
> 
> HTH,
> 
> Marc Schwartz



Re: [R] subset

2006-05-16 Thread Guenther, Cameron
 
Thanks Phil
That worked perfectly.

Cameron Guenther, Ph.D. 
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
[EMAIL PROTECTED]
-Original Message-
From: Phil Spector [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 16, 2006 3:01 PM
To: Guenther, Cameron
Subject: Re: [R] subset

Cameron -
Is

  X1 = which(duplicated(x))
  x[-X1,]

or
  x[!duplicated(x),]

or
  subset(x,!duplicated(x))

what you're looking for?  Remember that which() will always return
indices, so negating them (with regards to subscripts) means making them
negative, not applying the not operator(!).  The not operator can only
be applied to logical values, like those returned by duplicated()
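
One caveat with the negative-index form, shown on an invented data frame with no duplicate rows at all: which() then returns an empty vector, and indexing with "minus nothing" selects nothing rather than everything.

```r
# Hypothetical data with no duplicated rows
x <- data.frame(a = 1:3, b = c("u", "v", "w"))

X1 <- which(duplicated(x))          # integer(0): there are no duplicates

by_index   <- x[-X1, ]              # -integer(0) selects NO rows
by_logical <- x[!duplicated(x), ]   # the logical form keeps all rows
```

For that reason the logical forms, x[!duplicated(x), ] or subset(x, !duplicated(x)), are probably the safer habit.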

   - Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 [EMAIL PROTECTED]

On Tue, 16 May 2006, Guenther, Cameron wrote:

> Hello everyone,
>
> I have a large dataset (x) with some rows that have duplicate 
> variables that I would like to remove.  I find which rows are the 
> duplicates with X1<-which(duplicated(x)).  That gives me the rows with

> duplicated variables.  Now, how can I remove just those rows from the 
> original data frame.  I think I can create a new data frame without 
> the duplicates using subset.  I have tried:
> Subset(x,!x1) and subset(x,!x[x1,])
> I can't seem to find the correct syntax.  Any advice.
> Thanks in advance
>
> Cameron Guenther, Ph.D.
> Associate Research Scientist
> FWC/FWRI, Marine Fisheries Research
> 100 8th Avenue S.E.
> St. Petersburg, FL 33701
> (727)896-8626 Ext. 4305
> [EMAIL PROTECTED]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
>



Re: [R] subset

2006-05-16 Thread Guenther, Cameron
Marc, 
I have tried unique but unique looks at the entire row.  I have a data
set with a variable TRIPID.  The dataset has 469,000 rows.  In most
cases TRIPID is a unique value.  However, in some cases I have the same
TRIPID value but different values for other variables.  What this
amounts to is a data entry error.  I need to get rid of the repeated
rows that have the same TRIPID but different co-variables.  
Thanks for your help.
Cam 
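
A rough sketch of de-duplicating on one key column instead of the whole
row. TRIPID is the column named above; the 'catch' column and all values
are hypothetical. Two readings of "get rid of the repeated rows" are
shown, since it is not clear which one is wanted:

```r
# Toy data: TRIPID 102 appears twice with different covariate values
x <- data.frame(TRIPID = c(101, 102, 102, 103),
                catch  = c(5, 7, 9, 4))

# Reading 1: keep the first row for each TRIPID, drop later repeats
first_only <- x[!duplicated(x$TRIPID), ]

# Reading 2: drop every row whose TRIPID occurs more than once
repeated   <- x$TRIPID[duplicated(x$TRIPID)]
no_repeats <- x[!(x$TRIPID %in% repeated), ]
```

Which reading is appropriate depends on whether either of the conflicting
rows can be trusted.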


Cameron Guenther, Ph.D. 
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
[EMAIL PROTECTED]
-Original Message-
From: Marc Schwartz (via MN) [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 16, 2006 2:50 PM
To: Guenther, Cameron
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] subset

On Tue, 2006-05-16 at 14:37 -0400, Guenther, Cameron wrote:
> Hello everyone,
> 
> I have a large dataset (x) with some rows that have duplicate 
> variables that I would like to remove.  I find which rows are the 
> duplicates with X1<-which(duplicated(x)).  That gives me the rows with
> duplicated variables.  Now, how can I remove just those rows from the 
> original data frame.  I think I can create a new data frame without 
> the duplicates using subset.  I have tried:
> Subset(x,!x1) and subset(x,!x[x1,])
> I can't seem to find the correct syntax.  Any advice.
> Thanks in advance

Even easier would be to use unique():

  NewDF <- unique(x)

NewDF will contain rows from 'x' with duplicates removed.

See ?unique for more information.

unique(), which has a data.frame method, is basically:

  x[!duplicated(x), , drop = FALSE]

which covers the case where the result may contain a single row and
which remains a data frame.

Note that the above presumes that you want to test all columns in 'x'
for dups.

HTH,

Marc Schwartz



Re: [R] subset

2006-05-16 Thread Marc Schwartz (via MN)
On Tue, 2006-05-16 at 14:37 -0400, Guenther, Cameron wrote:
> Hello everyone,
> 
> I have a large dataset (x) with some rows that have duplicate variables
> that I would like to remove.  I find which rows are the duplicates with
> X1<-which(duplicated(x)).  That gives me the rows with duplicated
> variables.  Now, how can I remove just those rows from the original data
> frame.  I think I can create a new data frame without the duplicates
> using subset.  I have tried:
> Subset(x,!x1) and subset(x,!x[x1,])
> I can't seem to find the correct syntax.  Any advice.
> Thanks in advance

Even easier would be to use unique():

  NewDF <- unique(x)

NewDF will contain rows from 'x' with duplicates removed.

See ?unique for more information.

unique(), which has a data.frame method, is basically:

  x[!duplicated(x), , drop = FALSE]

which covers the case where the result may contain a single row and
which remains a data frame.

Note that the above presumes that you want to test all columns in 'x'
for dups.
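
A small sketch of the equivalence, on a toy data frame with made-up
values. As far as I can tell, in current R the drop = FALSE matters
mainly when the data frame has a single column; that is an observation
about `[.data.frame` defaults, not something stated above:

```r
# Toy data frame: row 2 repeats row 1
x <- data.frame(a = c(1, 1, 2), b = c("u", "u", "v"))

u1 <- unique(x)
u2 <- x[!duplicated(x), , drop = FALSE]

# With a single column, omitting drop = FALSE yields a plain vector
one_col <- x["a"]
dropped <- one_col[!duplicated(one_col), ]
kept    <- one_col[!duplicated(one_col), , drop = FALSE]

# Testing only some columns for duplicates
by_a <- x[!duplicated(x["a"]), ]
```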

HTH,

Marc Schwartz



[R] subset

2006-05-16 Thread Guenther, Cameron
Hello everyone,

I have a large dataset (x) with some rows that have duplicate variables
that I would like to remove.  I find which rows are the duplicates with
X1<-which(duplicated(x)).  That gives me the rows with duplicated
variables.  Now, how can I remove just those rows from the original data
frame.  I think I can create a new data frame without the duplicates
using subset.  I have tried:
Subset(x,!x1) and subset(x,!x[x1,])
I can't seem to find the correct syntax.  Any advice.
Thanks in advance

Cameron Guenther, Ph.D. 
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
[EMAIL PROTECTED]



Re: [R] Subset dataframe based on condition

2006-04-18 Thread Sachin J
Thanx Steve and Tony for your help.
   
  Sachin

Tony Plate <[EMAIL PROTECTED]> wrote:
  Works OK for me:

> x <- data.frame(a=10^(-2:7), b=10^(10:1))
> subset(x, a > 1)
       a     b
4  1e+01 1e+07
5  1e+02 1e+06
6  1e+03 1e+05
7  1e+04 1e+04
8  1e+05 1e+03
9  1e+06 1e+02
10 1e+07 1e+01
> subset(x, a > 1 & b < a)
       a    b
8  1e+05 1000
9  1e+06  100
10 1e+07   10
>

Do you get all "numeric" for the following?

> sapply(x, class)
        a         b
"numeric" "numeric"
>

If not, then your data frame is probably encoding the information in 
some way that you don't want (though if it was as factors, I would have 
expected a warning from the comparison operator).

You might get more help by distilling your problem to a simple example 
that can be tried out by others.

-- Tony Plate

Sachin J wrote:
> Hi,
> 
> I am trying to extract subset of data from my original data frame 
> based on some condition. For example : (mydf -original data frame, submydf 
> - subset data frame)
> 
> >submydf = subset(mydf, a > 1 & b <= a), 
> 
> here column a contains values ranging from 0.01 to 10. I want to 
> extract only those matching condition 1 i.e a > . But when i execute 
> this command it is not giving me appropriate result. The subset df - 
> submydf contains rows with 0.01 also. Please help me to resolve this 
> problem.
> 
> Thanks in advance.
> 
> Sachin
> 
> 
> -
> 
> [[alternative HTML version deleted]]
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 






Re: [R] Subset dataframe based on condition

2006-04-17 Thread Tony Plate
Works OK for me:

 > x <- data.frame(a=10^(-2:7), b=10^(10:1))
 > subset(x, a > 1)
       a     b
4  1e+01 1e+07
5  1e+02 1e+06
6  1e+03 1e+05
7  1e+04 1e+04
8  1e+05 1e+03
9  1e+06 1e+02
10 1e+07 1e+01
 > subset(x, a > 1 & b < a)
       a    b
8  1e+05 1000
9  1e+06  100
10 1e+07   10
 >

Do you get all "numeric" for the following?

 > sapply(x, class)
        a         b
"numeric" "numeric"
 >

If not, then your data frame is probably encoding the information in 
some way that you don't want (though if it was as factors, I would have 
expected a warning from the comparison operator).

You might get more help by distilling your problem to a simple example 
that can be tried out by others.
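
To illustrate the kind of encoding problem meant here, a small sketch
with made-up values: when a column holds numbers stored as character
strings, the comparison is done on strings and the subset comes out
wrong without any error. Whether this is what happened in the original
data is only a guess.

```r
# Toy data frame whose column 'a' was read in as character
mydf <- data.frame(a = c("0.01", "2", "10"), stringsAsFactors = FALSE)
sapply(mydf, class)             # reports "character", not "numeric"

# String comparison: "10" sorts before "5", so no row is selected
wrong <- subset(mydf, a > 5)

# Convert first; for a factor use as.numeric(as.character(f)) instead
mydf$a <- as.numeric(mydf$a)
right  <- subset(mydf, a > 5)   # now selects the row with a == 10
```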

-- Tony Plate

Sachin J wrote:
> Hi,
>
>   I am trying to extract subset of data from my original data frame 
> based on some condition. For example : (mydf -original data frame, submydf 
> - subset data frame)
>
>   >submydf = subset(mydf, a > 1 & b <= a), 
>
>   here column a contains values ranging from 0.01 to 10. I want to 
> extract only those matching condition 1 i.e a > . But when i execute 
> this command it is not giving me appropriate result. The subset df - 
> submydf  contains rows with 0.01 also. Please help me to resolve this 
> problem.
>
>   Thanks in advance.
>
>   Sachin
> 
>   
> -
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>



Re: [R] Subset dataframe based on condition

2006-04-17 Thread Steve Miller
How about trying a nested subset:

submydf = subset(subset(mydf, a > 1),b <= a)

Steve Miller

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Sachin J
Sent: Monday, April 17, 2006 10:38 AM
To: R-help@stat.math.ethz.ch
Subject: [R] Subset dataframe based on condition

Hi,
   
  I am trying to extract subset of data from my original data frame based on
some condition. For example : (mydf -original data frame, submydf - subset
data frame)
   
  >submydf = subset(mydf, a > 1 & b <= a), 
   
  here column a contains values ranging from 0.01 to 10. I want to
extract only those matching condition 1 i.e a > . But when i execute this
command it is not giving me appropriate result. The subset df - submydf
contains rows with 0.01 also. Please help me to resolve this problem.
   
  Thanks in advance.
   
  Sachin




[R] Subset dataframe based on condition

2006-04-17 Thread Sachin J
Hi,
   
  I am trying to extract a subset of data from my original data frame 
based on some condition. For example: (mydf - original data frame, submydf 
- subset data frame)

  >submydf = subset(mydf, a > 1 & b <= a), 

  here column a contains values ranging from 0.01 to 10. I want to 
extract only those matching condition 1 i.e a > . But when I execute 
this command it is not giving me the appropriate result. The subset df - 
submydf contains rows with 0.01 also. Please help me to resolve this 
problem.
   
  Thanks in advance.
   
  Sachin




Re: [R] Subset rows over multiple columns

2006-04-13 Thread Gabor Grothendieck
Try this:

tt2 <- tt
tt2[,1] <- as.character(tt2[,1])
tt2[,2] <- as.character(tt2[,2])

f <- function(x) with(tt2, mean(righta_a[x == itd_1 | x == itd_45]))
sapply(unique(unlist(tt2[,1:2])), f)
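
An alternative sketch that stacks the columns by hand instead of using
reshape(), so duplicate IDs are not an issue. It assumes that itd_1 goes
with righta_a and itd_45 with righta_b, as the original post suggests.
The data below are toy values loosely modelled on the post, not the real
data:

```r
# Toy data in the "wide" layout described in the question
tt <- data.frame(
  itd_1    = c("R160", "R160", "R160", "R0113750", "R160"),
  itd_45   = c("R0208470", "R0238140", "R730", "R160", "R0238690"),
  righta_a = c(1, 0, 1, 1, 0),
  righta_b = c(0, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Stack item codes and the matching responses into two long vectors
long <- data.frame(item   = c(tt$itd_1, tt$itd_45),
                   answer = c(tt$righta_a, tt$righta_b))

# Proportion correct per item, regardless of presentation position
p <- with(long, tapply(answer, item, mean))
```

Here R160 is averaged over its four righta_a values and its one righta_b
value, whichever column it happened to appear in.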


On 4/13/06, Doran, Harold <[EMAIL PROTECTED]> wrote:
> I have a data frame where I need to subset certain rows before I compute
> the mean of another variable. However, the value that I need to subset
> by is found in multiple columns. For example, in the data below the
> value R160 is found in the first and second columns (itd_1 and
> itd_45).  These data are student responses to multiple choice test items
> from a computer adaptive test. So, the variable itd_1 denotes that item
> i was presented to student k in position t and then the variable
> righta_a and righta_b denotes a correct (1) or incorrect response to
> that item when it was presented.
>
> My goal is to get the p-value (mean of the binary variable) for each
> item irrespective of when it was presented to the student.
>
> So, in the sample case below, I would use all elements in righta_a
> (except for the second to last) and then only the second to last value
> in righta_b.
>
> > tail(tt)
>          itd_1   itd_45 righta_a righta_b
> 18407     R160 R0208470        1        0
> 18412     R160 R0238140        0        1
> 18417     R160 R0259690        1        1
> 18422     R160     R730        1        1
> 18450 R0113750     R160        1        1
> 18456     R160 R0238690        0        1
>
> One thing I can envision doing is using the reshape option such that
> itd_1 and itd_45 would be in the "long" format. This would cause for
> itd_1 and itd_45 to be stacked in a single column as well as righta_a
> and righta_b and then I could then use tapply and get what I need
> without any subsetting. That is
>
> testScores <- reshape(tt, idvar='id', varying=list(c('itd_1', 'itd_45'),
> c('righta_a', 'righta_b')), v.names=c('item','answer'),
> timevar='item_position', direction='long')
>
> with(testScores, tapply(answer, item, mean))
>
> Or I could get
>
> with(testScores, tapply(answer, list(item, item_position), mean))
>
> The only problem here is that I have some duplicate IDs in the data and
> reshape doesn't like turning data on its head in that situation, so I
> would need to tinker with those first.
>
> So, I have what I think would be a solution, I wonder if there is
> another way to preserve the data in this "wide" format and get the
> estimates I need? Maybe it is just easier to use reshape. Any
> suggestions?
>
> Harold
> Windows Xp
> R 2.2.1
>
>[[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>



[R] Subset rows over multiple columns

2006-04-13 Thread Doran, Harold
I have a data frame where I need to subset certain rows before I compute
the mean of another variable. However, the value that I need to subset
by is found in multiple columns. For example, in the data below the
value R160 is found in the first and second columns (itd_1 and
itd_45).  These data are student responses to multiple choice test items
from a computer adaptive test. So, the variable itd_1 denotes that item
i was presented to student k in position t and then the variable
righta_a and righta_b denotes a correct (1) or incorrect response to
that item when it was presented.

My goal is to get the p-value (mean of the binary variable) for each
item irrespective of when it was presented to the student.

So, in the sample case below, I would use all elements in righta_a
(except for the second to last) and then only the second to last value
in righta_b.

> tail(tt)
         itd_1   itd_45 righta_a righta_b
18407     R160 R0208470        1        0
18412     R160 R0238140        0        1
18417     R160 R0259690        1        1
18422     R160     R730        1        1
18450 R0113750     R160        1        1
18456     R160 R0238690        0        1

One thing I can envision doing is using the reshape option such that
itd_1 and itd_45 would be in the "long" format. This would cause for
itd_1 and itd_45 to be stacked in a single column as well as righta_a
and righta_b and then I could then use tapply and get what I need
without any subsetting. That is

testScores <- reshape(tt, idvar='id', varying=list(c('itd_1', 'itd_45'),
c('righta_a', 'righta_b')), v.names=c('item','answer'),
timevar='item_position', direction='long')

with(testScores, tapply(answer, item, mean))

Or I could get

with(testScores, tapply(answer, list(item, item_position), mean))

The only problem here is that I have some duplicate IDs in the data and
reshape doesn't like turning data on its head in that situation, so I
would need to tinker with those first. 

So, I have what I think would be a solution, I wonder if there is
another way to preserve the data in this "wide" format and get the
estimates I need? Maybe it is just easier to use reshape. Any
suggestions?

Harold
Windows Xp
R 2.2.1



Re: [R] subset a matrix

2006-04-12 Thread ronggui
> x=matrix(rnorm(20*30),20)
> y=x[seq(1,20,by=2),seq(1,30,by=2)]



2006/4/12, zhijie zhang <[EMAIL PROTECTED]>:
> Dear friends,
>  I have a (20*30) matrix,and want to get a subset of it like the following:
> The original matrix: rows:1,2,3,20; columns:1,2,3,30
> I want to get my subset of The original matrix and delete others:
>rows:1,3,5,7,...19;   columns:1,3,5.29
>
>
> --
> Kind Regards,Zhi Jie,Zhang ,PHDDepartment of EpidemiologySchool of Public
> HealthFudan UniversityTel:86-21-54237149
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>


--
黄荣贵
Deparment of Sociology
Fudan University


Re: [R] subset a matrix

2006-04-12 Thread Peter Ehlers
Assuming that you want odd-numbered rows/cols, the
seq() function is handy, as in:

mat <- matrix(1:(20*30), nrow = 20)
mat1 <- mat[seq(1, 19, by = 2), seq(1, 29, by = 2)]
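
A slightly more general sketch of the same idea, taking the bounds from
the matrix itself. The second form relies on logical indices being
recycled along each dimension; the object names are illustrative.

```r
mat <- matrix(1:(20 * 30), nrow = 20)

# Odd-numbered rows and columns via seq(), without hard-coded bounds
odd <- mat[seq(1, nrow(mat), by = 2), seq(1, ncol(mat), by = 2)]

# Same selection via a recycled logical pattern (keep, skip, keep, ...)
odd2 <- mat[c(TRUE, FALSE), c(TRUE, FALSE)]

dim(odd)
```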

Peter Ehlers

zhijie zhang wrote:

> Dear friends,
>  I have a (20*30) matrix,and want to get a subset of it like the following:
> The original matrix: rows:1,2,3,20; columns:1,2,3,30
> I want to get my subset of The original matrix and delete others:
>rows:1,3,5,7,...19;   columns:1,3,5.29
> 
> 
> --
> Kind Regards,Zhi Jie,Zhang ,PHDDepartment of EpidemiologySchool of Public
> HealthFudan UniversityTel:86-21-54237149
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



[R] subset a matrix

2006-04-12 Thread zhijie zhang
Dear friends,
 I have a (20*30) matrix, and want to get a subset of it like the following:
The original matrix: rows: 1,2,3,...,20; columns: 1,2,3,...,30
I want to keep this subset of the original matrix and delete the rest:
   rows: 1,3,5,7,...,19;   columns: 1,3,5,...,29


--
Kind Regards,
Zhi Jie, Zhang, PhD
Department of Epidemiology
School of Public Health
Fudan University
Tel: 86-21-54237149



Re: [R] subset problem

2006-02-22 Thread Liaw, Andy
You need year == 2002.

Andy
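
A minimal sketch of the point, using lm() as a stand-in for glmmPQL()
since the 'subset' argument is handled the same way in the usual
modelling functions. The data are made up; only the column names are
borrowed from the question. '==' is the comparison operator, whereas a
single '=' inside a call names an argument.

```r
set.seed(1)
# Toy data: five rows per year
mcare <- data.frame(year   = rep(c(2001, 2002), each = 5),
                    desdat = rep(1:5, 2),
                    desm   = rnorm(10))

# Fit only on the rows where year equals 2002
fit <- lm(desm ~ desdat, data = mcare, subset = year == 2002)
nobs(fit)   # number of rows actually used in the fit
```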

From:  I.Szentirmai
> 
> Dear All,
> 
> I'm trying to run a model on a subset of my data 
> identified by year = 2002. Does anyone know whats wrong 
> with the syntax below:
> 
> glmmPQL(desm~desdat,random=~1|male,family=quasibinomial,
> data=mcare,subset=year=2002)
> 
> I get an error message all the time, but it worked with 
> string variables (like: subset=sex=="M").
> 
> Thanks,
> Istvan
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
>



[R] subset problem

2006-02-22 Thread I.Szentirmai
Dear All,

I'm trying to run a model on a subset of my data 
identified by year = 2002. Does anyone know what's wrong 
with the syntax below:

glmmPQL(desm~desdat,random=~1|male,family=quasibinomial,
data=mcare,subset=year=2002)

I get an error message all the time, but it worked with 
string variables (like: subset=sex=="M").

Thanks,
Istvan



Re: [R] subset selection for glm

2005-10-15 Thread Prof Brian Ripley
On Sat, 15 Oct 2005, Dhiren DSouza wrote:

> I posted a message earlier about subset selection.
>
> I have a data set with 50 variables x1, x2,  x50
>
> x50 is a binary response variable that I would like to predict.  Is there a
> library I could use to do an exhaustive search for a subset
> (forward/backward subset selection) of variables to include in the
> regression model.  Any help would be greatly appreciated.

?step  (as surely help.search() would have shown you), and btw, that is 
not an `exhaustive search' procedure.

Frank Harrell has posted repeatedly on the dangers of unthinking use of 
such a procedure -- if he does not chime in now, please do look at his 
posts (and if you have access to it, his book).  You have not told us 
*why* you want to do variable selection (which is a more accurate name for 
what you are calling `subset' selection), and for most purposes it is not 
a good idea.
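
For reference only, a sketch of what a step() call on a binomial glm
looks like, using the built-in mtcars data as a stand-in (the choice of
response and candidate predictors is arbitrary). As noted above, this is
a greedy stepwise search by AIC, not an exhaustive one, and the cautions
about variable selection apply to it in full.

```r
# Start from the intercept-only model for a binary response
null_fit <- glm(am ~ 1, family = binomial, data = mtcars)

# Forward selection among two candidate predictors; trace = 0 keeps
# the search silent
sel <- step(null_fit,
            scope = list(lower = ~ 1, upper = ~ hp + wt),
            direction = "forward", trace = 0)

formula(sel)   # the model that the search stopped at
```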


Let me second Roger Bivand's comment earlier today:

> I would, though, appeal to posters to give those who try to reply to
> questions at least a little help, by including an informative signature
> block.

I know that several helpers are quite unlikely to offer help to someone 
sending an unsigned letter, for that is what not using a real user name 
and affiliation amounts to.  So, PLEASE give your credentials -- this 
forum is a free (to the recipients) technical support forum, and that is a 
privilege that should be respected.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] subset selection for glm

2005-10-15 Thread Dhiren DSouza
I posted a message earlier about subset selection.

I have a data set with 50 variables x1, x2,  x50

x50 is a binary response variable that I would like to predict.  Is there a 
library I could use to do an exhaustive search for a subset 
(forward/backward subset selection) of variables to include in the 
regression model.  Any help would be greatly appreciated.

-D



Re: [R] subset selection for glm

2005-10-15 Thread Thomas Schönhoff
Hello Dhiren,

2005/10/15, Dhiren DSouza <[EMAIL PROTECTED]>:
> Hello:
>
> Are there any libraries that will do a subset selection for glm's?  I looked
> through leaps, but seems like it is specifically for linear regressions.


?subset should tell you. AFAIK, the subset function does not depend on a
particular statistical procedure, but on the type of data: vectors,
matrices or data frames, as the related help page says.
From the ?glm help page:

 All of 'weights', 'subset', 'offset', 'etastart' and 'mustart' are
 evaluated in the same way as variables in 'formula', that is first
 in 'data' and then in the environment of 'formula'



If you want a more specific answer to your question it would be very
helpful to see a toy example of yours.

The posting guide (at the bottom of every mail) is very helpful for
setting up clear questions, which is likely to increase your chance of
getting more helpful responses from the list.

Thomas



[R] subset selection for glm

2005-10-14 Thread Dhiren DSouza
Hello:

Are there any libraries that will do a subset selection for glm's?  I looked 
through leaps, but seems like it is specifically for linear regressions.  
Thank you.

-Dhiren



Re: [R] subset-analogue removing fed in indexes?

2005-05-31 Thread Johannes Graumann
Jebus, you guys are fast!

Thanks so much,

Joh

On Tue, 2005-05-31 at 17:38 -0400, Gabor Grothendieck wrote:
> On 5/31/05, Johannes Graumann <[EMAIL PROTECTED]> wrote:
> > Hello,
> > 
> > Here's my issue:
> > 
> > I want to plot the following vectors:
> > > x <- c(0.0, 2.0, 15.0, 100.0, 105.0, 105.1, 110.0, 120.0, 120.1,
> > 130.0)
> > > data <- c(8.75, 8.75, 16.25, 38.75, 61.25, 8.75, NA, 8.75, NA, NA)
> > 
> > and avoid the line discontinuations caused by 'NA'.
> > > plot_data <- na.omit(data)
> > will clean up 'data' for me, but now I need to get a 'plot_x' which
> > omits the values indexed with what's spit out by 'na.exclude(data)'.
> > 
> > Can anybody let me know a smooth way of how to delete entries with
> > certain indexes from a vector?
> 
> plot(approx(x,data))
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>



Re: [R] subset-analogue removing fed in indexes?

2005-05-31 Thread Gabor Grothendieck
On 5/31/05, Johannes Graumann <[EMAIL PROTECTED]> wrote:
> Hello,
> 
> Here's my issue:
> 
> I want to plot the following vectors:
> > x <- c(0.0, 2.0, 15.0, 100.0, 105.0, 105.1, 110.0, 120.0, 120.1,
> 130.0)
> > data <- c(8.75, 8.75, 16.25, 38.75, 61.25, 8.75, NA, 8.75, NA, NA)
> 
> and avoid the line discontinuations caused by 'NA'.
> > plot_data <- na.omit(data)
> will clean up 'data' for me, but now I need to get a 'plot_x' which
> omits the values indexed with what's spit out by 'na.exclude(data)'.
> 
> Can anybody let me know a smooth way of how to delete entries with
> certain indexes from a vector?

plot(approx(x,data))
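
For the literal question (dropping the matching entries of 'x'), a short
sketch using the vectors from the post. The names 'keep', 'dropped' and
'plot_x2' are mine; 'plot_x' and 'plot_data' follow the question.
approx() appears to discard the NA pairs before interpolating, which is
presumably why the one-liner above works.

```r
x    <- c(0.0, 2.0, 15.0, 100.0, 105.0, 105.1, 110.0, 120.0, 120.1, 130.0)
data <- c(8.75, 8.75, 16.25, 38.75, 61.25, 8.75, NA, 8.75, NA, NA)

# One logical index applied to both vectors keeps them aligned
keep      <- !is.na(data)
plot_x    <- x[keep]
plot_data <- data[keep]

# Same result from the positions that na.omit() records as dropped
# (only valid when at least one NA was actually removed)
dropped <- attr(na.omit(data), "na.action")
plot_x2 <- x[-as.integer(dropped)]

plot(plot_x, plot_data, type = "l")
```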



[R] subset-analogue removing fed in indexes?

2005-05-31 Thread Johannes Graumann
Hello,

Here's my issue:

I want to plot the following vectors:
> x <- c(0.0, 2.0, 15.0, 100.0, 105.0, 105.1, 110.0, 120.0, 120.1,
130.0)
> data <- c(8.75, 8.75, 16.25, 38.75, 61.25, 8.75, NA, 8.75, NA, NA)

and avoid the line discontinuations caused by 'NA'.
> plot_data <- na.omit(data)
will clean up 'data' for me, but now I need to get a 'plot_x' which
omits the values indexed with what's spit out by 'na.exclude(data)'.

Can anybody let me know a smooth way of how to delete entries with
certain indexes from a vector?

Thanks, Joh



Re: [R] Subset with selection variable from function argument. Is there another way?

2005-05-11 Thread Prof Brian Ripley
On Wed, 11 May 2005, Fredrik Karlsson wrote:

> Dear list,
>
> I'm making my current code more generic and would like some advice.
> The basic problem is subset and the name of the column to be compared
> for selection.
>
> What I've come up with is
>
> > data(mammals)
> > set <- bottompremolars"
> > subset(mammals, eval(parse(file="",text=set)) > 2)
>
> This seems a bit odd.  Is there a nicer way?

Try

set <- "bottompremolars"
mammals[[set]]

assuming it is a data frame and you just want the one column.

subset() is just a convenience wrapper for the basic indexing operations, 
which are well worth learning.
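
A small sketch of selecting rows by a column whose name is held in a
string. The built-in mtcars data and the "mpg" column stand in for the
poster's data, which is not available; the threshold is arbitrary.

```r
set <- "mpg"   # column name chosen at run time

# Basic indexing: [[ ]] takes the name as a string
by_index <- mtcars[mtcars[[set]] > 30, ]

# The same condition inside subset()
by_subset <- subset(mtcars, mtcars[[set]] > 30)

# get() looks the name up among the data frame's columns
by_get <- subset(mtcars, get(set) > 30)
```

All three select the same rows; the first avoids non-standard evaluation
altogether, which tends to be easier to reason about inside functions.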

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] Subset with selection variable from function argument. Is there another way?

2005-05-11 Thread Uwe Ligges
Fredrik Karlsson wrote:

> Dear list,
>
> I'm making my current code more generic and would like some advice.
> The basic problem is subset and the name of the column to be compared
> for selection.
>
> What I've come up with is
>
> > data(mammals)
> > set <- bottompremolars"

The line above is
 a) syntactically incorrect and
 b) the string does not describe a variable in the mammals data
hence this is not reproducible at all.

> > subset(mammals, eval(parse(file="",text=set)) > 2)

Let's assume
  set <- "body"
Either use get() as in
  subset(mammals, get(set) > 2)
or simple indexing such as:
  subset(mammals, mammals[[set]] > 2)

Uwe Ligges

> This seems a bit odd.  Is there a nicer way?
>
> /Fredrik


[R] Subset with selection variable from function argument. Is there another way?

2005-05-11 Thread Fredrik Karlsson
Dear list,

I'm making my current code more generic and would like some advice.
The basic problem is subset and the name of the column to be compared
for selection.

What I've come up with is 

> data(mammals)
> set <- bottompremolars"
> subset(mammals, eval(parse(file="",text=set)) > 2)

This seems a bit odd.  Is there a nicer way?

/Fredrik



Re: [R] subset arg lmList

2005-04-08 Thread Prof Brian Ripley
On Fri, 8 Apr 2005, Sebastian Luque wrote:

> I'm having trouble understanding how functions in the subset argument for
> lmList search for the objects they need. This trivial example produces
> "Error in rownames(fakedf) : Object "fakedf" not found":
>
> library(nlme)
>
> fitbyID <- function() {
>   fakedf <- data.frame(ID = gl(5, 10, 50),
>                        A = sample(1:100, 50),
>                        B = rnorm(50))
>   mycoefs <- lmList(B ~ A | ID,
>                     data = fakedf,
>                     subset = !is.element(rownames(fakedf),
>                                          c("3", "10")))
>   coef(mycoefs)
> }
>
> fitbyID()
>
> If fakedf is already in the workspace, then the function runs fine, so
> rownames seems to be looking for it in the global environment, although
> I'd expect it to search locally first. I suspect this shows some gaps in
> my understanding of environments and related concepts. I'd be grateful for
> some advice on this.

That's not how several functions in nlme were written (I have mentioned it 
to the authors in the past).  lmList.formula contains

    if (!missing(subset)) {
        data <- data[eval(asOneSidedFormula(Call[["subset"]])[[2]],
                          data), , drop = FALSE]
    }

So that evaluates 'subset' first in data and then in the body of the 
lmList.  As in S/R the parent frames are not in the scope for that 
evaluation, it does not look in the body of your function 'fitbyID'.

Functions using the standard paradigm (such as lm) do arrange to do the 
evaluation in the parent, but that can cause problems if nesting goes 
deeper (as e.g. in step()).  Things were complicated by the change around 
1.2.0 to (in the standard paradigm) look in the environment of the formula
(not done here).

The simplest workaround is to assign 'fakedf' with some innocuous name 
(usually beginning with a dot) in the workspace.
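
The scoping difference described above can be sketched in base R. The two helper functions below are hypothetical illustrations, not nlme code: the first evaluates the subset expression in the data and then in its own frame (the behaviour described for lmList), the second falls back to the caller's frame (the standard paradigm used by lm).

```r
# Hypothetical helpers (not nlme code): evaluate 'subset' inside 'data'.
subset_own_frame <- function(data, subset) {
  # enclos defaults to this function's frame, so the caller's locals
  # are not visible to the subset expression
  keep <- eval(substitute(subset), data)
  data[keep, , drop = FALSE]
}

subset_parent_frame <- function(data, subset) {
  # fall back to the caller's frame, as in the standard (lm-style) paradigm
  keep <- eval(substitute(subset), data, parent.frame())
  data[keep, , drop = FALSE]
}

caller <- function(f) {
  fakedf <- data.frame(A = 1:5)   # exists only inside caller()
  f(fakedf, !is.element(rownames(fakedf), c("3", "5")))
}

# The first call fails to find 'fakedf'; the second one succeeds.
own    <- tryCatch(caller(subset_own_frame), error = function(e) "not found")
parent <- caller(subset_parent_frame)
```

With the first helper, another way around the problem (besides the workspace assignment) is to do the subsetting on the data frame before passing it on, so that no expression referring to a local object has to be evaluated elsewhere.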

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595


[R] subset arg lmList

2005-04-08 Thread Sebastian Luque
I'm having trouble understanding how functions in the subset argument for
lmList search for the objects they need. This trivial example produces
"Error in rownames(fakedf) : Object "fakedf" not found":

library(nlme)

fitbyID <- function() {
  fakedf <- data.frame(ID = gl(5, 10, 50),
                       A = sample(1:100, 50),
                       B = rnorm(50))
  mycoefs <- lmList(B ~ A | ID,
                    data = fakedf,
                    subset = !is.element(rownames(fakedf),
                                         c("3", "10")))
  coef(mycoefs)
}

fitbyID()


If fakedf is already in the workspace, then the function runs fine, so
rownames seems to be looking for it in the global environment, although
I'd expect it to search locally first. I suspect this shows some gaps in
my understanding of environments and related concepts. I'd be grateful for
some advice on this.

Best wishes,
-- 
Sebastian P. Luque



Re: [R] subset selection for logistic regression

2005-03-02 Thread Frank E Harrell Jr
Christian Hennig wrote:
> Perhaps I should not write it because I will discredit myself with this
> but...
> Suppose I have a setup with 100 variables and some 1000 cases and I want to
> boil down the number of variables to a maximum of 10 for practical reasons
> even if I lose 10% prediction quality by this (for example because it is
> expensive to measure all variables on new cases).
>
> Is it really so wrong to use a stepwise method?

Yes.  Read about model uncertainty and bias in models developed using 
stepwise methods.  One exception: if there is a large number of 
variables with truly zero regression coefficients, and the rest are not 
very weak, stepwise can sort things out fairly well.  But you never know 
this in advance.

> Let's say I divide the sample into three parts and do variable selection on
> the first part, estimation on the second and test on the third part (this
> solves almost all problems Frank is talking about on p. 56/57 in his
> excellent book). Is there always a tractable alternative?

That's a good way to find out how bad the method is, not to fix the 
problems inherent in it.

> Of course it is wrong to interpret the selected variables as "the true
> influences" and all others as "unrelated", but if I don't do that?
> If it should really be a taboo to do stepwise variable selection, why are p.
> 58/59 of "Regression Modeling Strategies" devoted to "how to do it if you
> must"?

Stress on "if".  And note that if you ask what is the optimum alpha for 
variables to be kept in the model when doing backwards stepdown, it's 
alpha=1.0.  A good compromise is alpha=0.5.  See

@Article{ste01pro,
  author =  {Steyerberg, Ewout W. and Eijkemans, Marinus J. C. and
             Harrell, Frank E. and Habbema, J. Dik F.},
  title =   {Prognostic modeling with logistic regression
             analysis: {In} search of a sensible strategy in small
             data sets},
  journal = {Medical Decision Making},
  year =    2001,
  volume =  21,
  pages =   {45-56},
  annote =  {shrinkage; variable selection; dichotomization of
             continuous variables; sign of regression coefficient;
             calibration; validation}
}
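
A minimal sketch of backwards stepdown with a liberal retention level, under these assumptions: the data are simulated (hypothetical), every candidate term has one degree of freedom, and step() from the stats package is used with its penalty `k` chosen to mimic alpha = 0.5. For a 1-df term, step() drops the term when its likelihood-ratio chi-square is below `k`, so `k = qchisq(1 - alpha, 1)` corresponds to retention level alpha. This only illustrates the mechanics; it does not remove the problems with stepwise selection discussed above.

```r
set.seed(1)
n <- 200
# Simulated data (hypothetical): only x1 truly affects the outcome.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.8 * d$x1))

full <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)

# step() drops a 1-df term when its likelihood-ratio chi-square is below k,
# so k = qchisq(1 - alpha, 1) mimics a retention level of alpha.
alpha <- 0.5
k <- qchisq(alpha, df = 1, lower.tail = FALSE)

reduced <- step(full, direction = "backward", k = k, trace = 0)
kept <- attr(terms(reduced), "term.labels")   # names of the retained terms
```

The default k = 2 (AIC) corresponds to a retention level of roughly 0.16 for 1-df terms, so alpha = 0.5 keeps noticeably more variables in the model.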

And on Bert's excellent question about why shrinkage is not used more 
often, here is our attempt at a remedy:

@Article{moo04pen,
  author =  {Moons, K. G. M. and Donders, A. Rogier T. and
             Steyerberg, E. W. and Harrell, F. E.},
  title =   {Penalized maximum likelihood estimation to directly
             adjust diagnostic and prognostic prediction models for
             overoptimism: a clinical example},
  journal = {J Clinical Epidemiology},
  year =    2004,
  volume =  57,
  pages =   {1262-1270},
  annote =  {prediction research; overoptimism; overfitting;
             penalization; bootstrapping; shrinkage}
}

Frank

Please forget my name;-)
Christian
On Wed, 2 Mar 2005, Berton Gunter wrote:

To clarify Frank's remark ...
A prominent theme in statistical research over at least the last 25 years
(with roots that go back 50 or more, probably) has been the superiority of
"shrinkage" methods over variable selection. I also find it distressing that
these ideas have apparently not penetrated much (at all?) into the wider
scientific community (but I suppose I shouldn't be surprised -- most
scientists still do one factor at a time experiments 80 years after Fisher).
Specific incarnations can be found in anything Bayesian, mixed effects
models for repeated measures, ridge regression, and the R packages lars and
lasso, among others.
I would speculate that aside from the usual statistics/science cultural
issues, part of the reason for this is that the estimators don't generally
come with neat, classical inference procedures: like it or not, many
scientists have been conditioned by their Stat 101 courses to expect P
values, so in some sense, we are hoisted by our own petard.
Just my $.02 -- contrary(and more knowledgeable) opinions welcome.
-- Bert Gunter

-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Frank 
E Harrell Jr
Sent: Wednesday, March 02, 2005 5:13 AM
To: Wittner, Ben
Cc: [EMAIL PROTECTED]
Subject: Re: [R] subset selection for logistic regression

Wittner, Ben wrote:
R-packages leaps and subselect implement various methods of 
selecting best or
good subsets of predictor variables for linear regression 
models, but they do
not seem to be applicable to logistic regression models.
Does anyone know of software for finding good subsets of 
predictor variables for
linear regression models?
Thanks.
-Ben
Why are these procedures still being used?  The performance 
is known to 
be bad in almost every sense (see r-help archives).

Frank Harrell

p.s., The leaps package references "Subset Selection in 
Regression" by Alan
Miller. On page 2 of the
2nd edition of that text it states the following:
 "All of the models which will be considered in this 
monograph will be linear;
that is they
  will be linear in the regression coefficients.Th

RE: [R] subset selection for logistic regression

2005-03-02 Thread Christian Hennig
Perhaps I should not write it because I will discredit myself with this
but...

Suppose I have a setup with 100 variables and some 1000 cases and I want to
boil down the number of variables to a maximum of 10 for practical reasons
even if I lose 10% prediction quality by this (for example because it is
expensive to measure all variables on new cases).  

Is it really so wrong to use a stepwise method?
Let's say I divide the sample into three parts and do variable selection on
the first part, estimation on the second and test on the third part (this
solves almost all problems Frank is talking about on p. 56/57 in his
excellent book). Is there always a tractable alternative? 

Of course it is wrong to interpret the selected variables as "the true
influences" and all others as "unrelated", but if I don't do that?

If it should really be a taboo to do stepwise variable selection, why are p.
58/59 of "Regression Modeling Strategies" devoted to "how to do it if you
must"?

Please forget my name;-)

Christian

On Wed, 2 Mar 2005, Berton Gunter wrote:

> To clarify Frank's remark ...
> 
> A prominent theme in statistical research over at least the last 25 years
> (with roots that go back 50 or more, probably) has been the superiority of
> "shrinkage" methods over variable selection. I also find it distressing that
> these ideas have apparently not penetrated much (at all?) into the wider
> scientific community (but I suppose I shouldn't be surprised -- most
> scientists still do one factor at a time experiments 80 years after Fisher).
> Specific incarnations can be found in anything Bayesian, mixed effects
> models for repeated measures, ridge regression, and the R packages lars and
> lasso, among others.
> 
> I would speculate that aside from the usual statistics/science cultural
> issues, part of the reason for this is that the estimators don't generally
> come with neat, classical inference procedures: like it or not, many
> scientists have been conditioned by their Stat 101 courses to expect P
> values, so in some sense, we are hoisted by our own petard.
> 
> Just my $.02 -- contrary(and more knowledgeable) opinions welcome.
> 
> -- Bert Gunter
>  
> 
> > -Original Message-
> > From: [EMAIL PROTECTED] 
> > [mailto:[EMAIL PROTECTED] On Behalf Of Frank 
> > E Harrell Jr
> > Sent: Wednesday, March 02, 2005 5:13 AM
> > To: Wittner, Ben
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: [R] subset selection for logistic regression
> > 
> > Wittner, Ben wrote:
> > > R-packages leaps and subselect implement various methods of 
> > selecting best or
> > > good subsets of predictor variables for linear regression 
> > models, but they do
> > > not seem to be applicable to logistic regression models.
> > >  
> > > Does anyone know of software for finding good subsets of 
> > predictor variables for
> > > linear regression models?
> > >  
> > > Thanks.
> > >  
> > > -Ben
> > 
> > Why are these procedures still being used?  The performance 
> > is known to 
> > be bad in almost every sense (see r-help archives).
> > 
> > Frank Harrell
> > 
> > >  
> > > p.s., The leaps package references "Subset Selection in 
> > Regression" by Alan
> > > Miller. On page 2 of the
> > > 2nd edition of that text it states the following:
> > >  
> > >   "All of the models which will be considered in this 
> > monograph will be linear;
> > > that is they
> > >will be linear in the regression coefficients.Though 
> > most of the ideas and
> > > problems carry
> > >over to the fitting of nonlinear models and generalized 
> > linear models
> > > (particularly the fitting
> > >of logistic relationships), the complexity is greatly increased."
> > 
> > 
> > -- 
> > Frank E Harrell Jr   Professor and Chair   School of Medicine
> >   Department of Biostatistics   
> > Vanderbilt University
> > 

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
From 1 April 2005: Department of Statistical Science, UCL, London
###
I recommend www.boag-online.de



RE: [R] subset selection for logistic regression

2005-03-02 Thread Berton Gunter
To clarify Frank's remark ...

A prominent theme in statistical research over at least the last 25 years
(with roots that go back 50 or more, probably) has been the superiority of
"shrinkage" methods over variable selection. I also find it distressing that
these ideas have apparently not penetrated much (at all?) into the wider
scientific community (but I suppose I shouldn't be surprised -- most
scientists still do one factor at a time experiments 80 years after Fisher).
Specific incarnations can be found in anything Bayesian, mixed effects
models for repeated measures, ridge regression, and the R packages lars and
lasso, among others.

I would speculate that aside from the usual statistics/science cultural
issues, part of the reason for this is that the estimators don't generally
come with neat, classical inference procedures: like it or not, many
scientists have been conditioned by their Stat 101 courses to expect P
values, so in some sense, we are hoisted by our own petard.

Just my $.02 -- contrary(and more knowledgeable) opinions welcome.

-- Bert Gunter
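
To make "shrinkage" concrete, here is a minimal base-R sketch of ridge-penalized logistic regression fitted with optim(). Everything in it is an assumption made for illustration: the simulated data, the penalty weight of 10, and the helper names `pen_nll` and `fit`. It is not taken from any of the packages mentioned above.

```r
set.seed(2)
n <- 100
# Simulated design (hypothetical): intercept plus five predictors,
# of which only the first two have non-zero coefficients.
X <- cbind(1, matrix(rnorm(n * 5), n, 5))
beta <- c(0, 1, 0.5, 0, 0, 0)
y <- rbinom(n, 1, plogis(drop(X %*% beta)))

# Negative log-likelihood plus a ridge penalty on the non-intercept terms.
pen_nll <- function(b, lambda) {
  eta <- drop(X %*% b)
  -sum(y * eta - log1p(exp(eta))) + lambda * sum(b[-1]^2)
}

# Minimise the penalised criterion for a given penalty weight.
fit <- function(lambda) {
  optim(rep(0, ncol(X)), function(b) pen_nll(b, lambda), method = "BFGS")$par
}

b_ml    <- fit(0)    # unpenalised maximum likelihood
b_ridge <- fit(10)   # coefficients pulled towards zero
```

All variables stay in the model; the penalty trades a little bias for lower variance instead of setting coefficients to exactly zero by selection.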
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Frank 
> E Harrell Jr
> Sent: Wednesday, March 02, 2005 5:13 AM
> To: Wittner, Ben
> Cc: [EMAIL PROTECTED]
> Subject: Re: [R] subset selection for logistic regression
> 
> Wittner, Ben wrote:
> > R-packages leaps and subselect implement various methods of 
> selecting best or
> > good subsets of predictor variables for linear regression 
> models, but they do
> > not seem to be applicable to logistic regression models.
> >  
> > Does anyone know of software for finding good subsets of 
> predictor variables for
> > linear regression models?
> >  
> > Thanks.
> >  
> > -Ben
> 
> Why are these procedures still being used?  The performance 
> is known to 
> be bad in almost every sense (see r-help archives).
> 
> Frank Harrell
> 
> >  
> > p.s., The leaps package references "Subset Selection in 
> Regression" by Alan
> > Miller. On page 2 of the
> > 2nd edition of that text it states the following:
> >  
> >   "All of the models which will be considered in this 
> monograph will be linear;
> > that is they
> >will be linear in the regression coefficients.Though 
> most of the ideas and
> > problems carry
> >over to the fitting of nonlinear models and generalized 
> linear models
> > (particularly the fitting
> >of logistic relationships), the complexity is greatly increased."
> 
> 
> -- 
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>   Department of Biostatistics   
> Vanderbilt University
> 



Re: [R] subset selection for logistic regression

2005-03-02 Thread Frank E Harrell Jr
dr mike wrote:
 


-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Wittner, Ben
Sent: 02 March 2005 11:33
To: [EMAIL PROTECTED]
Subject: [R] subset selection for logistic regression

R-packages leaps and subselect implement various methods of 
selecting best or good subsets of predictor variables for 
linear regression models, but they do not seem to be 
applicable to logistic regression models.

Does anyone know of software for finding good subsets of 
predictor variables for linear regression models?

Thanks.
-Ben
p.s., The leaps package references "Subset Selection in 
Regression" by Alan Miller. On page 2 of the 2nd edition of 
that text it states the following:

 "All of the models which will be considered in this 
monograph will be linear; that is they
  will be linear in the regression coefficients.Though most 
of the ideas and problems carry
  over to the fitting of nonlinear models and generalized 
linear models (particularly the fitting
  of logistic relationships), the complexity is greatly increased."



The LASSO and Least Angle Regression are two such methods; both have been
implemented (efficiently IMHO - only one least squares fit for all levels
of shrinkage, IIRC) in the lars package for R by Hastie and Efron.
There is a paper by Madigan and Ridgeway that discusses the use of the Least
Angle Regression approach in the context of logistic regression - available
for download from Madigan's space at Rutgers:
www.stat.rutgers.edu/~madigan/PAPERS/lars3.pdf

HTH
Mike
Yes things like lasso can help a lot.
--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University


RE: [R] subset selection for logistic regression

2005-03-02 Thread dr mike
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Wittner, Ben
> Sent: 02 March 2005 11:33
> To: [EMAIL PROTECTED]
> Subject: [R] subset selection for logistic regression
> 
> R-packages leaps and subselect implement various methods of 
> selecting best or good subsets of predictor variables for 
> linear regression models, but they do not seem to be 
> applicable to logistic regression models.
>  
> Does anyone know of software for finding good subsets of 
> predictor variables for linear regression models?
>  
> Thanks.
>  
> -Ben
>  
> p.s., The leaps package references "Subset Selection in 
> Regression" by Alan Miller. On page 2 of the 2nd edition of 
> that text it states the following:
>  
>   "All of the models which will be considered in this 
> monograph will be linear; that is they
>will be linear in the regression coefficients.Though most 
> of the ideas and problems carry
>over to the fitting of nonlinear models and generalized 
> linear models (particularly the fitting
>of logistic relationships), the complexity is greatly increased."
> 

The LASSO and Least Angle Regression are two such methods; both have been
implemented (efficiently IMHO - only one least squares fit for all levels
of shrinkage, IIRC) in the lars package for R by Hastie and Efron.
There is a paper by Madigan and Ridgeway that discusses the use of the Least
Angle Regression approach in the context of logistic regression - available
for download from Madigan's space at Rutgers:
www.stat.rutgers.edu/~madigan/PAPERS/lars3.pdf

HTH

Mike



Re: [R] subset selection for logistic regression

2005-03-02 Thread Frank E Harrell Jr
Wittner, Ben wrote:
R-packages leaps and subselect implement various methods of selecting best or
good subsets of predictor variables for linear regression models, but they do
not seem to be applicable to logistic regression models.
 
Does anyone know of software for finding good subsets of predictor variables for
linear regression models?
 
Thanks.
 
-Ben
Why are these procedures still being used?  The performance is known to 
be bad in almost every sense (see r-help archives).

Frank Harrell
 
p.s., The leaps package references "Subset Selection in Regression" by Alan
Miller. On page 2 of the
2nd edition of that text it states the following:
 
  "All of the models which will be considered in this monograph will be linear;
that is they
   will be linear in the regression coefficients.Though most of the ideas and
problems carry
   over to the fitting of nonlinear models and generalized linear models
(particularly the fitting
   of logistic relationships), the complexity is greatly increased."

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University


[R] subset selection for logistic regression

2005-03-02 Thread Wittner, Ben
R-packages leaps and subselect implement various methods of selecting best or
good subsets of predictor variables for linear regression models, but they do
not seem to be applicable to logistic regression models.
 
Does anyone know of software for finding good subsets of predictor variables for
linear regression models?
 
Thanks.
 
-Ben
 
p.s., The leaps package references "Subset Selection in Regression" by Alan
Miller. On page 2 of the
2nd edition of that text it states the following:
 
  "All of the models which will be considered in this monograph will be linear;
that is they
   will be linear in the regression coefficients.Though most of the ideas and
problems carry
   over to the fitting of nonlinear models and generalized linear models
(particularly the fitting
   of logistic relationships), the complexity is greatly increased."



RE: [R] subset

2005-02-09 Thread John Fox
Dear Rogerio,

x[rowSums(x[,2:5]) != 0,] should do what you want.

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Rogerio Rosa da Silva
> Sent: Wednesday, February 09, 2005 8:52 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] subset
> 
> Dear all,
> 
> I am trying to extract rows from a data.frame based on the 
> rowSums != 0.  I want to preserve rownames in the first 
> column in the subset.
> 
> Does anyone know how to extract all species that don't have 
> rowSums equal to zero?  Here it is:
> 
> # dataset
> x <- data.frame(
> species=c("sp.1","sp.2","sp.3","sp.4"),
> site1=c(2,3,0,0),
> site2=c(0,0,0,0),
> site3=c(0,1,0,6),
> site4=c(0,0,0,0))
> 
> #I want extract the matrix:
> 
>  species site1 site2 site3 site4
>   sp.1     2     0     0     0
>   sp.2     3     0     1     0
>   sp.4     0     0     6     0
> 
> #extract data.frame of rowSums with x[,2:4] != 0 
> 
> subset (x, apply (x,1,function(row) all(rowSums(x[,2:4] !=0)) 
> ## don't work
> 
> 
> Thanks in advance.
> 
> --
> Rogério R. Silva
> MZUSP http://www.mz.usp.br
> Linux/Debian User # 354364
> Linux counter http://counter.li.org
> 



RE: [R] subset

2005-02-09 Thread Liaw, Andy
Something like:

> x[rowSums(x[,-1]) > 0,]
  species site1 site2 site3 site4
1    sp.1     2     0     0     0
2    sp.2     3     0     1     0
4    sp.4     0     0     6     0

Andy
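
A small variation on the rowSums() idea, as a sketch: summing the values works for non-negative counts like these, but counting the non-zero cells instead also covers data where positive and negative entries could cancel. The data frame is the one from the original post; `keep` and `res` are names chosen here for illustration.

```r
x <- data.frame(
  species = c("sp.1", "sp.2", "sp.3", "sp.4"),
  site1 = c(2, 3, 0, 0),
  site2 = c(0, 0, 0, 0),
  site3 = c(0, 1, 0, 6),
  site4 = c(0, 0, 0, 0))

# Count the non-zero cells per row instead of summing the values,
# so that entries of opposite sign cannot cancel to zero.
keep <- rowSums(x[, -1] != 0, na.rm = TRUE) > 0
res <- x[keep, ]
```

Row names 1, 2 and 4 are preserved in `res`, and the species column is carried along because only the site columns enter the test.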

> From: Rogerio Rosa da Silva
> 
> Dear all,
> 
> I am trying to extract rows from a data.frame based on the
> rowSums != 0.  I want to preserve rownames in the first 
> column in the subset.
> 
> Does anyone know how to extract all species that don't have 
> rowSums equal
> to zero?  Here it is:
> 
> # dataset
> x <- data.frame(
> species=c("sp.1","sp.2","sp.3","sp.4"),
> site1=c(2,3,0,0),
> site2=c(0,0,0,0),
> site3=c(0,1,0,6),
> site4=c(0,0,0,0))
> 
> #I want extract the matrix:
> 
>  species site1 site2 site3 site4
>   sp.1     2     0     0     0
>   sp.2     3     0     1     0
>   sp.4     0     0     6     0
> 
> #extract data.frame of rowSums with x[,2:4] != 0 
> 
> subset (x, apply (x,1,function(row) all(rowSums(x[,2:4] !=0)) 
> ## don't work
> 
> 
> Thanks in advance.
> 
> -- 
> Rogério R. Silva
> MZUSP http://www.mz.usp.br
> Linux/Debian User # 354364
> Linux counter http://counter.li.org
> 



Re: [R] subset

2005-02-09 Thread Dimitris Rizopoulos
You have answered it yourself (the site columns are 2:5):
x[rowSums(x[,2:5])!=0,]
Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
- Original Message - 
From: "Rogerio Rosa da Silva" <[EMAIL PROTECTED]>
To: 
Sent: Wednesday, February 09, 2005 2:52 PM
Subject: [R] subset


Dear all,
I am trying to extract rows from a data.frame based on the
rowSums != 0. I want to preserve rownames in the first column in the 
subset.

Does anyone know how to extract all species that don't have rowSums 
equal
to zero?  Here it is:

# dataset
x <- data.frame(
species=c("sp.1","sp.2","sp.3","sp.4"),
site1=c(2,3,0,0),
site2=c(0,0,0,0),
site3=c(0,1,0,6),
site4=c(0,0,0,0))
#I want extract the matrix:
 species site1 site2 site3 site4
  sp.1     2     0     0     0
  sp.2     3     0     1     0
  sp.4     0     0     6     0
#extract data.frame of rowSums with x[,2:4] != 0
subset (x, apply (x,1,function(row) all(rowSums(x[,2:4] !=0)) ## don't work

Thanks in advance.
--
Rogério R. Silva
MZUSP http://www.mz.usp.br
Linux/Debian User # 354364
Linux counter http://counter.li.org


Re: [R] subset

2005-02-09 Thread Chuck Cleland
subset(x, rowSums(x[,-1], na.rm=TRUE) != 0)
Rogerio Rosa da Silva wrote:
Dear all,
I am trying to extract rows from a data.frame based on the
rowSums != 0.  I want to preserve rownames in the first column in the subset.
Does anyone know how to extract all species that don't have rowSums equal
to zero?  Here it is:
# dataset
x <- data.frame(
species=c("sp.1","sp.2","sp.3","sp.4"),
site1=c(2,3,0,0),
site2=c(0,0,0,0),
site3=c(0,1,0,6),
site4=c(0,0,0,0))
#I want extract the matrix:
 species site1 site2 site3 site4
  sp.1     2     0     0     0
  sp.2     3     0     1     0
  sp.4     0     0     6     0
#extract data.frame of rowSums with x[,2:4] != 0 

subset (x, apply (x,1,function(row) all(rowSums(x[,2:4] !=0)) ## don't work
Thanks in advance.
--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894


[R] subset

2005-02-09 Thread Rogerio Rosa da Silva
Dear all,

I am trying to extract rows from a data.frame based on the
rowSums != 0.  I want to preserve rownames in the first column in the subset.

Does anyone know how to extract all species that don't have rowSums equal
to zero?  Here it is:

# dataset
x <- data.frame(
species=c("sp.1","sp.2","sp.3","sp.4"),
site1=c(2,3,0,0),
site2=c(0,0,0,0),
site3=c(0,1,0,6),
site4=c(0,0,0,0))

#I want extract the matrix:

 species site1 site2 site3 site4
  sp.1     2     0     0     0
  sp.2     3     0     1     0
  sp.4     0     0     6     0

#extract data.frame of rowSums with x[,2:4] != 0 

subset (x, apply (x,1,function(row) all(rowSums(x[,2:4] !=0)) ## don't work


Thanks in advance.

-- 
Rogério R. Silva
MZUSP http://www.mz.usp.br
Linux/Debian User # 354364
Linux counter http://counter.li.org



Re: [R] subset data.frame with value != in all columns

2005-02-07 Thread Tim Howard
Petr,
   Thank you!  Yes, rowSums appears to be even a little bit faster than
unique(which()), and it also maintains the original order. I do want
original order maintained, but I first apply a function to one of my
data.frames (that without any -s ... yes, these do represent nulls,
as someone asked earlier) and rbind these two dataframes back together,
so I need to sort (by rownames) after the rbind (there doesn't seem to
be a sortby option in rbind). 
   I apologize for not jumping on rowSums earlier, I hadn't caught on
that it was summing counts of occurrence of the search value, not
summing the search value itself.
   Thanks again, this is very instructive and *very* helpful.
humbly,
Tim

>>> "Petr Pikal" <[EMAIL PROTECTED]> 02/07/05 02:12AM >>>
Hi Tim

I can not say much about apply, but the code with unique(which()) 
gives you reordered rows in case of - selection

try

set.seed(1)
in.df <- data.frame(
c1=rnorm(4),
c2=rnorm(4),
c3=rnorm(4),
c4=rnorm(4),
c5=rnorm(4))
in.df[in.df>3] <- (-)

system.time(e <- in.df[unique(which(in.df == -, arr.ind = 
TRUE)[,1]), ])
system.time(e1 <- in.df[(rowSums(in.df == -)) != 0,])

all.equal(e,e1)

So if the row order matters to you, you need to reorder the result.

ooo<-order(as.numeric(rownames(e)))
all.equal(e[ooo,],e1)

Cheers
Petr

On 4 Feb 2005 at 11:17, Tim Howard wrote:

> Because I'll be doing this on big datasets and time is important, I
> thought I'd time all the different approaches that were suggested on a
> small dataframe. The results were very instructive so I thought I'd
> pass them on. I also discovered that my numeric columns (e.g.
> -.000) weren't found by apply() but were found by which() and the
> simple replace. Was it apply's fault or something else?
> 
> Note how much faster unique(which()) is; wow! Thanks to Marc Schwartz
> for this blazing solution.
> > nrow(in.df)
> [1] 4
> #extract rows with no -
> > system.time(x <- subset(in.df, apply(in.df, 1,
> function(in.df){all(in.df != -)})))
> [1] 3.25 0.00 3.25   NA   NA
> > system.time(y<- in.df[-unique(which(in.df == -, arr.ind =
> > TRUE)[,
> 1]), ])
> [1] 0.17 0.00 0.17   NA   NA
> > system.time({is.na(in.df) <-in.df == -; z <- na.omit(in.df)})
> [1] 0.25 0.02 0.26   NA   NA
> 
> > nrow(x);nrow(y);nrow(z)
> [1] 39990
> [1] 39626
> [1] 39626
> 
> #extract rows with -
> > system.time(d<-subset(in.df, apply(in.df, 1,
> function(in.df){any(in.df == -)})))
> [1] 3.40 0.00 3.45   NA   NA
> > system.time(e<-in.df[unique(which(in.df == -, arr.ind =
TRUE)[,
> 1]), ])
> [1] 0.11 0.00 0.11   NA   NA
> 
> > nrow(d); nrow(e)
> [1] 10
> [1] 374
> 
> Tim Howard
> 
> 
> >>> Marc Schwartz <[EMAIL PROTECTED]> 02/03/05 03:24PM >>>
> On Thu, 2005-02-03 at 14:57 -0500, Tim Howard wrote: 
>   ... snip...
> > My questions: 
> > Is there a cleaner way to extract all rows containing a specified
> > value? How can I extract all rows that don't have this value in any
> > col?
> > 
> > #create dummy dataset
> > x <- data.frame(
> > c1=c(-99,-99,-99,4:10),
> > c2=1:10,
> > c3=c(1:3,-99,5:10),
> > c4=c(10:1),
> > c5=c(1:9,-99))
> > 
> ..snip...
> 
> How about this, presuming that your data frame is all numeric:
> 
> For rows containing -99:
> 
> > x[unique(which(x == -99, arr.ind = TRUE)[, 1]), ]
>     c1 c2  c3 c4  c5
> 1  -99  1   1 10   1
> 2  -99  2   2  9   2
> 3  -99  3   3  8   3
> 4    4  4 -99  7   4
> 10  10 10  10  1 -99
> 
> 
> For rows not containing -99:
> 
> > x[-unique(which(x == -99, arr.ind = TRUE)[, 1]), ]
>   c1 c2 c3 c4 c5
> 5  5  5  5  6  5
> 6  6  6  6  5  6
> 7  7  7  7  4  7
> 8  8  8  8  3  8
> 9  9  9  9  2  9
> 
> 
> What I have done here is to use which(), setting arr.ind = TRUE. This
> returns the row, column indices for the matches to the boolean
> statement. The first column returned by which() in this case are the
> row numbers matching the statement, so I take the first column only.
> 
> Since it is possible that more than one element in a row can match the
> boolean, I then use unique() to get the singular row values.
> 
> Thus, I can use the returned row indices above to subset the data
> frame.
> HTH,
> 
> Marc Schwartz
> 

Petr Pikal
[EMAIL PROTECTED]



Re: [R] subset data.frame with value != in all columns

2005-02-06 Thread Petr Pikal
Hi Tim

I cannot say much about apply, but the code with unique(which()) 
gives you reordered rows in the case of - selection.

try

set.seed(1)
in.df <- data.frame(
c1=rnorm(4),
c2=rnorm(4),
c3=rnorm(4),
c4=rnorm(4),
c5=rnorm(4))
in.df[in.df>3] <- (-)

system.time(e <- in.df[unique(which(in.df == -, arr.ind = 
TRUE)[,1]), ])
system.time(e1 <- in.df[(rowSums(in.df == -)) != 0,])

all.equal(e,e1)

So if the row order matters to you, you need to reorder.

ooo<-order(as.numeric(rownames(e)))
all.equal(e[ooo,],e1)
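
The ordering point can be seen on a much smaller case. This is only a sketch: the five-row data frame df and the sentinel -99 (borrowed from the dummy dataset quoted further down) are assumptions for illustration, not the real in.df. which(arr.ind = TRUE) scans the logical matrix column by column, so the row indices come back in column order, while a logical index built with rowSums() keeps the original row order.

```r
# Hypothetical toy data: row 4 has -99 in column a, row 2 has -99 in column b
df <- data.frame(a = c(1, 2, 3, -99, 5),
                 b = c(1, -99, 3, 4, 5))

# which() walks the logical matrix column by column, so row 4 is found first
idx <- unique(which(df == -99, arr.ind = TRUE)[, 1])

# rowSums() on the logical matrix gives one flag per row, in row order
keep <- rowSums(df == -99) != 0

e  <- df[idx, ]               # rows come out in the order 4, 2
e1 <- df[keep, ]              # rows come out in the order 2, 4
e_sorted <- df[sort(idx), ]   # sorting the indices first restores row order
```

Sorting idx before subsetting is a possible alternative to reordering the result afterwards via its row names.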

Cheers
Petr

On 4 Feb 2005 at 11:17, Tim Howard wrote:

> Because I'll be doing this on big datasets and time is important, I
> thought I'd time all the different approaches that were suggested on a
> small dataframe. The results were very instructive so I thought I'd
> pass them on. I also discovered that my numeric columns (e.g.
> -.000) weren't found by apply() but were found by which() and the
> simple replace. Was it apply's fault or something else?
> 
> Note how much faster unique(which()) is; wow! Thanks to Marc Schwartz
> for this blazing solution.
> 
> > nrow(in.df)
> [1] 4
> #extract rows with no -
> > system.time(x <- subset(in.df, apply(in.df, 1,
> function(in.df){all(in.df != -)})))
> [1] 3.25 0.00 3.25   NA   NA
> > system.time(y<- in.df[-unique(which(in.df == -, arr.ind = TRUE)[,
> 1]), ])
> [1] 0.17 0.00 0.17   NA   NA
> > system.time({is.na(in.df) <-in.df == -; z <- na.omit(in.df)})
> [1] 0.25 0.02 0.26   NA   NA
> 
> > nrow(x);nrow(y);nrow(z)
> [1] 39990
> [1] 39626
> [1] 39626
> 
> #extract rows with -
> > system.time(d<-subset(in.df, apply(in.df, 1,
> function(in.df){any(in.df == -)})))
> [1] 3.40 0.00 3.45   NA   NA
> > system.time(e<-in.df[unique(which(in.df == -, arr.ind = TRUE)[,
> 1]), ])
> [1] 0.11 0.00 0.11   NA   NA
> 
> > nrow(d); nrow(e)
> [1] 10
> [1] 374
> 
> Tim Howard
> 
> 
> >>> Marc Schwartz <[EMAIL PROTECTED]> 02/03/05 03:24PM >>>
> On Thu, 2005-02-03 at 14:57 -0500, Tim Howard wrote: 
>   ... snip...
> > My questions: 
> > Is there a cleaner way to extract all rows containing a specified
> > value? How can I extract all rows that don't have this value in any
> > col?
> > 
> > #create dummy dataset
> > x <- data.frame(
> > c1=c(-99,-99,-99,4:10),
> > c2=1:10,
> > c3=c(1:3,-99,5:10),
> > c4=c(10:1),
> > c5=c(1:9,-99))
> > 
> ..snip...
> 
> How about this, presuming that your data frame is all numeric:
> 
> For rows containing -99:
> 
> > x[unique(which(x == -99, arr.ind = TRUE)[, 1]), ]
>     c1 c2  c3 c4  c5
> 1  -99  1   1 10   1
> 2  -99  2   2  9   2
> 3  -99  3   3  8   3
> 4    4  4 -99  7   4
> 10  10 10  10  1 -99
> 
> 
> For rows not containing -99:
> 
> > x[-unique(which(x == -99, arr.ind = TRUE)[, 1]), ]
>   c1 c2 c3 c4 c5
> 5  5  5  5  6  5
> 6  6  6  6  5  6
> 7  7  7  7  4  7
> 8  8  8  8  3  8
> 9  9  9  9  2  9
> 
> 
> What I have done here is to use which(), setting arr.ind = TRUE. This
> returns the row, column indices for the matches to the boolean
> statement. The first column returned by which() in this case holds the
> row numbers matching the statement, so I take the first column only.
> 
> Since it is possible that more than one element in a row can match the
> boolean, I then use unique() to get the singular row values.
> 
> Thus, I can use the returned row indices above to subset the data
> frame.
> 
> HTH,
> 
> Marc Schwartz
> 

Petr Pikal
[EMAIL PROTECTED]



Re: [R] subset data.frame with value != in all columns

2005-02-04 Thread Tim Howard
Because I'll be doing this on big datasets and time is important, I
thought I'd time all the different approaches that were suggested on a
small dataframe. The results were very instructive so I thought I'd pass
them on. I also discovered that my numeric columns (e.g. -.000)
weren't found by apply() but were found by which() and the simple
replace. Was it apply's fault or something else?

Note how much faster unique(which()) is; wow! Thanks to Marc Schwartz
for this blazing solution.

> nrow(in.df)
[1] 4
#extract rows with no -
> system.time(x <- subset(in.df, apply(in.df, 1,
function(in.df){all(in.df != -)})))
[1] 3.25 0.00 3.25   NA   NA
> system.time(y<- in.df[-unique(which(in.df == -, arr.ind = TRUE)[,
1]), ])
[1] 0.17 0.00 0.17   NA   NA
> system.time({is.na(in.df) <-in.df == -; z <- na.omit(in.df)})
[1] 0.25 0.02 0.26   NA   NA

> nrow(x);nrow(y);nrow(z)
[1] 39990
[1] 39626
[1] 39626

#extract rows with -
> system.time(d<-subset(in.df, apply(in.df, 1,
function(in.df){any(in.df == -)})))
[1] 3.40 0.00 3.45   NA   NA
> system.time(e<-in.df[unique(which(in.df == -, arr.ind = TRUE)[,
1]), ])
[1] 0.11 0.00 0.11   NA   NA

> nrow(d); nrow(e)
[1] 10
[1] 374
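
One possible cause of the mismatch between nrow(d) and nrow(e), not confirmed anywhere in this thread, is the conversion that apply() performs before it loops. apply() calls as.matrix() on a data frame, and if any column is non-numeric the numeric columns are turned into text with format(), so a value can end up as a string such as "-99.0" that no longer equals the number it is compared with. The sketch below assumes a made-up mixed-type data frame df and the sentinel -99; whether in.df actually contains a non-numeric column is not stated in the post.

```r
# Hypothetical mixed-type data frame: one character column, two numeric ones
df <- data.frame(id = c("a", "b", "c"),
                 v1 = c(-99, 1.5, 10),
                 v2 = c(2, -99, 3),
                 stringsAsFactors = FALSE)

# as.matrix() on a mixed data frame yields a character matrix; the numeric
# column v1 goes through format(), so -99 is stored as the string "-99.0"
m <- as.matrix(df)

# Row-wise test on that character matrix: -99 is compared as the string
# "-99", so the "-99.0" in row 1 is missed and only row 2 is flagged
hits_apply <- apply(df, 1, function(r) any(r == -99))

# Comparing the data frame directly works column by column and keeps the
# numeric columns numeric, so rows 1 and 2 are both found
hits_which <- unique(which(df == -99, arr.ind = TRUE)[, 1])

# Restricting apply() to the numeric columns avoids the text conversion
num <- df[sapply(df, is.numeric)]
hits_numeric <- apply(num, 1, function(r) any(r == -99))
```

If in.df turns out to be all numeric, this explanation does not apply and the difference must come from somewhere else.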

Tim Howard


>>> Marc Schwartz <[EMAIL PROTECTED]> 02/03/05 03:24PM >>>
On Thu, 2005-02-03 at 14:57 -0500, Tim Howard wrote: 
  ... snip...
> My questions: 
> Is there a cleaner way to extract all rows containing a specified
> value?
> How can I extract all rows that don't have this value in any col?
> 
> #create dummy dataset
> x <- data.frame(
> c1=c(-99,-99,-99,4:10),
> c2=1:10,
> c3=c(1:3,-99,5:10),
> c4=c(10:1),
> c5=c(1:9,-99))
> 
..snip...

How about this, presuming that your data frame is all numeric:

For rows containing -99:

> x[unique(which(x == -99, arr.ind = TRUE)[, 1]), ]
    c1 c2  c3 c4  c5
1  -99  1   1 10   1
2  -99  2   2  9   2
3  -99  3   3  8   3
4    4  4 -99  7   4
10  10 10  10  1 -99


For rows not containing -99:

> x[-unique(which(x == -99, arr.ind = TRUE)[, 1]), ]
  c1 c2 c3 c4 c5
5  5  5  5  6  5
6  6  6  6  5  6
7  7  7  7  4  7
8  8  8  8  3  8
9  9  9  9  2  9


What I have done here is to use which(), setting arr.ind = TRUE. This
returns the row, column indices for the matches to the boolean
statement. The first column returned by which() in this case holds the
row numbers matching the statement, so I take the first column only.

Since it is possible that more than one element in a row can match the
boolean, I then use unique() to get the singular row values.

Thus, I can use the returned row indices above to subset the data
frame.
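
The steps described above can be laid out one at a time on the dummy dataset from the question. This is a sketch only: the intermediate names pos, rows, none and safe_without are introduced here for illustration, and the value -12345 is just an arbitrary number that does not occur in x.

```r
# Dummy dataset from the original question
x <- data.frame(
  c1 = c(-99, -99, -99, 4:10),
  c2 = 1:10,
  c3 = c(1:3, -99, 5:10),
  c4 = c(10:1),
  c5 = c(1:9, -99))

# Step 1: one (row, col) pair for every cell equal to -99
pos <- which(x == -99, arr.ind = TRUE)

# Step 2: keep only the row numbers and drop repeats
rows <- unique(pos[, 1])

with_val    <- x[rows, ]    # rows containing -99
without_val <- x[-rows, ]   # rows not containing -99

# Caveat: when nothing matches, the index is empty and negative indexing
# selects zero rows instead of all rows; setdiff() on the row numbers
# avoids that
none <- unique(which(x == -12345, arr.ind = TRUE)[, 1])
safe_without <- x[setdiff(seq_len(nrow(x)), none), ]
```

The negative-index form is therefore only safe when at least one row is known to match; the setdiff() form, or a logical index, behaves the same in both cases.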

HTH,

Marc Schwartz


