[R] Logical statements and subseting data...

2008-02-25, Neil Shephard
Hi,

I'm scratching my head as to why I can't use the subset() command to
remove one line of data from a data frame.

There is just one row (out of 45840) that I'd like to remove and it
can be identified using

> dim(raw.all.clean)
[1] 45840    10
> subset(raw.all.clean, Height.1 == 0 & Height.2 == 0)
      Sample.Name Well       SNP Allele.1 Allele.2 Size.1 Size.2 Height.1
47068      CA0153  O02 rs2106776                       NA     NA        0
      Height.2 Pool
47068        0    3

(Note that the row index 47068 is higher than the number of rows
reported by dim() simply because I have already removed a number of
rows.)

So I want to remove this one instance where Height.1 == 0 & Height.2
== 0.  I'd have thought that a logical expression where Height.1 != 0
& Height.2 != 0 would have achieved this, but instead of dropping just
this one observation it drops far more...

> t <- subset(raw.all.clean, Height.1 != 0 & Height.2 != 0)
> dim(t)
[1] 38150    10

Thus 7690 rows have been removed.  It seems that the '&' operator is
being interpreted as an 'OR' (|), since...

> dim(subset(raw.all.clean, Height.1 != 0))
[1] 42152    10
> dim(subset(raw.all.clean, Height.2 != 0))
[1] 41837    10

...and...

> dim(raw.all.clean) - dim(subset(raw.all.clean, Height.1 != 0))
[1] 3688    0
> dim(raw.all.clean) - dim(subset(raw.all.clean, Height.2 != 0))
[1] 4003    0

> 3688 + 4003
[1] 7691

(This is one more than the number of rows removed, but given that
there is one sample where both Height.1 and Height.2 are 0, that's
fine.)
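The counts above follow inclusion-exclusion: a row fails Height.1 != 0 & Height.2 != 0 as soon as either height is zero, so the rows dropped are those with Height.1 == 0 plus those with Height.2 == 0, minus the rows counted twice because both are zero (3688 + 4003 - 1 = 7690). A minimal sketch that checks this count on a small, made-up data frame (toy is hypothetical, not the original raw.all.clean):

```r
# Hypothetical toy data, not the original raw.all.clean data frame
toy <- data.frame(
  Height.1 = c(0, 5, 0, 7, 3, 0),
  Height.2 = c(0, 0, 4, 2, 0, 6)
)

n_total <- nrow(toy)

# Rows kept when BOTH heights are required to be non-zero
kept <- subset(toy, Height.1 != 0 & Height.2 != 0)
n_dropped <- n_total - nrow(kept)

# Inclusion-exclusion: rows with a zero in Height.1, plus rows with a zero
# in Height.2, minus the rows counted twice (zero in both)
n_zero1 <- sum(toy$Height.1 == 0)
n_zero2 <- sum(toy$Height.2 == 0)
n_both  <- sum(toy$Height.1 == 0 & toy$Height.2 == 0)

n_dropped                   # 5
n_zero1 + n_zero2 - n_both  # 5
```

So the & version drops every row containing at least one zero, not only the row where both heights are zero.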

I thought I understood how logical expressions are constructed, and
have gone back and read the entries on precedence, but can't work out
why the above is happening.

What's particularly perplexing (to me) is that the test for exact
equality works, but the test for inequality doesn't.

I feel like I'm missing something blatantly obvious, but can't work
out what it is.
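The equality test seems to work while the inequality test does not because the two conditions are not complements of each other: a row with exactly one zero height satisfies neither Height.1 == 0 & Height.2 == 0 nor Height.1 != 0 & Height.2 != 0, so negating each comparison while keeping & drops those rows as well. A minimal sketch on a hypothetical four-row data frame covering every zero/non-zero combination:

```r
# Hypothetical data: every combination of zero / non-zero heights
combos <- data.frame(Height.1 = c(0, 1, 0, 1),
                     Height.2 = c(0, 0, 1, 1))

both_zero    <- with(combos, Height.1 == 0 & Height.2 == 0)
both_nonzero <- with(combos, Height.1 != 0 & Height.2 != 0)
neither      <- !both_zero & !both_nonzero

# Rows 2 and 3 (exactly one zero) fall into neither group, so the
# "!= ... & != ..." condition removes them together with row 1
cbind(combos, both_zero, both_nonzero, neither)
```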

Cheers,

Neil

-- 
Email - [EMAIL PROTECTED] / [EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logical statements and subseting data...

2008-02-25, Neil Shephard
Thanks Thierry, they do both leave me with what I expected.

On Mon, Feb 25, 2008 at 2:28 PM, ONKELINX, Thierry
[EMAIL PROTECTED] wrote:
> The negation of Height.1 == 0 & Height.2 == 0 was incorrect. Use
>
>  subset(raw.all.clean, !(Height.1 == 0 & Height.2 == 0))

I can see clearly how this expression works (negating the whole test), but...

>  or
>
>  subset(raw.all.clean, Height.1 != 0 | Height.2 != 0)

...not how this works, since the above to me is saying Height.1 is NOT
zero OR  Height.2 is NOT zero, which to my mind would pick out samples
where either one or the other is not equal to zero (and of course
those instances where both are equal to zero)?

It seems to me that & (AND) and | (OR) are used the wrong way round in
this case, since the intersection of the two tests for inequality is
what is required?
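The rule involved is De Morgan's law: NOT (a AND b) equals (NOT a) OR (NOT b), and NOT (a OR b) equals (NOT a) AND (NOT b). Removing the rows where both heights are zero means keeping the rows where at least one height is non-zero, which is the union (|) of the two inequality tests, not their intersection. A short check in R over all four TRUE/FALSE combinations (g, law1 and law2 are illustrative names only):

```r
# All four combinations of two logical values
g <- expand.grid(a = c(TRUE, FALSE), b = c(TRUE, FALSE))

# In R, `!` binds more loosely than `==`, so `!x == y` means `!(x == y)`.
# identical() sidesteps that precedence trap when comparing the two sides.
law1 <- identical(with(g, !(a & b)), with(g, !a | !b))
law2 <- identical(with(g, !(a | b)), with(g, !a & !b))

law1  # TRUE
law2  # TRUE
```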

Neil


Re: [R] Logical statements and subseting data...

2008-02-25, ONKELINX, Thierry
The negation of Height.1 == 0 & Height.2 == 0 was incorrect. Use

subset(raw.all.clean, !(Height.1 == 0 & Height.2 == 0))

or

subset(raw.all.clean, Height.1 != 0 | Height.2 != 0)
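Both forms keep every row except those where the two heights are zero at the same time. A minimal sketch, using a small hypothetical data frame (toy, with made-up values) as a stand-in for raw.all.clean:

```r
# Hypothetical stand-in for raw.all.clean
toy <- data.frame(Height.1 = c(0, 5, 0, 7),
                  Height.2 = c(0, 0, 4, 2))

# Negate the whole test ...
keep_a <- subset(toy, !(Height.1 == 0 & Height.2 == 0))
# ... or negate each comparison and swap & for | (De Morgan's law)
keep_b <- subset(toy, Height.1 != 0 | Height.2 != 0)

identical(keep_a, keep_b)  # TRUE
nrow(toy) - nrow(keep_a)   # 1: only the row with both heights zero is dropped
```

The first form restates the selection that identified the unwanted row and negates it once, which may make it harder to get wrong.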

HTH,

Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
[EMAIL PROTECTED] 
www.inbo.be 

Do not put your faith in what statistics say until you have carefully
considered what they do not say.  ~William W. Watt
A statistical analysis, properly conducted, is a delicate dissection of
uncertainties, a surgery of suppositions. ~M.J.Moroney

-----Original message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
On behalf of Neil Shephard
Sent: Monday 25 February 2008 15:21
To: r-help
Subject: [R] Logical statements and subseting data...



Re: [R] Logical statements and subseting data...

2008-02-25, ONKELINX, Thierry
Neil,

Maybe this example will make things clearer.

> DF <- expand.grid(A = 0:1, B = 0:1)
> cbind(DF, DF$A != 0, DF$B != 0, DF$A != 0 & DF$B != 0, DF$A != 0 | DF$B != 0)
  A B DF$A != 0 DF$B != 0 DF$A != 0 & DF$B != 0 DF$A != 0 | DF$B != 0
1 0 0     FALSE     FALSE                 FALSE                 FALSE
2 1 0      TRUE     FALSE                 FALSE                  TRUE
3 0 1     FALSE      TRUE                 FALSE                  TRUE
4 1 1      TRUE      TRUE                  TRUE                  TRUE

Thierry



-----Original message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
On behalf of Neil Shephard
Sent: Monday 25 February 2008 15:36
To: ONKELINX, Thierry
CC: r-help
Subject: Re: [R] Logical statements and subseting data...
