Re: [R] removing NA from a data frame

2012-06-22 Thread Stuart Leask
Removing rows with NAs, using na.omit(), doesn't seem to be working for me.

Dataset:

> str(ex10s)

'data.frame':   2189576 obs. of  5 variables:
 $ LOPNR  : int  58 58 58 58 64 64 64 64 64 64 ...
 $ DIAGNOS: Factor w/ 173 levels "F20","F200","F2000",..: 128 128 128 128 105 105 105 160 105 105 ...
 $ X_DATE : int  20060821 20061207 20080102 20090904 20010327 20010925 20020307 20021007 20021007 20030320 ...
 $ SOURCE : int  2 2 2 2 2 2 2 2 2 1 ...
 $ dg     : Factor w/ 7 levels "0","1","2","3",..: 6 6 6 6 5 5 5 6 5 5 ...

The only NAs are in the factor dg (put in by 'recode' from the car library; I'm 
trying to eliminate cases with particular factor levels)

> table(ex10s$dg)

      0       1       2       3       4       5      NA
   2851  271501   63112   98425  335593 1257299  160795

So, I remove the rows with NAs, to a new dataframe ex10ss:

> ex10ss <- na.omit(ex10s)

Check all the NAs have been removed:

> table(ex10ss$dg)

      0       1       2       3       4       5      NA
   2851  271501   63112   98425  335593 1257299  160795

> dim(ex10s)
[1] 2189576   5
> dim(ex10ss)
[1] 2189576   5

Nothing seems to have changed. I want all the rows containing NA removed.

I am clearly doing something wrong.

The only alternative I could find is pretty similar:

> use <- complete.cases(ex10s)
> ex10ss <- ex10s[use, ]

which leads to the same result.


Stuart


Dr Stuart John Leask DM FRCPsych MB Mchir
Clinical Senior Lecturer and Honorary Consultant Psychiatrist
Institute of Mental Health, Innovation Park
Triumph Road, Nottingham, Notts. NG7 2TU. UK
Tel. +44 115 82 30419 
stuart.le...@nottingham.ac.uk
Google 'Dr Stuart Leask'


This message and any attachment are intended solely for the addressee and may 
contain confidential information. If you have received this message in error, 
please send it back to me, and immediately delete it.   Please do not use, copy 
or disclose the information contained in this message or in any attachment.  
Any views or opinions expressed by the author of this email do not necessarily 
reflect the views of the University of Nottingham.

This message has been checked for viruses but the contents of an attachment
may still contain software viruses which could damage your computer system:
you are advised to perform your own checks. Email communications with the
University of Nottingham may be monitored as permitted by UK legislation.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] removing NA from a data frame

2012-06-22 Thread Petr PIKAL
Hi

Both na.omit and complete.cases work smoothly for me when NA is not a
valid level of the factor.

If NA is a valid level, as seems to be the case here, you need to reset
your factor levels so that it is not:

ex10s$dg <- factor(ex10s$dg)

Both commands should then work.
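A minimal sketch of this on hypothetical toy data (`df` is invented for illustration, not the poster's data set), assuming the level is a genuine NA level of the kind `addNA()` creates; the thread does not show exactly how `recode()` produced it:

```r
# Hypothetical toy data: 'dg' has NA as a real level (assumption: built with addNA()).
df <- data.frame(id = 1:4, dg = addNA(factor(c("0", "5", NA, "4"))))

levels(df$dg)                    # "0" "4" "5" NA : NA is a level here
sum(is.na(df$dg))                # 0: the entry has a valid level code, so it is not missing
nrow(na.omit(df))                # 4: na.omit() therefore drops nothing

# Re-create the factor: factor() excludes NA from the levels by default
# (exclude = NA), so that entry becomes a true missing value.
df$dg <- factor(df$dg)
sum(is.na(df$dg))                # 1
nrow(na.omit(df))                # 3
nrow(df[complete.cases(df), ])   # 3
```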

Regards
Petr


 
Re: [R] removing NA from a data frame

2012-06-22 Thread Prof Brian Ripley

On 22/06/2012 09:41, Stuart Leask wrote:

> Removing rows with NAs, using na.omit(), doesn't seem to be working for me.


It won't if NA is a level of the factor, which is what you seem to have
here.  For


> table(as.factor(c(1,2,NA)))

1 2
1 1

omits NAs by default.
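A small illustration of this point on hypothetical toy vectors: `table()` leaves genuine missing values out unless its `useNA` argument asks for them, so an NA column in ordinary `table()` output points to an ordinary level rather than to missing data.

```r
# Hypothetical toy vectors, for illustration only.
f_missing <- as.factor(c(1, 2, NA))     # a genuine missing value
f_level   <- factor(c("1", "2", "NA"))  # the string "NA" stored as a level

table(f_missing)                   # only levels 1 and 2 are counted
table(f_missing, useNA = "ifany")  # adds a column counting the missing value
table(f_level)                     # "NA" is counted like any other level
is.na(f_level)                     # FALSE FALSE FALSE: nothing for na.omit() to drop
```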



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] removing NA from a data frame

2012-06-22 Thread Peter Ehlers

On 2012-06-22 01:41, Stuart Leask wrote:

> Removing rows with NAs, using na.omit(), doesn't seem to be working for me.

> > table(ex10s$dg)
>
>       0       1       2       3       4       5      NA
>    2851  271501   63112   98425  335593 1257299  160795


This shows that what you think are missing values (NAs)
R considers to be values at the factor level "NA".
If you do

  levels(ex10s$dg)

you should see "NA" as one of the levels. This probably
resulted from incorrect data import. When you print ex10s$dg
you should see missing values printed as <NA>, not NA.

Either re-import the data or run

 is.na(ex10s$dg) <- ex10s$dg == "NA"
 ex10s$dg <- factor(ex10s$dg)   ## to remove the superfluous level
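A minimal sketch of this fix on hypothetical toy data (`ex` is invented for illustration), assuming, as Peter suggests, that the level is the character string "NA" rather than a real missing value:

```r
# Hypothetical toy data: the string "NA" has become an ordinary factor level.
ex <- data.frame(id = 1:4, dg = factor(c("0", "5", "NA", "4")))
levels(ex$dg)                  # "0" "4" "5" "NA"
nrow(na.omit(ex))              # 4: there are no real missing values yet

is.na(ex$dg) <- ex$dg == "NA"  # turn the "NA" entries into genuine NAs
ex$dg <- factor(ex$dg)         # drop the now-unused "NA" level
levels(ex$dg)                  # "0" "4" "5"
nrow(na.omit(ex))              # 3
```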


Peter Ehlers


