Re: [R] removing NA from a data frame
Removing rows with NAs, using na.omit(), doesn't seem to be working for me.

Dataset:

    str(ex10s)
    'data.frame':   2189576 obs. of  5 variables:
     $ LOPNR  : int  58 58 58 58 64 64 64 64 64 64 ...
     $ DIAGNOS: Factor w/ 173 levels "F20","F200","F2000",..: 128 128 128 128 105 105 105 160 105 105 ...
     $ X_DATE : int  20060821 20061207 20080102 20090904 20010327 20010925 20020307 20021007 20021007 20030320 ...
     $ SOURCE : int  2 2 2 2 2 2 2 2 2 1 ...
     $ dg     : Factor w/ 7 levels "0","1","2","3",..: 6 6 6 6 5 5 5 6 5 5 ...

The only NAs are in the factor dg (put in by 'recode' from the car library; I'm trying to eliminate cases with particular factor levels):

    table(ex10s$dg)
          0       1       2       3       4       5      NA
       2851  271501   63112   98425  335593 1257299  160795

So, I remove the rows with NAs, to a new dataframe ex10ss:

    ex10ss <- na.omit(ex10s)

Check all the NAs have been removed:

    table(ex10ss$dg)
          0       1       2       3       4       5      NA
       2851  271501   63112   98425  335593 1257299  160795

    dim(ex10s)
    [1] 2189576       5
    dim(ex10ss)
    [1] 2189576       5

Nothing seems to have changed. I want all the rows with NA in them removed. I am clearly doing something wrong.

The only alternative I could find is pretty similar:

    use <- complete.cases(ex10)
    ex10ss <- ex10s[use, ]

which leads to the same result.

Stuart

Dr Stuart John Leask DM FRCPsych MB Mchir
Clinical Senior Lecturer and Honorary Consultant Psychiatrist
Institute of Mental Health, Innovation Park
Triumph Road, Nottingham, Notts. NG7 2TU. UK
Tel. +44 115 82 30419
stuart.le...@nottingham.ac.uk
Google 'Dr Stuart Leask'

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
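One way to tell whether the NA entries are genuinely missing values or an ordinary factor level is to compare is.na() with levels(). The sketch below uses a small made-up data frame (toy, with columns id and dg; not the poster's data) and assumes the entries are stored under the text label "NA", which is only one possible explanation for the symptoms described.

```r
# Toy data (hypothetical): "NA" here is an ordinary text label, not a missing value.
toy <- data.frame(id = 1:4, dg = factor(c("0", "5", "NA", "NA")))

n_missing   <- sum(is.na(toy$dg))        # genuinely missing values in dg
has_na_text <- "NA" %in% levels(toy$dg)  # is the text "NA" one of the levels?
n_kept      <- nrow(na.omit(toy))        # rows that survive na.omit()

print(c(n_missing = n_missing, n_kept = n_kept))
print(has_na_text)
```

If n_missing is zero while a level named NA exists, na.omit() has nothing to remove, which would match the unchanged dim() output in the post.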
Re: [R] removing NA from a data frame
Hi

Both na.omit and complete.cases work smoothly for me when NA is not a valid level in the factor. If NA is a valid level here, as it seems to be, you need to reset your factor levels so that it no longer is:

    ex10s$dg <- factor(ex10s$dg)

Both commands should work then.

Regards
Petr
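As a quick check of what re-applying factor() does, the sketch below builds a toy factor in which NA really is a level. It uses addNA() for that, purely as an assumption about how such a factor might arise; the data frame toy and its columns are made up. With its default exclude = NA, factor() drops that level, after which na.omit() removes the rows.

```r
# Toy data (hypothetical): addNA() makes NA an explicit level of the factor.
toy <- data.frame(id = 1:4, dg = addNA(factor(c("0", "5", NA, NA))))

# While NA is a level, is.na() is FALSE everywhere, so no rows are dropped.
rows_before <- nrow(na.omit(toy))

# Rebuilding the factor with the default exclude = NA removes the NA level.
toy$dg <- factor(toy$dg)
rows_after <- nrow(na.omit(toy))

print(c(before = rows_before, after = rows_after))
```

Note that this step would not change a level spelled as the text "NA"; such entries have to be converted to missing values first.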
Re: [R] removing NA from a data frame
On 22/06/2012 09:41, Stuart Leask wrote:
> Removing rows with NAs, using na.omit(), doesn't seem to be working for me.

It won't if NA is a level of the factor, which is what you seem to have here, since table() omits NAs by default:

    table(as.factor(c(1,2,NA)))

    1 2
    1 1

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
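The point about table() can be checked directly. This minimal sketch reuses the vector from the reply and compares the default call with useNA = "ifany"; the variable names are illustrative only.

```r
x <- as.factor(c(1, 2, NA))

default_counts <- table(x)                   # missing values are not counted
with_na_counts <- table(x, useNA = "ifany")  # adds a count for missing values

print(default_counts)
print(with_na_counts)
```

So an NA column appearing in a plain table() call, as in the original post, suggests that NA is being treated as a level and not as a missing value.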
Re: [R] removing NA from a data frame
On 2012-06-22 01:41, Stuart Leask wrote:
> The only NAs are in the factor dg (put in by 'recode' from the car library; I'm trying to eliminate cases with particular factor levels)
>
>     table(ex10s$dg)
>           0       1       2       3       4       5      NA
>        2851  271501   63112   98425  335593 1257299  160795

This shows that what you think are missing values (NAs) R considers to be values at the factor level "NA". If you do

    levels(ex10s$dg)

you should see "NA" as one of the levels. This probably resulted from incorrect data import. When you print ex10s$dg you should see missing values printed as <NA>, not NA.

Either re-import the data or run

    is.na(ex10s$dg) <- ex10s$dg == "NA"
    ex10s$dg <- factor(ex10s$dg)  ## to remove the superfluous level

Peter Ehlers
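The two-step fix above can be tried on a small example. The data frame toy and its columns are made up for illustration, and the sketch assumes the problem entries are stored under the text level "NA".

```r
# Toy data (hypothetical): two entries carry the text label "NA".
toy <- data.frame(id = 1:5, dg = factor(c("0", "1", "NA", "5", "NA")))

# Step 1: turn entries at the text level "NA" into real missing values.
is.na(toy$dg) <- toy$dg == "NA"

# Step 2: rebuild the factor so the now-unused "NA" level is dropped.
toy$dg <- factor(toy$dg)

# na.omit() can now find and remove the incomplete rows.
clean <- na.omit(toy)
print(nrow(clean))
print(levels(clean$dg))
```

After both steps, na.omit() and complete.cases() should behave as the original poster expected.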