Hi Jim,

Thanks for your reply. I know these basics in R. What I want to know is this: say you have a data frame X with 300 features. From those 300 features, I need to pull out the name of each feature that has zero values for all the observations in that sample.
I am looking for a package or a function that does that. Also, how can I tell whether there are abnormal values in each feature? Say I have 300 features and 100000 observations. It is hard to check everything in an Excel file, so I am looking for a package that does the work instead. I hope that makes sense.

Thanks a lot

Cheers

On Thu, Mar 31, 2016 at 1:13 PM, Jim Lemon <drjimle...@gmail.com> wrote:
> Hi Norman,
> To check whether all values of an object (say "x") fulfill a certain
> condition (==0):
>
> all(x==0)
>
> If your object (X) is indeed a data frame, you can only do this by
> column, so if you want to get the results:
>
> X<-data.frame(A=c(0,1:10),B=c(0,2:10,99999),
>  C=c(0,-1,3:11),D=rep(0,11))
> all_zeros<-function(x) return(all(x==0))
> which_cols<-unlist(lapply(X,all_zeros))
>
> If your data frame (or a subset) contains all numeric values, you can
> finesse the problem like this:
>
> which_rows<-apply(as.matrix(X),1,all_zeros)
>
> What you get is a list of logical (TRUE/FALSE) values from lapply, so
> it has to be unlisted to get a vector of logical values like you get
> with "apply".
>
> You can then use that vector to index (subset) the original data frame
> by logically inverting it with ! (NOT):
>
> X[,!which_cols]
> X[!which_rows,]
>
> Your "outliers" look suspiciously like missing values from certain
> statistical packages. If you know the values you are looking for, you
> can do something like:
>
> NA99999<-X==99999
>
> and then "remove" them by replacing those values with NA:
>
> X[NA99999]<-NA
>
> Be aware that all these hackles (diminutive of hacks) are pretty
> specific to this example. Also remember that if this is homework, your
> karma has just gone down the cosmic sinkhole.
>
> Jim
>
>
> On Thu, Mar 31, 2016 at 9:56 AM, Norman Pat <normanma...@gmail.com> wrote:
> > Hi team
> >
> > I am new to R so please help me to do this task.
> >
> > Please find the attached data sample.
> > But in the original data frame I have 350 features and 400000
> > observations.
> >
> > I need to carry out these tasks.
> >
> > 1. How to identify features (names) that have all zeros?
> >
> > 2. How to remove features that have all zeros from the dataset?
> >
> > 3. How to identify features (names) that have outliers such as 99999,-1
> > in the data frame?
> >
> > 4. How to remove outliers?
> >
> >
> > Many thanks
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
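
For reference, the column-wise checks discussed in this thread can be collected into one short script. This is only a sketch in base R under stated assumptions: the toy data frame is the one from the quoted reply, the names is_all_zero, zero_cols, sentinels, sentinel_counts, sentinel_cols, ranges and X_clean are made up for illustration, and 99999 and -1 are assumed to be the only codes that mark bad values. Columns that contain NA are not counted as all zero here, and the min/max table assumes every column is numeric.

```r
# Toy data frame from the quoted reply; the real data would have
# hundreds of columns and many more rows.
X <- data.frame(A = c(0, 1:10),
                B = c(0, 2:10, 99999),
                C = c(0, -1, 3:11),
                D = rep(0, 11))

# 1. Names of columns in which every value is zero.
#    vapply() returns a named logical vector, so no unlist() is needed.
#    isTRUE() makes a column with NAs count as "not all zero" (assumption).
is_all_zero <- vapply(X, function(col) isTRUE(all(col == 0)), logical(1))
zero_cols <- names(X)[is_all_zero]
print(zero_cols)        # "D"

# 2. Columns that contain one of the assumed sentinel codes.
sentinels <- c(99999, -1)   # hypothetical list, taken from the question
sentinel_counts <- vapply(X, function(col) sum(col %in% sentinels), integer(1))
sentinel_cols <- names(X)[sentinel_counts > 0]
print(sentinel_cols)    # "B" "C"

# 3. A per-column min/max table, as a quick screen for other odd values
#    without opening the data in a spreadsheet (numeric columns only).
ranges <- t(vapply(X, function(col) as.numeric(range(col, na.rm = TRUE)),
                   numeric(2)))
colnames(ranges) <- c("min", "max")
print(ranges)

# 4. Drop the all-zero columns and turn the sentinel codes into NA.
X_clean <- X[, !is_all_zero, drop = FALSE]
X_clean[] <- lapply(X_clean, function(col) {
  col[col %in% sentinels] <- NA
  col
})
```

On the real data only the data.frame() call would change, since every step works column by column regardless of how many columns there are. Whether -1 is really a missing-value code, rather than a legitimate value in some features, still has to be decided per feature.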