On 25/01/12 05:14, Francisco wrote:
Hello,
I have a dataset with 40 variables, some of which are 0 in every row. I would like to make a subset containing only the columns whose values are not all 0, but I don't know how to do it.

I tried:

for(cut_column in 1:40) {

if(sum(dataset[,cut_column])!=0) {
                columns_useful<-c(columns_useful,dataset[cut_column])

}
}

sorted_dataset<-subset(dataset, select=columns_useful)

But it doesn't work.

Try:

    good_dataset <- dataset[,sapply(dataset,function(x){!all(x==0)})]

This works modulo possible gotchas induced by floating point arithmetic.

Another possibility:

    tol <- sqrt(.Machine$double.eps)
    good_dataset <- dataset[,sapply(dataset,function(x){!all(abs(x)<=tol)})]
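To see what the tolerance buys you, here is a small sketch on a made-up data frame (the names d, a, b and cc are purely illustrative and not from the original question). Column b holds rounding residue rather than exact zeros, so the exact test keeps it while the tolerance test drops it. The drop = FALSE is an addition of mine: it should keep the result a data frame even when only one column survives.

```r
# Hypothetical toy data: 'a' is exactly zero, 'b' is zero only up to
# rounding error, 'cc' has genuinely non-zero values.
d <- data.frame(a  = c(0, 0, 0),
                b  = c(0.1 + 0.2 - 0.3, 0, 0),
                cc = c(1, 0, 2))

# Exact comparison: 'b' is not identically 0, so it is kept.
keep_exact <- sapply(d, function(x) !all(x == 0))

# Comparison with a tolerance: 'b' counts as all zero and is dropped.
tol <- sqrt(.Machine$double.eps)
keep_tol <- sapply(d, function(x) !all(abs(x) <= tol))

# drop = FALSE keeps a data frame even if a single column remains.
d_exact <- d[, keep_exact, drop = FALSE]
d_tol   <- d[, keep_tol,   drop = FALSE]
```

Note that neither test handles NA values; a column containing NAs would need separate treatment (for instance via na.rm-style filtering), which is outside this sketch.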

Or:

    good_dataset <- dataset[,sapply(dataset,function(x){!isTRUE(all.equal(x,rep(0,length(x))))})]

The last version could trip up if some columns of "dataset" have extra attributes tagging along, because all.equal() compares attributes as well as values. E.g. the column could actually be a numeric matrix of zeroes, in which case it wouldn't get dropped.
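The caveat about extra attributes can be checked directly. The following is a minimal sketch on a hypothetical data frame (d, x and m are made-up names) in which one column is a one-column matrix of zeroes. Passing check.attributes = FALSE to all.equal() is one possible workaround, not something proposed in the original reply.

```r
# Hypothetical data frame with a one-column matrix of zeroes stored as a column.
d <- data.frame(x = c(1, 2, 3))
d$m <- matrix(0, nrow = 3, ncol = 1)

# all.equal() reports the extra 'dim' attribute, so the result is not TRUE
# and the column would be kept by the all.equal-based test.
is_zero_strict <- isTRUE(all.equal(d$m, rep(0, 3)))

# Ignoring attributes compares the values only.
is_zero_values <- isTRUE(all.equal(d$m, rep(0, 3), check.attributes = FALSE))
```

Here is_zero_strict should come out FALSE and is_zero_values TRUE, so only the second form would flag the matrix column as all zero.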

    cheers,

        Rolf Turner

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
