I have a dataset with many columns that are entirely NA or constant, so I remove them like so:

  same <- sapply(dataset, function(.col) {
    all(is.na(.col)) || all(.col[1L] == .col)
  })
  dataset <- dataset[!same]

This works great (I found it in the r-users list archive). However, I then sample my data like so:

  testSize <- floor(nrow(x) * 10/100)
  test <- sample(1:nrow(x), testSize)
  train_data <- x[-test, ]
  test_data <- x[test, -1]
  test_class <- x[test, 1]

After the split, test_data or train_data can contain columns that are constant, even though no column was constant in the full dataset. My solution is simply to re-run the lines above to remove the constants from each subset. That is not a problem, but is it normal? Is this how I should be handling this in R?

Many of the models I am trying to use (SVM, lda, etc.) don't like a column in which every value is the same. As a beginner, this is how I am handling it in R, and I am looking for someone to sanity-check that what I am doing is sound.

Brian

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
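
To make the question concrete, here is a minimal sketch of what I mean by re-running the removal after the split. Several things in it are my own assumptions rather than anything established: the helper name is_constant is made up, the toy data frame only stands in for my real x, and choosing the columns from the training rows and then applying that same choice to the test rows is just one way I could imagine keeping the two sets consistent. The helper also ignores NAs when comparing values, which is slightly different from my one-liner above.

```r
# Hypothetical helper (not from any package): TRUE when a column is
# entirely NA or has a single distinct non-NA value.
is_constant <- function(col) {
  vals <- col[!is.na(col)]
  length(vals) == 0L || all(vals == vals[1L])
}

set.seed(1)  # only so the sketch is reproducible

# Toy stand-in for x: class label in column 1, as in my split above.
x <- data.frame(
  class = rep(c("a", "b"), each = 10),
  v1    = rnorm(20),
  v2    = c(rep(0, 19), 1),  # varies overall, but constant in most subsets
  v3    = rep(5, 20),        # constant everywhere
  v4    = NA_real_           # all NA
)

# First pass on the whole dataset.
x <- x[!vapply(x, is_constant, logical(1))]

testSize <- floor(nrow(x) * 10 / 100)
test <- sample(seq_len(nrow(x)), testSize)
train_data <- x[-test, ]

# Second pass: decide using the training rows only (my assumption),
# and never drop the class column.
keep <- !vapply(train_data, is_constant, logical(1))
keep[1L] <- TRUE

train_data <- train_data[, keep, drop = FALSE]
test_data  <- x[test, keep, drop = FALSE][, -1, drop = FALSE]
test_class <- x[test, 1]
```

With this arrangement train_data and test_data always end up with the same predictor columns, which is the property I would want before handing them to a model. Whether a column that happens to be constant only in the small test set should also be removed is exactly the part I am unsure about.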