Re: [R] eliminating constant variables
What was the question and answer here?

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of pdb
Sent: Sunday, July 11, 2010 5:23 AM
To: r-help@r-project.org
Subject: Re: [R] eliminating constant variables
Importance: Low

Awesome! It made sense once I realised SD = standard deviation!

pdb

--
View this message in context: http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284915.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] eliminating constant variables
Awesome! It made sense once I realised SD = standard deviation!

pdb

--
View this message in context: http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284915.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] eliminating constant variables
On Sat, Jul 10, 2010 at 6:28 PM, pdb wrote:
>
> Hi all,
>
> I have a large data set and want to immediately build a 'blind' model
> without first examining the data. Now it appears in the data there are a lot
> of fields that are constant or all missing values - which prevents the model
> from being built.
>
> Can someone point me in the right direction as to how I can automatically purge
> my data file of these useless fields.
>

Try this. It will remove constant columns (such as column b below), all-NA columns (such as column a below) and columns that are constant aside from NAs (such as column d below). In this example only column c should survive:

    # test data
    DF <- data.frame(a = NA, b = 1, c = 1:5, d = c(NA, NA, 1, 1, 1))

    # per-column standard deviation; sd() on a whole data frame is defunct in current R
    sd. <- sapply(DF, sd, na.rm = TRUE)

    # keep only columns whose standard deviation is defined and positive
    DF[!is.na(sd.) & sd. > 0]
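The sd()-based filter above is meant for numeric data, so a file loaded with read.csv that also contains character or factor columns needs a different test. A minimal sketch of one alternative, counting distinct non-missing values per column, could look like the following. The helper name has_variation and the extra columns e and f are assumptions added for illustration and are not part of the thread.

```r
# Hypothetical helper (not from the thread): TRUE when the non-missing
# values of a column take at least two distinct values.
has_variation <- function(x) {
  length(unique(x[!is.na(x)])) > 1L
}

# Same test data as above, plus two made-up character columns.
DF <- data.frame(
  a = NA,
  b = 1,
  c = 1:5,
  d = c(NA, NA, 1, 1, 1),
  e = c("x", "x", "x", "x", "x"),   # constant character column
  f = c("x", "y", "x", "y", "x"),   # varying character column
  stringsAsFactors = FALSE
)

# vapply() guarantees one logical per column, whatever the column type.
keep <- vapply(DF, has_variation, logical(1))

# drop = FALSE keeps a data frame even if a single column survives.
DF_clean <- DF[, keep, drop = FALSE]
names(DF_clean)
```

Under these assumptions only columns c and f survive. Counting distinct values avoids any arithmetic on the column, which is why it works for text and factors as well as numbers.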
Re: [R] eliminating constant variables
Yep - that is what I want. Cheers Jim, you legend.

--
View this message in context: http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284861.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] eliminating constant variables
Is this what you want:

> test <- data.frame(a=runif(10), b=rep(NA, 10), c=rep(3,10), d=runif(10))
> test
           a  b c         d
1  0.3390729 NA 3 0.4346595
2  0.8394404 NA 3 0.7125147
3  0.3466835 NA 3 0.344
4  0.3337749 NA 3 0.3253522
5  0.4763512 NA 3 0.7570871
6  0.8921983 NA 3 0.2026923
7  0.8643395 NA 3 0.7111212
8  0.3899895 NA 3 0.1216919
9  0.7773207 NA 3 0.2454885
10 0.9606180 NA 3 0.1433044
> # determine which columns contain all NAs, or the same value
> same <- sapply(test, function(.col){
+     all(is.na(.col)) || all(.col[1L] == .col)
+ })
> same
    a     b     c     d
FALSE  TRUE  TRUE FALSE
> # now remove them
> test <- test[!same]
> test
           a         d
1  0.3390729 0.4346595
2  0.8394404 0.7125147
3  0.3466835 0.344
4  0.3337749 0.3253522
5  0.4763512 0.7570871
6  0.8921983 0.2026923
7  0.8643395 0.7111212
8  0.3899895 0.1216919
9  0.7773207 0.2454885
10 0.9606180 0.1433044
>

On Sat, Jul 10, 2010 at 7:45 PM, pdb wrote:
>
> Hi Jim,
>
> Thanks for your response, although I was probably not clear about exactly
> what I want to achieve, please let me see if I can explain a little
> better...
>
> There are certain (unknown) columns in my data that contain either NULL in
> every row, or the same value in every row (eg '1'). These columns are
> useless for modelling as there is no variation in the data.
>
> I need a way to automatically find and delete all these columns (it is not
> rows I want to delete, but the whole column, as in
>
> train$Variablexxx = NULL
>
> where Variablexxx needs to be automatically found.
>
> Thanks in advance,
>
> pdb
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
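One edge case worth checking with the predicate above: when a column mixes NAs with a single repeated value, the comparison .col[1L] == .col contains NA, so all() and therefore the whole expression evaluates to NA rather than TRUE or FALSE. The sketch below shows that behaviour on a small made-up data frame and one possible NA-tolerant variant. The function name is_constant and the test values are assumptions for illustration, not something posted in the thread.

```r
# Small made-up example; column d is constant apart from one NA.
test <- data.frame(
  a = c(0.1, 0.5, 0.9),
  b = rep(NA, 3),
  c = rep(3, 3),
  d = c(NA, 3, 3)
)

# The predicate exactly as posted: gives NA for column d.
same_original <- sapply(test, function(.col) {
  all(is.na(.col)) || all(.col[1L] == .col)
})

# Hypothetical NA-tolerant variant: compare only the non-missing values.
is_constant <- function(.col) {
  vals <- .col[!is.na(.col)]
  length(vals) == 0L || all(vals == vals[1L])
}

same <- vapply(test, is_constant, logical(1))

# now remove the all-NA and constant columns
test_clean <- test[!same]
names(test_clean)
```

With this variant, columns b, c and d are all flagged and only column a remains. Whether a column like d should count as constant is a modelling decision; the sketch assumes it should, matching the behaviour described elsewhere in the thread.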
Re: [R] eliminating constant variables
Hi Jim,

Thanks for your response. I was probably not clear about exactly what I want to achieve, so please let me see if I can explain a little better...

There are certain (unknown) columns in my data that contain either NULL in every row, or the same value in every row (e.g. '1'). These columns are useless for modelling as there is no variation in the data.

I need a way to automatically find and delete all these columns (it is not rows I want to delete, but the whole column), as in

    train$Variablexxx = NULL

where Variablexxx needs to be automatically found.

Thanks in advance,

pdb

--
View this message in context: http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284853.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] eliminating constant variables
You can remove NAs with:

    train <- subset(train, !is.na(TargetVariable))

I am not sure what you mean by constant values. You could use 'table' to determine which values appear the most and then remove them:

    x <- table(train$TargetVariable)
    train <- subset(train, !(TargetVariable %in% names(x)[x > someCountAboveWhichToDelete]))

But you probably need to look at your data and determine which numbers are in the set that you need to delete.

On Sat, Jul 10, 2010 at 6:28 PM, pdb wrote:
>
> Hi all,
>
> I have a large data set and want to immediately build a 'blind' model
> without first examining the data. Now it appears in the data there are a lot
> of fields that are constant or all missing values - which prevents the model
> from being built.
>
> Can someone point me in the right direction as to how I can automatically purge
> my data file of these useless fields.
>
> Thanks in advance,
>
> pdb
>
> train <- read.csv("TrainingData.csv")
> library(gbm)
> i.gbm<-gbm(TargetVariable ~ . ,data=train,distribution="bernoulli.
>
> 1: In gbm.fit(x, y, offset = offset, distribution = distribution, ... :
> variable 5: var1 has no variation.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
[R] eliminating constant variables
Hi all,

I have a large data set and want to immediately build a 'blind' model without first examining the data. Now it appears in the data there are a lot of fields that are constant or all missing values - which prevents the model from being built.

Can someone point me in the right direction as to how I can automatically purge my data file of these useless fields?

Thanks in advance,

pdb

train <- read.csv("TrainingData.csv")
library(gbm)
i.gbm<-gbm(TargetVariable ~ . ,data=train,distribution="bernoulli.

1: In gbm.fit(x, y, offset = offset, distribution = distribution, ... :
variable 5: var1 has no variation.

--
View this message in context: http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284831.html
Sent from the R help mailing list archive at Nabble.com.
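For the workflow in this question, it may help to purge the predictors while explicitly protecting the response column and to keep a record of what was dropped. The following is only a sketch under stated assumptions: the data frame is a synthetic stand-in for TrainingData.csv, and every column name other than TargetVariable and var1 is invented for the example. The gbm call itself is left out.

```r
# Synthetic stand-in for TrainingData.csv (all values here are made up).
train <- data.frame(
  TargetVariable = c(0, 1, 0, 1, 1, 0),
  var1 = 7,                                  # constant, like the column in the warning
  var2 = NA,                                 # entirely missing
  var3 = c(2.1, 3.4, 1.9, 4.0, 2.8, 3.3),
  var4 = c("a", "b", "a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# Number of distinct non-missing values in each column.
n_distinct <- vapply(
  train,
  function(x) length(unique(x[!is.na(x)])),
  integer(1)
)

# Columns with fewer than two distinct values carry no information.
# The response is excluded from the drop list so it is never removed.
dropped <- setdiff(names(n_distinct)[n_distinct < 2L], "TargetVariable")

train_clean <- train[, !(names(train) %in% dropped), drop = FALSE]

dropped             # names of the purged columns, useful for logging
names(train_clean)  # columns that would be passed on to the model
```

Keeping the list of dropped names makes the 'blind' step auditable afterwards, and the same list can be reused to remove the identical columns from any later scoring data.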