Re: [R] eliminating constant variables

2010-07-12 Thread Setlhare Lekgatlhamang
What was the question and answer here?

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of pdb
Sent: Sunday, July 11, 2010 5:23 AM
To: r-help@r-project.org
Subject: Re: [R] eliminating constant variables
Importance: Low


Awesome!

It made sense once I realised SD = standard deviation!

pdb
-- 
View this message in context:
http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284915.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] eliminating constant variables

2010-07-10 Thread pdb

Awesome!

It made sense once I realised SD = standard deviation!

pdb
-- 
View this message in context: 
http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284915.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] eliminating constant variables

2010-07-10 Thread Gabor Grothendieck
On Sat, Jul 10, 2010 at 6:28 PM, pdb  wrote:
>
> Hi all,
>
> I have a large data set and want to immediately build a 'blind' model
> without first examining the data. Now it appears in the data there are a lot
> of fields that are constant or all missing values - which prevents the model
> from being built.
>
> Can someone point me in the right direction as to how I can automatically purge
> my data file of these useless fields?
>

Try this. It will remove constant columns (such as column b below),
all NA columns (such as column a below) and columns which are constant
aside from NAs (such as column d below).  In this example only column
c should survive:

# test data
DF <- data.frame(a = NA, b = 1, c = 1:5, d = c(NA, NA, 1, 1, 1))
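# apply sd() column by column; current versions of R no longer accept
# a data frame as the argument to sd()
sd. <- sapply(DF, sd, na.rm = TRUE)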
DF[!is.na(sd.) & sd. > 0]
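
The standard-deviation filter above assumes every column is numeric. If the data also contain factor or character columns, counting the distinct non-missing values in each column is one possible alternative. The following is only a minimal sketch: the helper name drop_constant and the extra factor column e are assumptions made for illustration and do not come from the thread.

```r
# Hypothetical helper (not from the thread): keep only the columns that
# have at least two distinct non-missing values.
drop_constant <- function(df) {
  varies <- vapply(df, function(col) {
    length(unique(col[!is.na(col)])) > 1L
  }, logical(1))
  df[, varies, drop = FALSE]
}

# Same test data as above, plus a constant factor column 'e'
DF2 <- data.frame(a = NA, b = 1, c = 1:5, d = c(NA, NA, 1, 1, 1),
                  e = factor("x"))
names(drop_constant(DF2))   # only "c" remains
```

Because the check relies on unique() rather than sd(), it treats numeric, factor and character columns the same way.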



Re: [R] eliminating constant variables

2010-07-10 Thread pdb

Yep - that is what I want.

Cheers Jim you Legend.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284861.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] eliminating constant variables

2010-07-10 Thread jim holtman
Is this what you want:

> test <- data.frame(a=runif(10), b=rep(NA, 10), c=rep(3,10), d=runif(10))
> test
           a  b c         d
1  0.3390729 NA 3 0.4346595
2  0.8394404 NA 3 0.7125147
3  0.3466835 NA 3 0.344
4  0.3337749 NA 3 0.3253522
5  0.4763512 NA 3 0.7570871
6  0.8921983 NA 3 0.2026923
7  0.8643395 NA 3 0.7111212
8  0.3899895 NA 3 0.1216919
9  0.7773207 NA 3 0.2454885
10 0.9606180 NA 3 0.1433044
> # determine which columns contain all NAs, or the same value
> same <- sapply(test, function(.col){
+ all(is.na(.col))  || all(.col[1L] == .col)
+ })
> same
    a     b     c     d
FALSE  TRUE  TRUE FALSE
> # now remove them
> test <- test[!same]
> test
           a         d
1  0.3390729 0.4346595
2  0.8394404 0.7125147
3  0.3466835 0.344
4  0.3337749 0.3253522
5  0.4763512 0.7570871
6  0.8921983 0.2026923
7  0.8643395 0.7111212
8  0.3899895 0.1216919
9  0.7773207 0.2454885
10 0.9606180 0.1433044
>
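
One caveat with the check above: when a column mixes NAs with real values, all(.col[1L] == .col) can evaluate to NA instead of TRUE or FALSE, and the NA then ends up in the index used to select columns. A minimal sketch of one way around this is to compare only the non-missing values. The names test2 and is_constant are illustrative assumptions, not part of the original reply.

```r
# Illustrative data: 'c' is constant apart from an NA, 'd' starts with an NA
test2 <- data.frame(a = c(1, 2, 3), b = NA, c = c(3, NA, 3), d = c(NA, 1, 2))

# Hypothetical helper: TRUE if a column is all NA or holds a single
# distinct non-missing value
is_constant <- function(.col) {
  vals <- .col[!is.na(.col)]
  length(vals) == 0L || all(vals == vals[1L])
}

same <- vapply(test2, is_constant, logical(1))
test2[!same]   # keeps columns a and d
```

Using vapply() with logical(1) also guarantees that the result is a plain logical vector, one element per column.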


On Sat, Jul 10, 2010 at 7:45 PM, pdb  wrote:
>
> Hi Jim,
>
> Thanks for your response, although I was probably not clear about exactly
> what I want to achieve, please let me see if I can explain a little
> better...
>
> There are certain (unknown) columns in my data that contain either NULL in
> every row, or the same value in every row (eg '1'). These columns are
> useless for modelling as there is no variation in the data.
>
> I need a way to automatically find and delete all these columns. It is not
> rows I want to delete, but whole columns, as in
>
> train$Variablexxx = NULL
>
> where Variablexxx needs to be automatically found.
>
> Thanks in advance,
>
> pdb
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284853.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] eliminating constant variables

2010-07-10 Thread pdb

Hi Jim, 

Thanks for your response, although I was probably not clear about exactly
what I want to achieve, please let me see if I can explain a little
better...

There are certain (unknown) columns in my data that contain either NULL in
every row, or the same value in every row (eg '1'). These columns are
useless for modelling as there is no variation in the data.

I need a way to automatically find and delete all these columns. It is not
rows I want to delete, but whole columns, as in

train$Variablexxx = NULL

where Variablexxx needs to be automatically found.

Thanks in advance,

pdb
-- 
View this message in context: 
http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284853.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] eliminating constant variables

2010-07-10 Thread jim holtman
You can remove NAs with:

train <- subset(train, !is.na(TargetVariable))

I am not sure what you mean by constant values.  You could use 'table'
to determine which values appear the most and then remove them:

x <- table(train$TargetVariable)
train <- subset(train, !(TargetVariable %in% names(x)[x >
someCountAboveWhichToDelete]))

But you probably need to look at your data and determine which numbers
are in the set that you need to delete.

On Sat, Jul 10, 2010 at 6:28 PM, pdb  wrote:
>
> Hi all,
>
> I have a large data set and want to immediately build a 'blind' model
> without first examining the data. Now it appears in the data there are a lot
> of fields that are constant or all missing values - which prevents the model
> from being built.
>
> Can someone point me in the right direction as to how I can automatically purge
> my data file of these useless fields?
>
> Thanks in advance,
>
> pdb
>
> train <- read.csv("TrainingData.csv")
> library(gbm)
> i.gbm <- gbm(TargetVariable ~ ., data = train, distribution = "bernoulli")
>
> 1: In gbm.fit(x, y, offset = offset, distribution = distribution,  ... :
>  variable 5: var1 has no variation.
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284831.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] eliminating constant variables

2010-07-10 Thread pdb

Hi all,

I have a large data set and want to immediately build a 'blind' model
without first examining the data. Now it appears in the data there are a lot
of fields that are constant or all missing values - which prevents the model
from being built.

Can someone point me in the right direction as to how I can automatically purge
my data file of these useless fields?

Thanks in advance,

pdb

train <- read.csv("TrainingData.csv")
library(gbm)
i.gbm <- gbm(TargetVariable ~ ., data = train, distribution = "bernoulli")

1: In gbm.fit(x, y, offset = offset, distribution = distribution,  ... :
  variable 5: var1 has no variation.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284831.html
Sent from the R help mailing list archive at Nabble.com.
