Thank you everyone.
What I am really after is a subset of the original variables themselves,
chosen from the full set of variables. For example, given:
x1 - numeric, continuous
x2 - numeric, continuous
x3 - numeric factor with 2 levels
x4 - character factor with 10 levels
x5 - numeric, continuous
x6 - numeric, integer

A variable reduction method should then ideally give me something like:

keep : x1, x3 and x6
drop : x2, x4 and x5
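
As a rough illustration of what such a keep/drop result could look like,
here is a minimal base R sketch of the two screening criteria from the
original post (share of missing values per column, and a variance
cut-off). The helper name screen_variables, the toy data and both
thresholds are assumptions made for this example only.

```r
# Hypothetical helper: flag columns with too many missing values or
# (for numeric columns) too little variance. Thresholds are illustrative.
screen_variables <- function(df, max_missing = 0.3, min_var = 0.01) {
  # Share of missing values in each column
  miss <- vapply(df, function(col) mean(is.na(col)), numeric(1))
  # Variance for numeric columns only; factors get NA and are not
  # screened on variance
  vars <- vapply(df, function(col) {
    if (is.numeric(col)) var(col, na.rm = TRUE) else NA_real_
  }, numeric(1))
  drop <- miss > max_missing | (!is.na(vars) & vars < min_var)
  list(keep = names(df)[!drop], drop = names(df)[drop])
}

# Toy data loosely mirroring the six variables listed above
set.seed(42)
n <- 100
toy <- data.frame(
  x1 = rnorm(n),                                    # continuous
  x2 = c(rep(NA, 60), rnorm(40)),                   # 60% missing
  x3 = factor(sample(c(0, 1), n, replace = TRUE)),  # 2-level factor
  x4 = factor(c(rep(NA, 50),
                sample(letters[1:10], 50, replace = TRUE))),  # 50% missing
  x5 = rep(5, n),                                   # constant, zero variance
  x6 = sample(1:20, n, replace = TRUE)              # integer
)

res <- screen_variables(toy)
print(res$keep)  # "x1" "x3" "x6"
print(res$drop)  # "x2" "x4" "x5"
```

Note that this only screens each variable on its own; it does not look at
redundancy between variables.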

The 'redun' function from the Hmisc package seems promising since it
handles categorical variables as well. It drops a variable when that
variable can be predicted well from the remaining variables. I guess
this amounts to a check for multicollinearity.
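
A minimal sketch of how 'redun' might be called on mixed data follows.
The toy data frame is made up for illustration, x2 is deliberately
constructed as a near-copy of x1, and the r2 = 0.9 cut-off is just the
value assumed here; it should be tuned to the data at hand.

```r
library(Hmisc)

set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)                      # nearly a copy of x1
x3 <- factor(sample(c("a", "b"), n, replace = TRUE))  # 2-level factor
x6 <- sample(1:50, n, replace = TRUE)               # integer
d  <- data.frame(x1, x2, x3, x6)

# No response variable is needed: each variable is predicted from the
# others, and variables predictable with R^2 >= r2 are dropped stepwise.
r <- redun(~ x1 + x2 + x3 + x6, data = d, r2 = 0.9)

print(r)
r$In    # variables retained
r$Out   # variables flagged as redundant
```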

The RWeka package, as I mentioned earlier, allows one to use Weka's
variable reduction/selection techniques in R. I did come across an
implementation of the "Genetic Search" method, but have not been able
to find the relevant documentation to tweak it to suit my needs.

Thank you all for your time.

Harsh Singhal
Decision Systems,
Mu Sigma Inc.




On Tue, Dec 9, 2008 at 8:05 PM, Ravi Varadhan <[EMAIL PROTECTED]> wrote:
> Principal components analysis does "dimensionality reduction" but NOT
> "variable reduction".  However, Jolliffe's 2004 book on PCA does discuss the
> problem of selecting a subset of variables, with the goal of representing
> the internal variation of original multivariate vector as well as possible
> (see Section 6.3 of that book).  I do not think that these methods can
> handle missing data.  The most important issue is to think about the goal of
> variable reduction and then choose an appropriate optimality criterion for
> achieving that goal.  In most instances of variable selection, the criterion
> that is optimized is never explicitly considered.
>
> Ravi.
>
> ----------------------------------------------------------------------------
> Ravi Varadhan, Ph.D.
> Assistant Professor, The Center on Aging and Health
> Division of Geriatric Medicine and Gerontology
> Johns Hopkins University
> Ph: (410) 502-2619
> Fax: (410) 614-9625
> Email: [EMAIL PROTECTED]
> Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
> ----------------------------------------------------------------------------
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> Gabor Grothendieck
> Sent: Tuesday, December 09, 2008 8:00 AM
> To: Harsh
> Cc: r-help@r-project.org
> Subject: Re: [R] Pre-model Variable Reduction
>
> See:
>
> ?prcomp
> ?princomp
>
> On Tue, Dec 9, 2008 at 5:34 AM, Harsh <[EMAIL PROTECTED]> wrote:
>> Hello All,
>> I am trying to carry out variable reduction. I do not have information
>> about the dependent variable, and have only the X variables as it
>> were.
>> In selecting the variables I wish to keep, I have considered the
>> following criteria:
>> 1) Percentage of missing values in each column/variable.
>> 2) Variance of each variable, with a cut-off value.
>>
>> I recently came across Weka and found that there is an RWeka package
>> which would allow me to make use of Weka through R.
>> Weka provides a "Genetic search" variable reduction method, but I
>> could not find its R code implementation in the RWeka Pdf file on
>> CRAN.
>>
>> I looked for other R packages that allow me to do variable reduction
>> without considering a dependent variable. I came across 'dprep'
>> package but it does not have a Windows implementation.
>>
>> Moreover, I have a dataset that contains continuous and categorical
>> variables, some categorical variables having 3 levels, 10 levels and
>> so on, till a max 50 levels (E.g. States in the USA).
>>
>> Any suggestions in this regard will be much appreciated.
>>
>> Thank you
>>
>> Harsh Singhal
>> Decision Systems,
>> Mu Sigma, Inc.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
