@Peyman - do any of the clustering algorithms have a "feature importance" or "feature selection" ability? I can't seem to pinpoint one.
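To make the 'featureImportances' idea below concrete without depending on Spark: this is a stdlib-only Python sketch of *permutation importance*, a related but model-agnostic technique — degrade one feature column at a time and measure how much the model's error grows. Spark's tree models instead expose impurity-based importances directly via the fitted model's `featureImportances` attribute; the toy model and data here are invented purely for illustration.

```python
def mse(model, X, y):
    """Mean squared error of `model` over the dataset (X, y)."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(X)

def permutation_importance(model, X, y):
    """Error increase when each feature column is permuted (rotated by one).

    A feature the model relies on should hurt badly when scrambled; a
    feature the model ignores should not change the error at all.
    """
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rotated = col[1:] + col[:1]  # simple deterministic permutation
        X_perm = [row[:j] + [rotated[i]] + row[j + 1:]
                  for i, row in enumerate(X)]
        importances.append(mse(model, X_perm, y) - baseline)
    return importances

# Toy data: the target depends only on feature 0, so permuting feature 1
# should leave the error unchanged and score an importance of zero.
X = [[float(i), float(i % 3)] for i in range(8)]
y = [2.0 * row[0] for row in X]
model = lambda row: 2.0 * row[0]  # stand-in for a "trained" model

imps = permutation_importance(model, X, y)
```

After running this, `imps[0]` is large and `imps[1]` is exactly zero — which is the kind of signal you would use to drop features before re-training, as Peyman describes for the supervised case.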
On Tue, Aug 9, 2016 at 8:49 AM, Peyman Mohajerian <mohaj...@gmail.com> wrote:

> You can try 'feature importances' or 'feature selection', depending on what
> else you want to do with the remaining features. Let's say you are trying
> to do classification: some of the Spark libraries have a model parameter
> called 'featureImportances' that tells you which features are more dominant
> in your classification; you can then run your model again with the smaller
> set of features.
> The two approaches are quite different. What I'm suggesting involves
> training (supervised learning) in the context of a target function, whereas
> with SVD you are doing unsupervised learning.
>
> On Mon, Aug 8, 2016 at 7:23 PM, Rohit Chaddha <rohitchaddha1...@gmail.com>
> wrote:
>
>> I would rather have fewer features, to make better inferences on the data
>> based on the smaller number of factors.
>> Any suggestions, Sean?
>>
>> On Mon, Aug 8, 2016 at 11:37 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> Yes, that's exactly what PCA is for, as Sivakumaran noted. Do you
>>> really want to select features, or just obtain a lower-dimensional
>>> representation of them, with less redundancy?
>>>
>>> On Mon, Aug 8, 2016 at 4:10 PM, Tony Lane <tonylane....@gmail.com>
>>> wrote:
>>> > There must be an algorithmic way to figure out which of these factors
>>> > contribute the least and remove them from the analysis.
>>> > I am hoping someone can throw some insight on this.
>>> >
>>> > On Mon, Aug 8, 2016 at 7:41 PM, Sivakumaran S <siva.kuma...@me.com>
>>> > wrote:
>>> >>
>>> >> Not an expert here, but the first step would be to devote some time
>>> >> and identify which of these 112 factors are actually causative. Some
>>> >> domain knowledge of the data may be required. Then, you can start
>>> >> off with PCA.
>>> >>
>>> >> HTH,
>>> >>
>>> >> Regards,
>>> >>
>>> >> Sivakumaran S
>>> >>
>>> >> On 08-Aug-2016, at 3:01 PM, Tony Lane <tonylane....@gmail.com> wrote:
>>> >>
>>> >> Great question, Rohit. I am in my early days of ML as well, and it
>>> >> would be great if we get some ideas on this from other experts in
>>> >> this group.
>>> >>
>>> >> I know we can reduce dimensions by using PCA, but I think that does
>>> >> not allow us to understand which of the original factors we are
>>> >> using in the end.
>>> >>
>>> >> - Tony L.
>>> >>
>>> >> On Mon, Aug 8, 2016 at 5:12 PM, Rohit Chaddha
>>> >> <rohitchaddha1...@gmail.com> wrote:
>>> >>>
>>> >>> I have a data set where each data point has 112 factors.
>>> >>>
>>> >>> I want to remove the factors which are not relevant, say reducing
>>> >>> from these 112 to 20 factors, and then cluster the data points
>>> >>> using those 20 factors.
>>> >>>
>>> >>> How do I do this, and how do I figure out which 20 factors are
>>> >>> useful for the analysis?
>>> >>>
>>> >>> I see SVD and PCA implementations, but I am not sure whether these
>>> >>> tell you which elements are removed and which remain.
>>> >>>
>>> >>> Can someone please help me understand what to do here?
>>> >>>
>>> >>> Thanks,
>>> >>> -Rohit
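For anyone wanting to see what PCA actually computes before reaching for Spark's `PCA` or `computeSVD`: below is a stdlib-only Python sketch that finds the first principal direction by power iteration on the sample covariance matrix. The toy data are invented, and real code would use Spark or a linear-algebra library; the sketch just illustrates Sean's point — the output is a *direction mixing all the original factors*, not a selected subset of them.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance_matrix(X):
    """Sample covariance matrix of the rows of X."""
    n, d = len(X), len(X[0])
    mu = [mean([row[j] for row in X]) for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        for a in range(d):
            for b in range(d):
                C[a][b] += (row[a] - mu[a]) * (row[b] - mu[b]) / (n - 1)
    return C

def top_component(X, iters=50):
    """Leading eigenvector of the covariance matrix (the first principal
    direction), found by repeatedly applying C and renormalizing."""
    C = covariance_matrix(X)
    d = len(C)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Two perfectly correlated features: the first principal direction comes
# out as (1, 1) / sqrt(2), a blend of both — neither feature is "removed".
X = [[float(i), float(i)] for i in range(1, 11)]
v = top_component(X)
```

Both entries of `v` come out equal, which is exactly why PCA answers "give me a lower-dimensional representation" rather than Rohit's "tell me which 20 of the 112 factors to keep".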