@Peyman - do any of the clustering algorithms have a "feature importance" or "feature selection" ability? I can't seem to pinpoint one.
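To make the 'featureImportances' idea below concrete without depending on Spark: this is a stdlib-only Python sketch of *permutation importance*, a related but model-agnostic technique — degrade one feature column at a time and measure how much the model's error grows. Spark's tree models instead expose impurity-based importances directly via the fitted model's `featureImportances` attribute; the toy model and data here are invented purely for illustration.

```python
def mse(model, X, y):
    """Mean squared error of `model` over the dataset (X, y)."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(X)

def permutation_importance(model, X, y):
    """Error increase when each feature column is permuted (rotated by one).

    A feature the model relies on should hurt badly when scrambled; a
    feature the model ignores should not change the error at all.
    """
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rotated = col[1:] + col[:1]  # simple deterministic permutation
        X_perm = [row[:j] + [rotated[i]] + row[j + 1:]
                  for i, row in enumerate(X)]
        importances.append(mse(model, X_perm, y) - baseline)
    return importances

# Toy data: the target depends only on feature 0, so permuting feature 1
# should leave the error unchanged and score an importance of zero.
X = [[float(i), float(i % 3)] for i in range(8)]
y = [2.0 * row[0] for row in X]
model = lambda row: 2.0 * row[0]  # stand-in for a "trained" model

imps = permutation_importance(model, X, y)
```

After running this, `imps[0]` is large and `imps[1]` is exactly zero — which is the kind of signal you would use to drop features before re-training, as Peyman describes for the supervised case.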
On Tue, Aug 9, 2016 at 8:49 AM, Peyman Mohajerian <mohaj...@gmail.com> wrote:

> You can try 'feature importances' or 'feature selection', depending on what
> else you want to do with the remaining features. Let's say you are trying
> to do classification: some of the Spark libraries have a model parameter
> called 'featureImportances' that tells you which features are more dominant
> in your classification; you can then run your model again with the smaller
> set of features.
> The two approaches are quite different. What I'm suggesting involves
> training (supervised learning) in the context of a target function, whereas
> with SVD you are doing unsupervised learning.
>
> On Mon, Aug 8, 2016 at 7:23 PM, Rohit Chaddha <rohitchaddha1...@gmail.com>
> wrote:
>
>> I would rather have fewer features, to make better inferences on the data
>> based on the smaller number of factors.
>> Any suggestions, Sean?
>>
>> On Mon, Aug 8, 2016 at 11:37 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> Yes, that's exactly what PCA is for, as Sivakumaran noted. Do you
>>> really want to select features, or just obtain a lower-dimensional
>>> representation of them, with less redundancy?
>>>
>>> On Mon, Aug 8, 2016 at 4:10 PM, Tony Lane <tonylane....@gmail.com>
>>> wrote:
>>> > There must be an algorithmic way to figure out which of these factors
>>> > contribute the least and remove them from the analysis.
>>> > I am hoping someone can throw some insight on this.
>>> >
>>> > On Mon, Aug 8, 2016 at 7:41 PM, Sivakumaran S <siva.kuma...@me.com>
>>> > wrote:
>>> >>
>>> >> Not an expert here, but the first step would be to devote some time
>>> >> and identify which of these 112 factors are actually causative. Some
>>> >> domain knowledge of the data may be required. Then, you can start
>>> >> off with PCA.
>>> >>
>>> >> HTH,
>>> >>
>>> >> Regards,
>>> >>
>>> >> Sivakumaran S
>>> >>
>>> >> On 08-Aug-2016, at 3:01 PM, Tony Lane <tonylane....@gmail.com> wrote:
>>> >>
>>> >> Great question, Rohit. I am in my early days of ML as well, and it
>>> >> would be great if we get some ideas on this from other experts in
>>> >> this group.
>>> >>
>>> >> I know we can reduce dimensions by using PCA, but I think that does
>>> >> not allow us to understand which of the original factors we are
>>> >> using in the end.
>>> >>
>>> >> - Tony L.
>>> >>
>>> >> On Mon, Aug 8, 2016 at 5:12 PM, Rohit Chaddha
>>> >> <rohitchaddha1...@gmail.com> wrote:
>>> >>>
>>> >>> I have a data set where each data point has 112 factors.
>>> >>>
>>> >>> I want to remove the factors which are not relevant, say reducing
>>> >>> from these 112 to 20 factors, and then cluster the data points
>>> >>> using those 20 factors.
>>> >>>
>>> >>> How do I do this, and how do I figure out which 20 factors are
>>> >>> useful for the analysis?
>>> >>>
>>> >>> I see SVD and PCA implementations, but I am not sure whether these
>>> >>> tell you which elements are removed and which remain.
>>> >>>
>>> >>> Can someone please help me understand what to do here?
>>> >>>
>>> >>> Thanks,
>>> >>> -Rohit
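For anyone wanting to see what PCA actually computes before reaching for Spark's `PCA` or `computeSVD`: below is a stdlib-only Python sketch that finds the first principal direction by power iteration on the sample covariance matrix. The toy data are invented, and real code would use Spark or a linear-algebra library; the sketch just illustrates Sean's point — the output is a *direction mixing all the original factors*, not a selected subset of them.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance_matrix(X):
    """Sample covariance matrix of the rows of X."""
    n, d = len(X), len(X[0])
    mu = [mean([row[j] for row in X]) for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        for a in range(d):
            for b in range(d):
                C[a][b] += (row[a] - mu[a]) * (row[b] - mu[b]) / (n - 1)
    return C

def top_component(X, iters=50):
    """Leading eigenvector of the covariance matrix (the first principal
    direction), found by repeatedly applying C and renormalizing."""
    C = covariance_matrix(X)
    d = len(C)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Two perfectly correlated features: the first principal direction comes
# out as (1, 1) / sqrt(2), a blend of both — neither feature is "removed".
X = [[float(i), float(i)] for i in range(1, 11)]
v = top_component(X)
```

Both entries of `v` come out equal, which is exactly why PCA answers "give me a lower-dimensional representation" rather than Rohit's "tell me which 20 of the 112 factors to keep".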