Re: Machine learning question (using Spark) - removing redundant factors while doing clustering

Tony Lane Mon, 08 Aug 2016 08:11:01 -0700

There must be an algorithmic way to figure out which of these factors
contribute the least and remove them from the analysis.
I am hoping someone can shed some light on this.
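
To make the idea concrete, here is a minimal sketch of one simple filter of
that kind: drop factors that barely vary, then drop any factor that is almost
perfectly correlated with one already kept. It is plain Python rather than
Spark, the function names (pearson, select_factors) and both thresholds are
hypothetical choices for illustration, and it is only one of several possible
approaches, not a recommended method. Unlike PCA, it returns the names of the
original factors that survive.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_factors(rows, names, min_variance=1e-8, max_abs_corr=0.95):
    """Greedy filter: keep a factor only if it varies and is not redundant.

    rows  : list of data points, each a sequence of numeric factor values
    names : factor names, in the same order as the values in each row
    Both thresholds are illustrative assumptions and need tuning per data set.
    """
    columns = list(zip(*rows))  # one tuple of values per factor
    kept = []                   # (name, column) pairs retained so far
    for name, col in zip(names, columns):
        mean = sum(col) / len(col)
        variance = sum((v - mean) ** 2 for v in col) / len(col)
        if variance <= min_variance:
            continue  # (near-)constant factor: carries no information
        if any(abs(pearson(col, other)) >= max_abs_corr for _, other in kept):
            continue  # nearly duplicates a factor that was already kept
        kept.append((name, col))
    return [name for name, _ in kept]


# Toy data: f2 is exactly 2 * f1 (redundant) and f3 is constant.
rows = [(1, 2, 5, 3), (2, 4, 5, 1), (3, 6, 5, 4), (4, 8, 5, 1)]
selected = select_factors(rows, ["f1", "f2", "f3", "f4"])
print(selected)
```

The result depends on column order, because the first factor of a correlated
pair is the one that is kept. The pairwise comparison is cheap for 112
factors, but on a large data set the correlations themselves would be better
computed in Spark than in a loop like this.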


On Mon, Aug 8, 2016 at 7:41 PM, Sivakumaran S <siva.kuma...@me.com> wrote:

> Not an expert here, but the first step would be to devote some time to
> identifying which of these 112 factors are actually causative. Some domain
> knowledge of the data may be required. Then you can start off with PCA.
>
> HTH,
>
> Regards,
>
> Sivakumaran S
>
> On 08-Aug-2016, at 3:01 PM, Tony Lane <tonylane....@gmail.com> wrote:
>
> Great question, Rohit. I am in my early days of ML as well, and it would be
> great if we could get some ideas on this from other experts in this group.
>
> I know we can reduce dimensions by using PCA, but I think that does not
> let us understand which of the original factors we are using in the
> end.
>
> - Tony L.
>
> On Mon, Aug 8, 2016 at 5:12 PM, Rohit Chaddha <rohitchaddha1...@gmail.com>
> wrote:
>
>>
>> I have a data-set where each data-point has 112 factors.
>>
>> I want to remove the factors which are not relevant, and say reduce to 20
>> factors out of these 112 and then do clustering of data-points using these
>> 20 factors.
>>
>> How do I do this, and how do I figure out which 20 factors are
>> useful for the analysis?
>>
>> I see SVD and PCA implementations, but I am not sure whether these tell me
>> which factors are removed and which remain.
>>
>> Can someone please help me understand what to do here?
>>
>> thanks,
>> -Rohit
>>
>>
>
>
