Hi All,

Hope you are doing well.

We are using the Spark MLlib (1.4.1) PCA functionality for
dimensionality reduction.

So far we are able to condense n features into k features using
https://spark.apache.org/docs/1.4.1/mllib-dimensionality-reduction.html#principal-component-analysis-pca

The requirements, as per our data scientist, are as follows:

a) We need to compute a varimax rotation (
https://en.wikipedia.org/wiki/Varimax_rotation) on the principal
components. I could not find anything on this in the documentation.
Could anybody please help with this?
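In case it helps the discussion: as far as I can tell MLlib 1.4.1 has no built-in varimax, but once the n x k principal-component matrix is collected to the driver (it is a local Matrix), the rotation can be done there. Below is a minimal NumPy sketch of the standard iterative varimax algorithm; the function name and parameters are my own, and it assumes `loadings` is the n x k loadings array brought over from Spark.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a p x k loadings matrix; gamma=1.0 gives the varimax criterion.

    Returns (rotated_loadings, rotation_matrix) where
    rotated_loadings == loadings @ rotation_matrix.
    """
    p, k = loadings.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        )
        R = u @ vt  # orthogonal rotation maximizing the criterion
        new_crit = np.sum(s)
        if crit != 0 and new_crit < crit * (1 + tol):
            break  # converged
        crit = new_crit
    return loadings @ R, R
```

The rotation matrix R is orthogonal, so the rotated columns still span the same subspace as the original components; only the axes are turned to make each variable load strongly on fewer components.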

b) We also need to find out which original features are being combined
into each principal component, so that we can understand the feature
condensation taking place. Is there a way to see this? E.g. we have a
CSV whose header holds the feature names; the header is dropped but
preserved for later use. PCA takes us from n features to k, i.e. n
columns to k. What are those k columns made up of, and how can we find
that out?
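For what it's worth, my understanding is that the matrix returned by RowMatrix.computePrincipalComponents(k) is exactly this mapping: it is n x k, with one row per original feature (in the order of your CSV header) and one column per component, holding the loadings. Assuming that matrix has been brought to the driver as a NumPy array, a small hypothetical helper like the one below can list the dominant original features per component:

```python
import numpy as np

def top_features_per_component(pc, feature_names, top=3):
    """pc: n x k loadings matrix (rows = original features in header order,
    columns = principal components). Returns, for each component, the
    `top` feature names with the largest absolute loadings."""
    result = []
    for j in range(pc.shape[1]):
        order = np.argsort(-np.abs(pc[:, j]))[:top]
        result.append([(feature_names[i], float(pc[i, j])) for i in order])
    return result
```

So each of the k new columns is a weighted sum of all n original features, and the loadings tell you which weights dominate.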

c) Is there a way to preserve the primary key of each row through the
analysis? i.e. when preparing the feature vector the PK is dropped.
(General knowledge question :-))
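One pattern I believe works (hedging, since I haven't tried it on 1.4.1): keep the PK paired with each feature vector, e.g. as (pk, vector) tuples, and apply the projection only to the vector part, instead of feeding a bare RowMatrix, which indeed loses row identity (IndexedRowMatrix at least keeps a Long row index that could serve as a surrogate key). The local sketch below illustrates the idea; with an RDD the same shape falls out of keyBy/mapValues:

```python
import numpy as np

def project_with_keys(keyed_rows, pc):
    """keyed_rows: iterable of (pk, feature_vector) pairs;
    pc: n x k loadings matrix. Projects each vector onto the components
    while carrying the primary key along unchanged."""
    return [(pk, np.asarray(vec) @ pc) for pk, vec in keyed_rows]
```

The key never enters the linear algebra, so nothing about the PCA changes; it just rides alongside each row.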

Any help is appreciated.

Thanks in Advance,
~BA
