I am debugging problems with a PySpark RandomForestClassificationModel, and I am trying to use the feature importances to do so. However, the featureImportances property returns a SparseVector that isn't directly interpretable. How can I transform the SparseVector into a useful list of features, with a feature type and name for each entry?
Some of my features were nominal, so they had to be one-hot encoded and then combined with my numeric features. There is no PCA or anything else that would hurt interpretability; I just need to map things back so that I can get a feature type/name for each item in the SparseVector. In other words, in practice RandomForestClassificationModel.featureImportances isn't useful without some way to make it interpretable. Does that ability exist? I've done this in sklearn, but I don't know how to do it with Spark ML.

My code is in a Jupyter Notebook on Github here <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch09/Debugging%20Prediction%20Problems.ipynb>, skip to the end. Stack Overflow post: http://stackoverflow.com/questions/41273893/in-pyspark-ml-how-can-i-interpret-the-sparsevector-returned-by-a-pyspark-ml-cla

Thanks!
--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
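For what it's worth, here is a sketch of the approach I've seen suggested: VectorAssembler records per-slot ML attribute metadata in the assembled column's schema (under `metadata["ml_attr"]["attrs"]`), which can be zipped with the importances vector. The column name "features" and the helper function itself are my own assumptions, not Spark API; only the metadata layout is standard.

```python
# Hedged sketch: map each slot of featureImportances back to a feature name
# using the ML attribute metadata that VectorAssembler stores in the schema.
# "features" column name and this helper are illustrative assumptions.
def importances_with_names(feature_importances, ml_attr_metadata):
    """Pair each slot of a featureImportances vector with its feature name.

    feature_importances: the model's featureImportances (indexable by slot).
    ml_attr_metadata: df.schema["features"].metadata["ml_attr"]
    """
    name_by_index = {}
    # Attribute groups are keyed by type, e.g. "numeric" and "binary";
    # one-hot-encoded slots typically land in the "binary" group.
    for attr_group in ml_attr_metadata["attrs"].values():
        for attr in attr_group:
            name_by_index[attr["idx"]] = attr["name"]
    pairs = [(name, float(feature_importances[idx]))
             for idx, name in name_by_index.items()]
    # Sort by importance, most important first
    return sorted(pairs, key=lambda p: p[1], reverse=True)

# In a fitted pipeline it would be called roughly like:
# meta = assembled_df.schema["features"].metadata["ml_attr"]
# for name, score in importances_with_names(model.featureImportances, meta):
#     print(name, score)
```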