David Kravitz created SPARK-29418:
-------------------------------------

             Summary: Mismatched indices between input and featureImportances is at best extremely confusing
                 Key: SPARK-29418
                 URL: https://issues.apache.org/jira/browse/SPARK-29418
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.4.4
         Environment: I'm on AWS but I presume this is happening everywhere.
            Reporter: David Kravitz

When you read in a "libsvm" file, the feature indices are required to be one-based, so lines look like this:

37.0 1:1.0 2:2.75

But when you fit something like RandomForestRegressor and look at the feature importances, the indices are zero-based:

model.stages[-1].featureImportances
SparseVector(144, {0: 0.0292, 1: 0.0041})

I guess you can add one to make them line up, but why force us to do that? Either accept zero-based indices in libsvm files (easiest) or have featureImportances report indices that match the input.
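
For anyone hitting the same mismatch, here is a minimal sketch of the "add one" workaround mentioned above. It is plain Python and does not call Spark. It assumes the importances have already been copied out of the SparseVector into an ordinary dict keyed by zero-based index, and the helper names parse_libsvm_line and to_one_based are made up for this illustration.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one libsvm line into (label, {one_based_index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index)] = float(value)  # indices stay one-based, as in the file
    return label, features


def to_one_based(importances: Dict[int, float]) -> Dict[int, float]:
    """Shift zero-based importance indices by one so they match the libsvm file."""
    return {index + 1: weight for index, weight in importances.items()}


# Values taken from the report above.
label, features = parse_libsvm_line("37.0 1:1.0 2:2.75")
zero_based = {0: 0.0292, 1: 0.0041}  # assumed to be copied from featureImportances
one_based = to_one_based(zero_based)

for index in sorted(one_based):
    print(f"libsvm feature {index}: importance {one_based[index]}")
```

After the shift, importance index 0 lines up with libsvm feature 1 and index 1 with feature 2, which is the correspondence the report asks Spark to make obvious.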