[jira] [Created] (SPARK-29418) Mismatched indices between input and featureImportances is at best extremely confusing

David Kravitz (Jira) Wed, 09 Oct 2019 12:27:28 -0700

David Kravitz created SPARK-29418:
-------------------------------------

             Summary: Mismatched indices between input and featureImportances 
is at best extremely confusing
                 Key: SPARK-29418
                 URL: https://issues.apache.org/jira/browse/SPARK-29418
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.4.4
         Environment: I'm on AWS but I presume this is happening everywhere.  
            Reporter: David Kravitz



When you read in a "libsvm" file, the feature indices must be one-based, so 
lines look like this:

37.0 1:1.0 2:2.75

But when you fit something like RandomForestRegressor and look at the 
feature importances, the indices are zero-based:

model.stages[-1].featureImportances

SparseVector(144, {0: 0.0292, 1: 0.0041})

I guess you can add one to make them line up, but why force us to do that?  
Either accept zero-based indices in libsvm files (easiest) or have 
featureImportances report indices that match the input file.  
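
For anyone hitting the same mismatch, here is a minimal sketch of the add-one workaround in plain Python, with no Spark dependency. The helper names parse_libsvm_line and to_libsvm_indices are hypothetical and are not part of Spark; the parser only illustrates the one-based to zero-based shift described above, under the assumption that each feature index in the file ends up one lower in the resulting vector.

```python
def parse_libsvm_line(line):
    """Parse one libsvm line into (label, {zero_based_index: value})."""
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        # The file is one-based; subtract 1 to get the zero-based vector index.
        features[int(index) - 1] = float(value)
    return float(label), features


def to_libsvm_indices(importances):
    """Re-key zero-based importances to the one-based indices used in the file."""
    return {index + 1: weight for index, weight in importances.items()}


label, features = parse_libsvm_line("37.0 1:1.0 2:2.75")
print(label, features)

# Importances keyed the way featureImportances reports them (zero-based).
print(to_libsvm_indices({0: 0.0292, 1: 0.0041}))
```

With a real model, the zero-based dict could presumably be built from the vector's indices and values arrays before passing it to to_libsvm_indices.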



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
