[ https://issues.apache.org/jira/browse/SPARK-19382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847683#comment-15847683 ]
Miao Wang commented on SPARK-19382:
-----------------------------------

[~josephkb] If I understand correctly, I think we have to create separate tests for SparseVector. For example, test("linear svc: default params") contains assert(model.numFeatures === 2). In the DenseVector case, each Vector has size 2, which gives model.numFeatures = summarizer.mean.size = n = instance.size = 2. However, if I create a SparseVector of size 20 with the same non-zero values as the DenseVector (i.e., 2 non-zero values and 18 zero values), then model.numFeatures = 20 by the same logic. Therefore, we should either create a separate test case for SparseVector or remove the test above.

test("linearSVC comparison with R e1071 and scikit-learn") also fails in the all-SparseVector case. The other tests pass in the all-SparseVector case. I am generating a mixed test now.

> Test sparse vectors in LinearSVCSuite
> -------------------------------------
>
>                 Key: SPARK-19382
>                 URL: https://issues.apache.org/jira/browse/SPARK-19382
>             Project: Spark
>          Issue Type: Test
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Currently, LinearSVCSuite does not test sparse vectors. We should. I
> recommend that generateSVMInput be modified to create a mix of dense and
> sparse vectors, rather than adding an additional test.
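
The size mismatch described in the comment can be reproduced without training a model. The following is a minimal sketch, assuming spark-mllib-local is on the classpath. The object name SparseSizeSketch and the feature values are hypothetical and are not taken from LinearSVCSuite. It shows that a SparseVector declared with size 20 reports size 20, while the sparse encoding of the 2-element dense vector keeps size 2.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Hypothetical illustration only; not code from LinearSVCSuite.
object SparseSizeSketch {
  // Dense vector with two features, as in the DenseVector case.
  val dense: Vector = Vectors.dense(1.5, -0.5)

  // Sparse vector declared with size 20 that stores the same two
  // non-zero values; the remaining 18 entries are implicit zeros.
  val padded: Vector = Vectors.sparse(20, Array(0, 1), Array(1.5, -0.5))

  // Sparse encoding of the dense vector itself; the declared size stays 2.
  val sameSize: Vector = dense.toSparse

  def main(args: Array[String]): Unit = {
    // size is the declared dimension, not the number of stored values.
    Seq("dense" -> dense, "padded" -> padded, "sameSize" -> sameSize).foreach {
      case (name, v) =>
        println(s"$name: size=${v.size}, nonzeros=${v.numNonzeros}")
    }
  }
}
```

If the sparse vectors in a mixed data set were produced this way (converting the dense vectors rather than declaring a larger size), the vector dimension would stay at 2. Whether that is enough for the numFeatures assertion and the comparison test to pass is not established here.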