[ 
https://issues.apache.org/jira/browse/SPARK-19382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847683#comment-15847683
 ] 

Miao Wang commented on SPARK-19382:
-----------------------------------

[~josephkb] If I understand correctly, we would have to create separate tests 
for SparseVector. For example, test("linear svc: default params") contains 
assert(model.numFeatures === 2).
In the DenseVector case, each Vector has size 2, which determines 
model.numFeatures = summarizer.mean.size = n = instance.size = 2.

However, if I create a SparseVector of size 20 with non-zero values the same as 
the DenseVector (i.e., 2 non-zero values and 18 zero values), model.numFeatures 
= 20, based on the logic above.
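
To illustrate the size difference, here is a minimal sketch using the 
org.apache.spark.ml.linalg API. The object name and the concrete values are 
made up for illustration and are not taken from LinearSVCSuite.

```scala
import org.apache.spark.ml.linalg.Vectors

object VectorSizeSketch {
  // Dense vector with two features: its size is the number of stored values.
  val dense = Vectors.dense(1.5, -0.5)

  // Illustrative sparse vector: the same two non-zero values, but a declared
  // size of 20, so the remaining 18 entries are implicit zeros.
  val sparse = Vectors.sparse(20, Array(0, 1), Array(1.5, -0.5))

  def main(args: Array[String]): Unit = {
    println(dense.size)         // 2
    println(sparse.size)        // 20
    println(sparse.numNonzeros) // 2
  }
}
```

Anything derived from the vector size, such as model.numFeatures, follows the 
declared size (20 here), not the number of non-zero values.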

Therefore, we should either create a separate test case for SparseVector or 
remove the test above.

test("linearSVC comparison with R e1071 and scikit-learn") also fails in the 
all-SparseVector case.

The other tests pass in the all-SparseVector case.

I am generating a mixed test now. 
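
One possible way to build such a mix, as a rough sketch only: convert every 
other vector to its sparse representation with toSparse, which keeps the 
vector size unchanged. The helper name mixDenseAndSparse is hypothetical, and 
this is not necessarily how generateSVMInput will end up being changed.

```scala
import org.apache.spark.ml.linalg.Vector

object MixedVectorsSketch {
  // Hypothetical helper: keep even-indexed vectors dense and convert
  // odd-indexed vectors to sparse. toDense/toSparse preserve the size,
  // so every vector still reports the same number of features.
  def mixDenseAndSparse(vectors: Seq[Vector]): Seq[Vector] =
    vectors.zipWithIndex.map { case (v, i) =>
      if (i % 2 == 0) v.toDense else v.toSparse
    }
}
```

Because the sizes stay the same, an assertion such as 
assert(model.numFeatures === 2) should be unaffected by a mix built this way.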




> Test sparse vectors in LinearSVCSuite
> -------------------------------------
>
>                 Key: SPARK-19382
>                 URL: https://issues.apache.org/jira/browse/SPARK-19382
>             Project: Spark
>          Issue Type: Test
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Currently, LinearSVCSuite does not test sparse vectors.  We should.  I 
> recommend that generateSVMInput be modified to create a mix of dense and 
> sparse vectors, rather than adding an additional test.



