[GitHub] spark pull request: Bug fix of sparse vector conversion
Github user funes commented on the pull request: https://github.com/apache/spark/pull/661#issuecomment-42516407 @mengxr Updated the test case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Bug fix of sparse vector conversion
Github user funes commented on a diff in the pull request: https://github.com/apache/spark/pull/661#discussion_r12413127 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/BreezeVectorConversionSuite.scala --- @@ -55,4 +55,16 @@ class BreezeVectorConversionSuite extends FunSuite { assert(vec.indices.eq(indices), "should not copy data") assert(vec.values.eq(values), "should not copy data") } + + test("sparse breeze by vector builder to vector") { --- End diff -- @mengxr There always has activeSize < index.length in Breeze SparseVector. I don't see the problem. Do you mean we also check it in the test case? btw, maybe we could keep a same in our SparseVector? That would avoid the data copying issue in conversion. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Bug fix of sparse vector conversion
Github user funes commented on the pull request: https://github.com/apache/spark/pull/661#issuecomment-42387458 @mengxr Just added a test for sparse breeze by vector builder. Original sparse breeze to vector test is still valid since no data copying should happen in that case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Bug fix of sparse vector conversion
Github user funes commented on the pull request: https://github.com/apache/spark/pull/661#issuecomment-42628022 Sorry that I didn't see this message earlier. I'd be glad to create a JIRA if it is still appropriate. (I have to say I haven't read the how to contribute wiki carefully before.) On Fri, May 9, 2014 at 12:55 AM, Xiangrui Meng wrote: > LGTM. Do you mind creating a JIRA for this change? > > â > Reply to this email directly or view it on GitHub<https://github.com/apache/spark/pull/661#issuecomment-42575036> > . > --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Bug fix of sparse vector conversion
GitHub user funes reopened a pull request: https://github.com/apache/spark/pull/661 Bug fix of sparse vector conversion Fixed a small bug caused by the inconsistency of index/data array size and vector length. You can merge this pull request into a Git repository by running: $ git pull https://github.com/funes/spark bugfix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/661.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #661 commit b85806c190f82bf9946e9e1f581d22b3c8afec19 Author: Funes Date: 2014-05-06T03:49:52Z Bug fix of sparse vector conversion commit 64e719843929a75208f18dbca599bb263449fb86 Author: Funes Date: 2014-05-06T09:26:15Z Copy data only when necessary --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Bug fix of sparse vector conversion
Github user funes commented on a diff in the pull request: https://github.com/apache/spark/pull/661#discussion_r12311565 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala --- @@ -136,7 +136,7 @@ object Vectors { new DenseVector(v.toArray) // Can't use underlying array directly, so make a new one } case v: BSV[Double] => -new SparseVector(v.length, v.index, v.data) +new SparseVector(v.length, v.index.slice(0, v.used), v.data.slice(0, v.used)) --- End diff -- That makes sense. And I just found that it failed the test "Should not copy data". SparseVector from VectorBuilder has this problem of size inconsistency. It uses a dynamic growing approach to resize the index/data array and fills the unused slot zeros. So I think there is no way to completely avoid new allocation. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Bug fix of sparse vector conversion
Github user funes closed the pull request at: https://github.com/apache/spark/pull/661 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Bug fix of sparse vector conversion
GitHub user funes opened a pull request: https://github.com/apache/spark/pull/661 Bug fix of sparse vector conversion Fixed a small bug caused by the inconsistency of index/data array size and vector length. You can merge this pull request into a Git repository by running: $ git pull https://github.com/funes/spark bugfix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/661.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #661 commit b85806c190f82bf9946e9e1f581d22b3c8afec19 Author: Funes Date: 2014-05-06T03:49:52Z Bug fix of sparse vector conversion --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---