[GitHub] spark pull request: Bug fix of sparse vector conversion

2014-05-15 Thread funes
Github user funes commented on the pull request:

https://github.com/apache/spark/pull/661#issuecomment-42516407
  
@mengxr Updated the test case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Bug fix of sparse vector conversion

2014-05-15 Thread funes
Github user funes commented on a diff in the pull request:

https://github.com/apache/spark/pull/661#discussion_r12413127
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/linalg/BreezeVectorConversionSuite.scala
 ---
@@ -55,4 +55,16 @@ class BreezeVectorConversionSuite extends FunSuite {
 assert(vec.indices.eq(indices), "should not copy data")
 assert(vec.values.eq(values), "should not copy data")
   }
+
+  test("sparse breeze by vector builder to vector") {
--- End diff --

@mengxr There always has activeSize < index.length in Breeze 
SparseVector. I don't see the problem. Do you mean we also check it in the test 
case? 
btw, maybe we could keep a same  in our SparseVector? That 
would avoid the data copying issue in conversion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Bug fix of sparse vector conversion

2014-05-15 Thread funes
Github user funes commented on the pull request:

https://github.com/apache/spark/pull/661#issuecomment-42387458
  
@mengxr Just added a test for sparse breeze by vector builder. Original 
sparse breeze to vector test is still valid since no data copying should happen 
in that case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Bug fix of sparse vector conversion

2014-05-11 Thread funes
Github user funes commented on the pull request:

https://github.com/apache/spark/pull/661#issuecomment-42628022
  
Sorry that I didn't see this message earlier. I'd be glad to create a JIRA
if it is still appropriate. (I have to say I haven't read the how to
contribute wiki carefully before.)


On Fri, May 9, 2014 at 12:55 AM, Xiangrui Meng 
wrote:

> LGTM. Do you mind creating a JIRA for this change?
>
> —
> Reply to this email directly or view it on 
GitHub<https://github.com/apache/spark/pull/661#issuecomment-42575036>
> .
>


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Bug fix of sparse vector conversion

2014-05-06 Thread funes
GitHub user funes reopened a pull request:

https://github.com/apache/spark/pull/661

Bug fix of sparse vector conversion

Fixed a small bug caused by the inconsistency of index/data array size and 
vector length.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/funes/spark bugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/661.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #661


commit b85806c190f82bf9946e9e1f581d22b3c8afec19
Author: Funes 
Date:   2014-05-06T03:49:52Z

Bug fix of sparse vector conversion

commit 64e719843929a75208f18dbca599bb263449fb86
Author: Funes 
Date:   2014-05-06T09:26:15Z

Copy data only when necessary




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Bug fix of sparse vector conversion

2014-05-05 Thread funes
Github user funes commented on a diff in the pull request:

https://github.com/apache/spark/pull/661#discussion_r12311565
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -136,7 +136,7 @@ object Vectors {
   new DenseVector(v.toArray)  // Can't use underlying array 
directly, so make a new one
 }
   case v: BSV[Double] =>
-new SparseVector(v.length, v.index, v.data)
+new SparseVector(v.length, v.index.slice(0, v.used), 
v.data.slice(0, v.used))
--- End diff --

That makes sense. And I just found that it failed the test "Should not copy 
data".  SparseVector from VectorBuilder has this problem of size inconsistency. 
It uses a dynamic growing approach to resize the index/data array and fills the 
unused slot zeros. So I think there is no way to completely avoid new 
allocation. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Bug fix of sparse vector conversion

2014-05-05 Thread funes
Github user funes closed the pull request at:

https://github.com/apache/spark/pull/661


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Bug fix of sparse vector conversion

2014-05-05 Thread funes
GitHub user funes opened a pull request:

https://github.com/apache/spark/pull/661

Bug fix of sparse vector conversion

Fixed a small bug caused by the inconsistency of index/data array size and 
vector length.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/funes/spark bugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/661.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #661


commit b85806c190f82bf9946e9e1f581d22b3c8afec19
Author: Funes 
Date:   2014-05-06T03:49:52Z

Bug fix of sparse vector conversion




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---