[jira] [Comment Edited] (SPARK-17950) Match SparseVector behavior with DenseVector

2016-10-17 Thread AbderRahman Sobh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583877#comment-15583877
 ] 

AbderRahman Sobh edited comment on SPARK-17950 at 10/18/16 12:07 AM:
-

Yes, the full array needs to be expanded, since the numpy functions may need 
to operate on every value in the array. There is room for another 
implementation that instead mimics the numpy functions (and their method 
names) and provides smarter implementations for computing means and similar 
statistics on a SparseVector. If that is preferable, I can modify the code to 
do that instead.
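
A minimal sketch of that alternative, assuming the vector is described by its size plus the array of explicitly stored values (no duplicate indices). `sparse_mean` and `sparse_var` are hypothetical helper names, not part of PySpark. The implicit zeros add nothing to the sum, and each one contributes mean^2 to the squared deviations, so neither statistic needs the dense array:

```python
import numpy as np


def sparse_mean(size, values):
    # Hypothetical helper: implicit zeros add nothing to the sum,
    # so only the stored values are touched.
    return float(np.sum(values)) / size


def sparse_var(size, values):
    # Hypothetical helper: population variance (ddof=0, numpy's default).
    # Stored entries contribute (v - mean)^2; each of the (size - nnz)
    # implicit zeros contributes (0 - mean)^2 = mean^2.
    values = np.asarray(values, dtype=np.float64)
    mu = sparse_mean(size, values)
    nnz = values.size
    return (float(np.sum((values - mu) ** 2)) + (size - nnz) * mu ** 2) / size


print(sparse_mean(4, [2.0, 6.0]), sparse_var(4, [2.0, 6.0]))
```

For `size=4` with stored values `[2.0, 6.0]` (dense form `[0, 2, 0, 6]`), this gives a mean of 2.0 and a variance of 6.0, the same as `np.mean` and `np.var` on the dense array, while the work scales with the number of stored values rather than the full length.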

Note also that the unpacked array is automatically cleared out after the call.
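
One way to check this claim is to keep only a weak reference to the expanded array and test whether it is still alive after the call. This is a sketch under assumptions: `ToySparseVector` is a hypothetical minimal stand-in for SparseVector, `last_dense` is instrumentation added only for the check, and the immediate cleanup relies on CPython's reference counting (other interpreters, such as PyPy, may free the array later).

```python
import weakref

import numpy as np

# Weak reference to the most recently expanded dense array (instrumentation only).
last_dense = None


class ToySparseVector:
    """Hypothetical stand-in for SparseVector; not the real PySpark class."""

    def __init__(self, size, indices, values):
        self.size = int(size)
        self.indices = np.asarray(indices, dtype=np.int64)
        self.values = np.asarray(values, dtype=np.float64)

    def toArray(self):
        global last_dense
        arr = np.zeros(self.size, dtype=np.float64)
        arr[self.indices] = self.values
        last_dense = weakref.ref(arr)  # a weak reference does not keep arr alive
        return arr

    def __getattr__(self, item):
        # Delegate unknown attributes to a temporary dense copy.
        return getattr(self.toArray(), item)


v = ToySparseVector(4, [1, 3], [2.0, 6.0])
m = v.mean()
# The bound method `mean` held the only strong reference to the dense copy;
# once the call returns, CPython's reference counting frees it immediately.
print(m, last_dense() is None)
```

If the weak reference is dead after the call, no extra line is needed inside `__getattr__` to clear the array out.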


was (Author: itg-abby):
Yes, the full array needs to be expanded, since the numpy functions may need 
to operate on every value in the array. There is room for another 
implementation that instead mimics the numpy functions (and their method 
names) and provides smarter implementations for computing means and similar 
statistics on a SparseVector. If that is preferable, I can modify the code to 
do that instead.

> Match SparseVector behavior with DenseVector
> 
>
>                 Key: SPARK-17950
>                 URL: https://issues.apache.org/jira/browse/SPARK-17950
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>    Affects Versions: 2.0.1
>            Reporter: AbderRahman Sobh
>            Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Simply added to SparseVector the `__getattr__` that DenseVector has, except 
> that it calls self.toArray() instead of permanently storing a dense copy in 
> self.array. This allows numpy functions to be used on the values of a 
> SparseVector in the same direct way that users interact with DenseVectors.
> For example, you can simply call SparseVector.mean() to average the values in 
> the entire vector.
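
A minimal sketch of the delegation described in the issue, assuming a hypothetical `ToySparseVector` with `size`, `indices`, and `values` fields rather than the real `pyspark.ml.linalg.SparseVector`. Because Python consults `__getattr__` only when normal attribute lookup fails, the vector's own fields resolve directly and everything else is forwarded to a freshly expanded dense numpy array:

```python
import numpy as np


class ToySparseVector:
    """Hypothetical minimal stand-in for SparseVector (not the PySpark class)."""

    def __init__(self, size, indices, values):
        self.size = int(size)
        self.indices = np.asarray(indices, dtype=np.int64)
        self.values = np.asarray(values, dtype=np.float64)

    def toArray(self):
        # Dense copy: zeros everywhere except at the stored indices.
        arr = np.zeros(self.size, dtype=np.float64)
        arr[self.indices] = self.values
        return arr

    def __getattr__(self, item):
        # Python calls this only when normal lookup fails, so size, indices,
        # values and toArray resolve normally; anything else (mean, sum, max,
        # shape, ...) is looked up on a freshly expanded dense array.
        return getattr(self.toArray(), item)


v = ToySparseVector(4, [1, 3], [2.0, 6.0])
print(v.mean(), v.sum(), v.max(), v.shape)
```

Every forwarded attribute access expands a new dense array of length `size`, so the cost of each call grows with the vector's full length rather than its number of stored values; that is the tradeoff behind the sparsity-aware alternative raised in the comments.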



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17950) Match SparseVector behavior with DenseVector

2016-10-17 Thread AbderRahman Sobh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583877#comment-15583877
 ] 

AbderRahman Sobh edited comment on SPARK-17950 at 10/18/16 12:07 AM:
-

Yes, the full array needs to be expanded, since the numpy functions may need 
to operate on every value in the array. There is room for another 
implementation that instead mimics the numpy functions (and their method 
names) and provides smarter implementations for computing means and similar 
statistics on a SparseVector. If that is preferable, I can modify the code to 
do that instead.


was (Author: itg-abby):
Yes, the full array needs to be expanded, since the numpy functions may need 
to operate on every value in the array. There is room for another 
implementation that instead mimics the numpy functions (and their method 
names) and provides smarter implementations for computing means and similar 
statistics on a SparseVector. If that is preferable, I can modify the code to 
do that instead.

I also just realized that I am not 100% sure whether garbage collection works 
as I expect. My assumption was that Python would automatically clean up the 
array after it is used, but since it is technically created inside the 
object's magic method, I cannot tell whether another line is needed to 
explicitly clear the array out.




[jira] [Comment Edited] (SPARK-17950) Match SparseVector behavior with DenseVector

2016-10-17 Thread AbderRahman Sobh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583877#comment-15583877
 ] 

AbderRahman Sobh edited comment on SPARK-17950 at 10/18/16 12:05 AM:
-

Yes, the full array needs to be expanded, since the numpy functions may need 
to operate on every value in the array. There is room for another 
implementation that instead mimics the numpy functions (and their method 
names) and provides smarter implementations for computing means and similar 
statistics on a SparseVector. If that is preferable, I can modify the code to 
do that instead.

I also just realized that I am not 100% sure whether garbage collection works 
as I expect. My assumption was that Python would automatically clean up the 
array after it is used, but since it is technically created inside the 
object's magic method, I cannot tell whether another line is needed to 
explicitly clear the array out.


was (Author: itg-abby):
Yes, the full array needs to be expanded, since the numpy functions may need 
to operate on every value in the array. There is room for another 
implementation that instead mimics the numpy functions (and their method 
names) and provides smarter implementations for computing means and similar 
statistics on a SparseVector. If that is preferable, I can modify the code to 
do that instead.

I also just realized that I am not 100% sure whether garbage collection works 
as I expect. My assumption was that Python would automatically clean up the 
array after it is used, but since it is technically inside the object, might 
another line be needed to explicitly clear the array out?
