Re: foreachActive functionality

2015-01-25 Thread Reza Zadeh
The idea is to unify the code path for dense and sparse vector operations,
which makes the codebase easier to maintain. By handling (index, value)
tuples, you can let the foreachActive method take care of checking if the
vector is sparse or dense, and running a foreach over the values.
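
To make that concrete, here is a minimal sketch of the pattern. It is not Spark's actual source: Vec, Dense, Sparse and l1PerColumn are hypothetical stand-ins made up for illustration, and only the foreachActive name and its (index, value) callback shape come from the code in question. The point is that the caller writes one loop body and each vector type decides how to walk its stored entries.

```scala
// Simplified stand-ins for MLlib's vector classes (assumption: NOT Spark's real code).
object ForeachActiveSketch {
  sealed trait Vec {
    def size: Int
    // Calls f(index, value) once per stored ("active") entry.
    def foreachActive(f: (Int, Double) => Unit): Unit
  }

  // Dense: every position is stored, so every position is active.
  final case class Dense(values: Array[Double]) extends Vec {
    def size: Int = values.length
    def foreachActive(f: (Int, Double) => Unit): Unit = {
      var i = 0
      while (i < values.length) {
        f(i, values(i))
        i += 1
      }
    }
  }

  // Sparse: only the entries listed in `indices` are stored and visited.
  final case class Sparse(size: Int, indices: Array[Int], values: Array[Double]) extends Vec {
    def foreachActive(f: (Int, Double) => Unit): Unit = {
      var k = 0
      while (k < indices.length) {
        f(indices(k), values(k))
        k += 1
      }
    }
  }

  // One code path for both representations: per-column sum of absolute values.
  def l1PerColumn(samples: Seq[Vec], n: Int): Array[Double] = {
    val l1 = new Array[Double](n)
    samples.foreach { sample =>
      sample.foreachActive { (index, value) =>
        // Dense vectors may store explicit zeros, hence the check.
        if (value != 0.0) l1(index) += math.abs(value)
      }
    }
    l1
  }
}
```

With these stand-ins, ForeachActiveSketch.l1PerColumn(Seq(Dense(Array(1.0, 0.0, -2.0)), Sparse(3, Array(0, 2), Array(3.0, 4.0))), 3) accumulates both samples through the same callback, with no sparse/dense branch in the caller.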

On Sun, Jan 25, 2015 at 8:18 AM, kundan kumar iitr.kun...@gmail.com wrote:

 Can someone help me understand the usage of the foreachActive function
 introduced for Vectors?

 I am trying to understand how it is used in the MultivariateOnlineSummarizer
 class for summary statistics.


 sample.foreachActive { (index, value) =>
   if (value != 0.0) {
     if (currMax(index) < value) {
       currMax(index) = value
     }
     if (currMin(index) > value) {
       currMin(index) = value
     }

     val prevMean = currMean(index)
     val diff = value - prevMean
     currMean(index) = prevMean + diff / (nnz(index) + 1.0)
     currM2n(index) += (value - currMean(index)) * diff
     currM2(index) += value * value
     currL1(index) += math.abs(value)

     nnz(index) += 1.0
   }
 }

 Regards,
 Kundan





Re: foreachActive functionality

2015-01-25 Thread DB Tsai
P.S. We originally used Breeze's activeIterator, as you can see in the
old code, but we found that it added overhead, so we wrote our own
implementation, which is 4x faster. See
https://github.com/apache/spark/pull/3288 for details.

Sincerely,

DB Tsai
---
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai



On Sun, Jan 25, 2015 at 12:25 PM, Reza Zadeh r...@databricks.com wrote:
 The idea is to unify the code path for dense and sparse vector operations,
 which makes the codebase easier to maintain. By handling (index, value)
 tuples, you can let the foreachActive method take care of checking if the
 vector is sparse or dense, and running a foreach over the values.




