[ 
https://issues.apache.org/jira/browse/SPARK-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681730#comment-14681730
 ] 

Apache Spark commented on SPARK-9819:
-------------------------------------

User 'huitseeker' has created a pull request for this issue:
https://github.com/apache/spark/pull/8103

> reduceBy(KeyAnd)Window should specify which is the accumulator argument in 
> invReduceFunc
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-9819
>                 URL: https://issues.apache.org/jira/browse/SPARK-9819
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.5.0
>         Environment: All
>            Reporter: François Garillot
>              Labels: documentation, streaming
>
> {{reduceByWindow}} has an optional {{invReduceFunc}} argument which allows 
> the reduction to be performed incrementally.
> The incremental reduction [performed in 
> {{ReducedWindowedDStream}}|https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/ReducedWindowedDStream.scala#L157]
>  only depends on the reduction and its inverse function being associative (as 
> shown by the reduce applied to {{oldValues}}), but does not require those 
> functions to be commutative.
> In particular, if the inverse reduction is subtraction, which is not 
> commutative (e.g. what you're computing is a running sum), it's 
> necessary to know that the intermediate result (to be subtracted from) is 
> the first argument of {{invReduceFunc}} and that the second argument is the 
> old value to subtract.
> It's only in the commutative case that we don't care which is which.
> The Scaladoc for the various overloads of {{reduceByWindow}} should let the 
> user know which is the accumulator, and which is the old value. A concise, 
> unambiguous way to state this is to write an inversion law in the Scaladoc:
> {{invReduceFunc(reduceFunc(x, y), y) = x}}
> We should also remind users to use associative reduction (and 
> inverse reduction) functions, since the computation makes that assumption.
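
The argument order described above can be illustrated with a small, Spark-free sketch. This is not the code of {{ReducedWindowedDStream}}; the object name InvReduceSketch and the helper slide are hypothetical, introduced only to show an incremental update of a running sum in which the accumulator is the first argument of invReduceFunc and the old value is the second. It also checks the proposed inversion law on a few integers.

```scala
// Hypothetical, Spark-free sketch; InvReduceSketch and slide are illustrative names.
object InvReduceSketch {
  // Forward reduction for a running sum (associative).
  val reduceFunc: (Int, Int) => Int = (a, b) => a + b

  // Inverse reduction: the accumulator is the FIRST argument,
  // the old value leaving the window is the SECOND.
  val invReduceFunc: (Int, Int) => Int = (acc, old) => acc - old

  // Slide a window incrementally: take out the values that left the window,
  // then fold in the values that entered it.
  def slide(previous: Int, oldValues: Seq[Int], newValues: Seq[Int]): Int = {
    val afterRemoval =
      if (oldValues.isEmpty) previous
      else invReduceFunc(previous, oldValues.reduce(reduceFunc))
    if (newValues.isEmpty) afterRemoval
    else reduceFunc(afterRemoval, newValues.reduce(reduceFunc))
  }

  def main(args: Array[String]): Unit = {
    val batches = Seq(1, 2, 3, 4, 5)

    // Window over batches 0..2, then slide it by one batch.
    val previous = batches.slice(0, 3).reduce(reduceFunc)
    val incremental =
      slide(previous, oldValues = Seq(batches(0)), newValues = Seq(batches(3)))

    // The incremental result must match a full recomputation over batches 1..3.
    val recomputed = batches.slice(1, 4).reduce(reduceFunc)
    assert(incremental == recomputed)

    // Inversion law from the issue: invReduceFunc(reduceFunc(x, y), y) = x.
    for (x <- -3 to 3; y <- -3 to 3)
      assert(invReduceFunc(reduceFunc(x, y), y) == x)

    // With the arguments swapped, the law no longer holds for subtraction.
    val swapped: (Int, Int) => Int = (a, b) => invReduceFunc(b, a)
    assert(swapped(reduceFunc(5, 2), 2) != 5)
  }
}
```

With a commutative inverse the swapped variant would pass as well, which is why the ambiguity only shows up for functions such as subtraction.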



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

