[ https://issues.apache.org/jira/browse/SPARK-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681729#comment-14681729 ]
François Garillot commented on SPARK-9819: ------------------------------------------ https://github.com/apache/spark/pull/8103 > reduceBy(KeyAnd)Window should specify which is the accumulator argument in > invReduceFunc > ---------------------------------------------------------------------------------------- > > Key: SPARK-9819 > URL: https://issues.apache.org/jira/browse/SPARK-9819 > Project: Spark > Issue Type: Bug > Components: Streaming > Affects Versions: 1.5.0 > Environment: All > Reporter: François Garillot > Labels: documentation, streaming > > {{reduceByWindow}} has an optional {{invReduceFunc}} argument which allows > the reduction to be performed incrementally. > The incremental reduction [performed in > {{ReducedWindowedDStream}}|https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/ReducedWindowedDStream.scala#L157] > only depends on the reduction and its inverse function being associative (as > shown by the reduce applied to {{oldValues}}), but does not require those > functions to be commutative. > In particular, if the inverse reduction is the non-commutative, but > associative substraction (e.g. what you're computing is a running sum), it's > necessary to know that the intermediate result (to be substracted from) is > the first argument of {{invReduceFunc}} and that the second argument is the > old value to substract. > It's only in the commutative case that we don't care which is which. > The Scaladoc for the various overloads of {{reduceByWindow}} should let the > user know which is the accumulator, and which is the old value. A concise, > unambiguous way to state this is to write an inversion law in the Scaladoc: > {{invReduceFunc(reduceFunc(x, y), y) = x}} > We should also remind the user that he should use associative reduction (& > inverse reduction) functions, since the computation makes that assumption. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org