[ 
https://issues.apache.org/jira/browse/SPARK-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jagadeesan A S updated SPARK-12844:
-----------------------------------
    Comment: was deleted

(was: Started working on this.)

> Spark documentation should be more precise about the algebraic properties of 
> functions in various transformations
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12844
>                 URL: https://issues.apache.org/jira/browse/SPARK-12844
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>            Reporter: Jimmy Lin
>            Priority: Minor
>
> Spark documentation should be more precise about the algebraic properties of 
> functions in various transformations. The way the current documentation is 
> written is potentially confusing. For example, in Spark 1.6, the scaladoc for 
> reduce in RDD says:
> > Reduces the elements of this RDD using the specified commutative and 
> > associative binary operator.
> This is precise and accurate. In the documentation of reduceByKey in 
> PairRDDFunctions, on the other hand, it says:
> > Merge the values for each key using an associative reduce function.
> To be more precise, this function must also be commutative in order for the 
> computation to be correct. Writing commutative for reduce and not reduceByKey 
> gives the false impression that the function in the latter does not need to 
> be commutative.
> The same applies to aggregateByKey. To be precise, both seqOp and combOp need 
> to be associative (mentioned) AND commutative (not mentioned) in order for 
> the computation to be correct. It would be desirable to fix these 
> inconsistencies throughout the documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
