[ https://issues.apache.org/jira/browse/SPARK-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jagadeesan A S updated SPARK-12844:
-----------------------------------
    Comment: was deleted

(was: Started working on this.)

> Spark documentation should be more precise about the algebraic properties of
> functions in various transformations
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12844
>                 URL: https://issues.apache.org/jira/browse/SPARK-12844
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>            Reporter: Jimmy Lin
>            Priority: Minor
>
> Spark documentation should be more precise about the algebraic properties of
> functions in various transformations. The way the current documentation is
> written is potentially confusing. For example, in Spark 1.6, the scaladoc for
> reduce in RDD says:
> > Reduces the elements of this RDD using the specified commutative and
> > associative binary operator.
> This is precise and accurate. In the documentation of reduceByKey in
> PairRDDFunctions, on the other hand, it says:
> > Merge the values for each key using an associative reduce function.
> To be more precise, this function must also be commutative in order for the
> computation to be correct. Writing commutative for reduce and not reduceByKey
> gives the false impression that the function in the latter does not need to
> be commutative.
> The same applies to aggregateByKey. To be precise, both seqOp and combOp need
> to be associative (mentioned) AND commutative (not mentioned) in order for
> the computation to be correct. It would be desirable to fix these
> inconsistencies throughout the documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
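Why commutativity matters here can be shown without a cluster. In reduceByKey, values are first merged within each partition (in order), and the per-partition results are then merged in whatever order they become available. If the reduce function is associative but not commutative, the final result depends on that nondeterministic merge order. The sketch below simulates this on a single machine in Python (string concatenation is associative but not commutative); it is an illustration of the failure mode, not Spark's actual implementation:

```python
from functools import reduce

def combine(a, b):
    """Associative but NOT commutative: (a+b)+c == a+(b+c), yet a+b != b+a."""
    return a + b  # string concatenation

values = ["a", "b", "c", "d"]

# Hypothetical partitioning of one key's values across two partitions.
part1, part2 = values[:2], values[2:]

# Within-partition merges happen in order, so these are deterministic.
r1 = reduce(combine, part1)  # "ab"
r2 = reduce(combine, part2)  # "cd"

# Cross-partition merge order is nondeterministic on a real cluster:
# either partition's result may arrive first.
order1 = combine(r1, r2)  # "abcd"
order2 = combine(r2, r1)  # "cdab"

print(order1, order2)
print("deterministic result:", order1 == order2)
```

With a commutative function such as integer addition, `order1` and `order2` would coincide and the merge order would be irrelevant, which is exactly the guarantee the documentation should make explicit.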