[ https://issues.apache.org/jira/browse/SPARK-19747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885298#comment-15885298 ]
Nick Pentreath commented on SPARK-19747:
----------------------------------------

Big +1 for this! I agree we really should be able to make all the concrete implementations specify only the algorithm-specific aggregation step, effectively the loss. The general approach sounds good to me.

> Consolidate code in ML aggregators
> ----------------------------------
>
>                 Key: SPARK-19747
>                 URL: https://issues.apache.org/jira/browse/SPARK-19747
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Seth Hendrickson
>            Priority: Minor
>
> Many algorithms in Spark ML are posed as the optimization of a differentiable loss function over a parameter vector. We implement these by having the loss function accumulate the gradient via an Aggregator class whose methods amount to a {{seqOp}} and a {{combOp}}. As a result, nearly every algorithm of this form implements both a cost-function class and an aggregator class; the two are completely separate from one another, yet they share roughly 80% of the same code.
> I think it is important to clean things like this up. Done properly, it will make the code much more maintainable, readable, and bug-free, and it will reduce the overhead of future implementations.
> The design is of course open for discussion, but I think we should aim to:
> 1. Have all aggregators share parent classes, so that they only need to implement the {{add}} function. This is really the only difference among the current aggregators.
> 2. Have a single, generic cost function that is parameterized by the aggregator type. This replaces the many places where we currently implement cost functions and greatly reduces the amount of duplicated code.
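For concreteness, here is a minimal, self-contained sketch of the structure points 1 and 2 describe. All names ({{Instance}}, {{DifferentiableLossAggregator}}, {{GenericLossFunction}}, etc.) are hypothetical illustrations, not the actual Spark API, and a plain Scala {{Seq}} stands in for an {{RDD}}; the real design may of course differ.

{code:scala}
// One training example: label, sample weight, and a dense feature vector.
case class Instance(label: Double, weight: Double, features: Array[Double])

// 1. Shared parent class: owns the loss/gradient accumulators and the generic
//    merge (the combOp), so concrete aggregators implement only `add` (the seqOp).
abstract class DifferentiableLossAggregator[Agg <: DifferentiableLossAggregator[Agg]] {
  self: Agg =>

  protected val dim: Int
  protected var weightSum: Double = 0.0
  protected var lossSum: Double = 0.0
  protected lazy val gradientSumArray: Array[Double] = new Array[Double](dim)

  /** The only loss-specific piece: fold one instance into the running sums. */
  def add(instance: Instance): Agg

  /** Generic combOp, identical for every concrete aggregator. */
  def merge(other: Agg): Agg = {
    weightSum += other.weightSum
    lossSum += other.lossSum
    var i = 0
    while (i < dim) { gradientSumArray(i) += other.gradientSumArray(i); i += 1 }
    this
  }

  def loss: Double = lossSum / weightSum
  def gradient: Array[Double] = gradientSumArray.map(_ / weightSum)
}

// Example concrete aggregator: squared-error loss. Only `add` is written here.
class LeastSquaresAggregator(coefficients: Array[Double])
    extends DifferentiableLossAggregator[LeastSquaresAggregator] {
  override protected val dim: Int = coefficients.length

  override def add(instance: Instance): LeastSquaresAggregator = {
    val margin = (0 until dim).map(i => coefficients(i) * instance.features(i)).sum
    val error = margin - instance.label
    lossSum += instance.weight * error * error / 2.0
    var i = 0
    while (i < dim) {
      gradientSumArray(i) += instance.weight * error * instance.features(i)
      i += 1
    }
    weightSum += instance.weight
    this
  }
}

// 2. A single, generic cost function parameterized by the aggregator type.
//    `grouped` fakes partitions; Spark would use RDD.treeAggregate instead.
class GenericLossFunction[Agg <: DifferentiableLossAggregator[Agg]](
    instances: Seq[Instance],
    newAggregator: Array[Double] => Agg) {

  def calculate(coefficients: Array[Double]): (Double, Array[Double]) = {
    val agg = instances
      .grouped(2).toSeq                                         // fake "partitions"
      .map(_.foldLeft(newAggregator(coefficients))(_.add(_)))   // seqOp
      .reduce(_.merge(_))                                       // combOp
    (agg.loss, agg.gradient)
  }
}

object Demo extends App {
  val data = Seq(
    Instance(1.0, 1.0, Array(1.0, 2.0)),
    Instance(2.0, 1.0, Array(2.0, 3.0)))
  val lossFunction =
    new GenericLossFunction[LeastSquaresAggregator](data, new LeastSquaresAggregator(_))
  val (loss, grad) = lossFunction.calculate(Array(0.1, 0.2))
  println(s"loss=$loss, gradient=${grad.mkString("[", ", ", "]")}")
}
{code}

Under this sketch, adding a new algorithm reduces to writing a single {{add}} method; the merge logic, the loss/gradient bookkeeping, and the cost function are all shared, which is exactly the duplication the issue describes eliminating.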