My question is this: for algorithms that involve both engine-specific operations (such as aggregation) and DSL code, what is the best way of handling them?

i) Should we add the distributed operations to the Mahout codebase, as proposed in #62?
ii) Should we have [engine]-ml modules (like spark-bindings and h2o-bindings) where we can mix the DSL and engine-specific stuff?

Picking i) has the advantage that an ML algorithm is written once and can then run on alternative engines, but it requires wrapping/duplicating existing distributed operations.

Picking ii) has the advantage of avoiding writing distributed operations, but since we're mixing the DSL and the engine-specific stuff, an ML algorithm written for one engine would not be available for the others.

I just wanted to hear some opinions.

Gokhan

On Thu, Feb 5, 2015 at 4:11 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> I took it Gokhan had objections himself, based on his comments. if we are
> talking about #62.
>
> He also expressed concerns about computing GSGD but i suspect it can still
> be algebraically computed.
>
> On Wed, Feb 4, 2015 at 5:52 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> > BTW Ted and Andrew have both expressed interest in the distributed
> > aggregation stuff. It sounds like we are agreeing that
> > non-algebra - computation method type things can be engine specific.
> >
> > So does anyone have an objection to Gokhan pushing his PR?
> >
> > On Feb 4, 2015, at 2:20 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >
> > On Wed, Feb 4, 2015 at 1:51 PM, Andrew Palumbo <ap....@outlook.com> wrote:
> >
> > > My thought was not to bring primitive engine specific aggregators,
> > > combiners, etc. into math-scala.
> >
> > Yeah. +1. I would like to support that as an experiment, see where it goes.
> > Clearly some distributed use cases are simple enough while also pervasive
> > enough.