On Wed, Nov 12, 2014 at 1:44 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> On Wed, Nov 12, 2014 at 1:27 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>
>> My only concern is to add certain loss-minimization tools for people to
>> write machine learning algorithms.
>>
>> mapBlock, as you suggested, can work equally well, but I happened to have
>> implemented the aggregate op while thinking it through.
>>
>> Apart from this SGD implementation,
>> blockify-a-matrix-and-run-an-operation-in-parallel-on-blocks is, I
>> believe, certainly required, since block-level parallelization is really
>> common in matrix computations. Plus, if we are to add, say, a descriptive
>> statistics package, that would require similar functionality, too.
>>
>> If mapBlock allowed more flexibility in passing custom operators, I'd be
>> more than happy, but I understand the idea behind its requirement that the
>> mapping be block-to-block with the same row count.
>>
>> Could you give a little more detail on the 'common distributed strategy'
>> idea?
>
> The idea is simple. First, don't use logical plan construction. In
> practice this means that while, say, "A.%*%(B)" creates a logical plan
> element (which is subsequently run through the optimizer), something like
> aggregate(..) does not. Instead, it just produces whatever it produces,
> directly, so it forms no new logical or physical plan.
>
> Second, it means that we can define an internal strategy trait, something
> like DistributedOperations, which will include this set of operations.
> Subsequently, we will define native implementations of this trait the same
> way we defined some native stuff for the DistributedEngine trait (but
> please don't make it part of the DistributedEngine trait -- maybe an
> attribute instead). At run time we will ask the current engine to provide
> its distributed operation implementation and delegate execution of these
> common fragments to it.
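To make the strategy idea concrete, here is a minimal standalone sketch in plain Scala. Everything in it is an assumption for illustration: the trait name `DistributedOperations`, the `aggregateBlocks` method, and the toy `InCoreOperations`/`Engine` objects are hypothetical and not actual Mahout API. The point it shows is the shape of the proposal: the op executes directly, without creating a logical plan node, and the concrete implementation is obtained from the current engine at run time.

```scala
// Hypothetical strategy trait for "direct" distributed ops. Unlike algebra
// operators such as A %*% B, calling these does NOT build a logical or
// physical plan element -- they just execute and return a result.
trait DistributedOperations {
  // Toy signature: aggregate over matrix blocks (here modeled as row arrays)
  // with a per-block map and an associative combine.
  def aggregateBlocks[T](blocks: Seq[Array[Double]])(
      seqOp: Array[Double] => T, combOp: (T, T) => T): T
}

// A native implementation, analogous to the native implementations of the
// DistributedEngine trait (but kept as a separate trait, not part of it).
object InCoreOperations extends DistributedOperations {
  def aggregateBlocks[T](blocks: Seq[Array[Double]])(
      seqOp: Array[Double] => T, combOp: (T, T) => T): T =
    blocks.map(seqOp).reduce(combOp)
}

// At run time, the current engine is asked for its implementation and the
// common fragment (e.g. an SGD update step) is delegated to it.
object Engine {
  def distributedOperations: DistributedOperations = InCoreOperations
}

object StrategyDemo {
  def main(args: Array[String]): Unit = {
    val blocks = Seq(Array(1.0, 2.0), Array(3.0, 4.0))
    val total = Engine.distributedOperations
      .aggregateBlocks[Double](blocks)(_.sum, _ + _)
    println(total) // prints 10.0
  }
}
```

Keeping the trait separate from DistributedEngine (exposed as an attribute, as suggested above) means engines can swap in optimized native aggregation without the optimizer ever seeing these ops.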
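On the earlier point about mapBlock's contract, a standalone sketch (plain Scala, no Mahout dependency; `mapBlock`, the `Block` alias, and the key layout here are illustrative stand-ins, not the real DSL signature) of why a block-to-block mapping must preserve the row count: row keys stay aligned with the mapped data.

```scala
// Standalone sketch: a "block" is a set of row keys plus its rows.
// This mimics the block-to-block contract only; names are illustrative.
object MapBlockSketch {
  type Block = (Array[Int], Array[Array[Double]]) // (row keys, rows)

  // The mapper must return a block with the SAME number of rows as its
  // input, so each output row still corresponds to a key.
  def mapBlock(blocks: Seq[Block])(f: Block => Block): Seq[Block] =
    blocks.map { b =>
      val out = f(b)
      require(out._1.length == b._1.length,
        "block-to-block contract: output must keep the input row count")
      out
    }

  def main(args: Array[String]): Unit = {
    val blocks = Seq(
      (Array(0, 1), Array(Array(1.0, 2.0), Array(3.0, 4.0))),
      (Array(2, 3), Array(Array(5.0, 6.0), Array(7.0, 8.0))))
    // Scaling every entry is legal: row count is unchanged per block.
    val doubled = mapBlock(blocks) { case (keys, m) =>
      (keys, m.map(_.map(_ * 2.0)))
    }
    println(doubled.flatMap(_._2).map(_.sum).sum) // prints 72.0
  }
}
```

An op like aggregate doesn't fit this shape (it collapses blocks to a scalar), which is where a separate direct-execution path earns its keep.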