Awesome. So we are going to implement certain required DistributedOperations in a separate trait, similar to, but distinct from, DistributedEngine.
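For illustration only, here is a minimal Scala sketch of that separation. Only the name DistributedOperations comes from this thread. Everything else (`BlockifiedMatrix`, `aggregateBlocks`, `EngineStub`, `LocalOperations`, `LocalEngine`, `CommonOps`) is a hypothetical stand-in, not Mahout's API. The idea is that the engine exposes an operations implementation as an attribute instead of mixing it in, and common code asks the current engine for it and delegates.

```scala
// Hypothetical sketch: none of these types are Mahout's real API.

// Stand-in for a distributed matrix: a sequence of row blocks.
// A real engine would hold an RDD or similar instead of a local Vector.
final case class BlockifiedMatrix(blocks: Vector[Array[Array[Double]]])

// Operations that run directly and return a result, without creating
// logical-plan nodes.
trait DistributedOperations {
  // Apply `f` to each block independently, then fold the partial results
  // with `combine`. `zero` should be an identity for `combine`, because a
  // distributed implementation may apply it once per partition.
  def aggregateBlocks[A](m: BlockifiedMatrix, zero: A)(f: Array[Array[Double]] => A)(
      combine: (A, A) => A): A
}

// Simplified stand-in for DistributedEngine. The operations are exposed as
// an attribute and are not part of the engine trait itself.
trait EngineStub {
  def name: String
  def operations: DistributedOperations
}

// Local, sequential implementation standing in for a native one.
object LocalOperations extends DistributedOperations {
  def aggregateBlocks[A](m: BlockifiedMatrix, zero: A)(f: Array[Array[Double]] => A)(
      combine: (A, A) => A): A =
    m.blocks.map(f).foldLeft(zero)(combine)
}

object LocalEngine extends EngineStub {
  val name = "local"
  val operations: DistributedOperations = LocalOperations
}

// Engine-agnostic code: get the operations from the current engine and delegate.
object CommonOps {
  def sumOfSquares(m: BlockifiedMatrix, engine: EngineStub): Double =
    engine.operations.aggregateBlocks(m, 0.0)(
      block => block.iterator.flatMap(_.iterator).map(x => x * x).sum)(_ + _)
}
```

With this layout, adding a backend means supplying one more DistributedOperations implementation. The engine trait and the common code stay unchanged.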
I'll think about this a little more and propose an initial implementation that hopefully we can agree on.

Best,
Gokhan

On Thu, Nov 13, 2014 at 1:35 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> On Wed, Nov 12, 2014 at 1:44 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> > On Wed, Nov 12, 2014 at 1:27 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
> >
> >> My only concern is to add certain loss minimization tools for people to
> >> write machine learning algorithms.
> >>
> >> mapBlock as you suggested can work equally well, but I happened to have
> >> implemented the aggregate op while thinking.
> >>
> >> Apart from this SGD implementation,
> >> blockify-a-matrix-and-run-an-operation-in-parallel-on-blocks is, I
> >> believe, certainly required, since block-level parallelization is really
> >> common in matrix computations. Plus, if we are to add, say, a descriptive
> >> statistics package, that would require similar functionality, too.
> >>
> >> If mapBlock were more flexible about passing custom operators, I'd be more
> >> than happy, but I understand the idea behind its requirement that the
> >> mapping be block-to-block with the same row size.
> >>
> >> Could you give a little more detail on the 'common distributed strategy'
> >> idea?
>
> The idea is simple.
>
> First, don't use logical plan construction. In practice this means that
> while, say, "A.%*%(B)" creates a logical plan element (which is
> subsequently run through the optimizer), something like aggregate(..) does
> not. Instead, it just produces whatever it produces, directly. So it
> doesn't form any new logical or physical plan.
>
> Second, it means that we can define an internal strategy trait, something
> like DistributedOperations, which will include this set of operations.
> Subsequently, we will define native implementations of this trait in the
> same way we defined some native stuff for the DistributedEngine trait (but
> please don't make it part of the DistributedEngine trait; maybe an
> attribute, perhaps). At run time we will ask the current engine to provide
> its distributed operations implementation and delegate execution of common
> fragments to it.