On Wed, Nov 12, 2014 at 1:44 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

>
>
> On Wed, Nov 12, 2014 at 1:27 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>
>> My only concern is adding certain loss-minimization tools that people can
>> use to write machine learning algorithms.
>>
>> mapBlock, as you suggested, could work equally well, but I happened to
>> implement the aggregate op while thinking this through.
>>
>> Apart from this SGD implementation,
>> blockify-a-matrix-and-run-an-operation-in-parallel-on-blocks is, I
>> believe, certainly required, since block-level parallelization is really
>> common in matrix computations. Plus, if we are to add, say, a descriptive
>> statistics package, that would require similar functionality, too.
>>
>> If mapBlock were more flexible about the custom operators it accepts, I'd
>> be more than happy, but I understand the idea behind its requirement that
>> the mapping be block-to-block with the same row size (a sketch of that
>> contract follows this quote).
>>
>> Could you give a little more detail on the 'common distributed strategy'
>> idea?
>>
>
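For context, the mapBlock contract being discussed looks roughly like this
in the Samsara Scala DSL (a minimal sketch; drmA is a hypothetical
DrmLike[Int], and the usual drm/scalabindings imports are shown):

  import org.apache.mahout.math.drm._
  import org.apache.mahout.math.drm.RLikeDrmOps._
  import org.apache.mahout.math.scalabindings.RLikeOps._

  // drmA: DrmLike[Int] -- a hypothetical distributed row matrix.
  val drmScaled = drmA.mapBlock() {
    case (keys, block) =>
      // The mapper must return a block with the same row keys/row count;
      // scaling every element trivially satisfies that contract.
      keys -> (block * 2)
  }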
The 'common distributed strategy' idea is simple.

First, do not use logical plan construction. In practice this means that
while, say, "A.%*%(B)" creates a logical plan element (which is subsequently
run through the optimizer), something like aggregate(..) does not do that.
Instead, it just produces whatever it produces, directly, so it forms
neither a new logical nor a physical plan.
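To illustrate the distinction (a sketch: drmA and drmB are hypothetical
DRMs; %*% and collect are the existing DSL calls, while aggregate(..) below
is purely hypothetical):

  // Lazy, optimizer-visible path: %*% only builds a logical operator.
  val drmC = drmA %*% drmB    // forms a logical plan node; nothing runs yet
  val inCoreC = drmC.collect  // optimizer rewrite + physical execution here

  // Eager, plan-invisible path: a hypothetical aggregate(..) would compute
  // and return its result directly, never appearing in any plan:
  // val loss: Double = aggregate(drmA)(seqOp, combOp)  // hypothetical API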

Second, it means that we can define an internal strategy trait, something
like DistributedOperations, which will include this set of operations.
Subsequently, we will define native implementations of this trait in the
same way we defined some native stuff for the DistributedEngine trait (but
please don't make it part of the DistributedEngine trait itself -- perhaps
expose it as an attribute). At run time we will ask the current engine to
provide its distributed-operations implementation and delegate execution of
the common fragments to it.
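A hypothetical sketch of what such a strategy trait could look like (names
and signatures are illustrative only, not an existing Mahout API; DrmLike
and Matrix are the real Samsara/in-core types):

  import org.apache.mahout.math.Matrix
  import org.apache.mahout.math.drm.DrmLike

  // Illustrative strategy trait; each engine (Spark, H2O, ...) would ship
  // its own native implementation, obtained from the engine at run time.
  trait DistributedOperations {
    // Eagerly fold over the blocks of a DRM, outside the algebra plan.
    def aggregateBlocks[K, U](drm: DrmLike[K])(seqOp: (U, Matrix) => U,
                                               combOp: (U, U) => U): U
  }

  // Common algorithm fragments would then delegate to the engine's
  // implementation, e.g. (hypothetical accessor):
  // val distOps: DistributedOperations = engine.distributedOperations
  // val sum = distOps.aggregateBlocks[Int, Double](drmA)(
  //   (acc, block) => acc + block.zSum(), _ + _)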
