Jenkins build became unstable: mahout-nightly » Mahout Spark bindings #1743

2014-11-13 Thread Apache Jenkins Server
See 




Jenkins build became unstable: mahout-nightly #1743

2014-11-13 Thread Apache Jenkins Server
See 



Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-13 Thread Gokhan Capan
Awesome.

So we are going to implement certain required DistributedOperations in a
separate trait, similar to but distinct from DistributedEngine.

I'll think about this a little more, and propose an initial implementation
that hopefully we can agree on.

Best,

Gokhan

On Thu, Nov 13, 2014 at 1:35 AM, Dmitriy Lyubimov  wrote:

> On Wed, Nov 12, 2014 at 1:44 PM, Dmitriy Lyubimov 
> wrote:
>
> >
> >
> > On Wed, Nov 12, 2014 at 1:27 PM, Gokhan Capan  wrote:
> >
> >> My only concern is to add certain loss minimization tools for people to
> >> write machine learning algorithms.
> >>
> >> mapBlock as you suggested can work equally, but I happened to have
> >> implemented the aggregate op while thinking.
> >>
> >> Apart from this SGD implementation,
> >> blockify-a-matrix-and-run-an-operation-in-parallel-on-blocks is, I
> >> believe, certainly required, since block-level parallelization is really
> >> common in matrix computations. Plus, if we are to add, say, a descriptive
> >> statistics package, that would require similar functionality, too.
> >>
> >> If mapBlock were more flexible about passing custom operators, I'd be more
> >> than happy, but I understand the idea behind its requirement that the
> >> mapping be block-to-block with the same number of rows.
> >>
> >> Could you give a little more detail on the 'common distributed strategy'
> >> idea?
> >>
> >
> The idea is simple.
>
> First, do not use logical plan construction. In practice this means that,
> while say "A.%*%(B)" creates a logical plan element (which is subsequently
> run through the optimizer), something like aggregate(..) does not. Instead,
> it just produces ... whatever it produces, directly. So it doesn't form any
> new logical or physical plan.
>
> Second, it means that we can define an internal strategy trait, something
> like DistributedOperations, which will include this set of operations.
> Subsequently, we will define native implementations of this trait in the
> same way we defined some native stuff for the DistributedEngine trait (but
> please don't make it part of the DistributedEngine trait; maybe an attribute
> of it instead). At run time we will have to ask the current engine to
> provide its distributed operations implementation and delegate execution of
> common fragments to it.
>
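
For illustration only, the strategy-trait idea described above might be sketched as follows. This is a minimal sketch under stated assumptions, not Mahout code: `Block`, `Engine`, `LocalOperations`, `LocalEngine`, `aggregateBlocks` and `sumAll` are hypothetical names, and a plain in-memory `Seq` of blocks stands in for a distributed row matrix. It shows an operation that returns its result directly (no logical plan node), with the engine exposing the implementation as an attribute instead of inheriting it.

```scala
// Hypothetical sketch: none of these types are Mahout's actual API.
object DistributedOpsSketch {

  // A "block" is a contiguous slice of rows of a row-partitioned matrix.
  type Block = Array[Array[Double]]

  // Strategy trait: operations that execute immediately and return a value,
  // instead of adding an element to a logical plan.
  trait DistributedOperations {
    // zero must be a neutral element for both seqOp and combOp
    // (and should be immutable, since it is reused for every block).
    def aggregateBlocks[U](blocks: Seq[Block], zero: U)(
        seqOp: (U, Block) => U,
        combOp: (U, U) => U): U
  }

  // The engine exposes the strategy as an attribute; it does not extend it.
  trait Engine {
    def ops: DistributedOperations
  }

  // "Native" implementation for a toy in-memory engine.
  object LocalOperations extends DistributedOperations {
    def aggregateBlocks[U](blocks: Seq[Block], zero: U)(
        seqOp: (U, Block) => U,
        combOp: (U, U) => U): U =
      // Fold each block independently, then combine the partial results.
      blocks.map(block => seqOp(zero, block)).foldLeft(zero)(combOp)
  }

  object LocalEngine extends Engine {
    val ops: DistributedOperations = LocalOperations
  }

  // A common fragment written once against the trait: sum of all elements.
  // At run time it asks the current engine for its implementation.
  def sumAll(blocks: Seq[Block])(implicit engine: Engine): Double =
    engine.ops.aggregateBlocks(blocks, 0.0)(
      (acc, block) => acc + block.map(_.sum).sum,
      _ + _
    )

  def main(args: Array[String]): Unit = {
    implicit val engine: Engine = LocalEngine
    val blocks: Seq[Block] = Seq(
      Array(Array(1.0, 2.0), Array(3.0, 4.0)),
      Array(Array(5.0, 6.0))
    )
    println(sumAll(blocks))
  }
}
```

In this sketch a different backend would only need to supply its own `DistributedOperations` instance through `Engine.ops`; fragments such as `sumAll` would stay unchanged.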