@Sebastian,
wanna post a link?

On Tue, Jan 7, 2014 at 2:46 PM, Sebastian Schelter <[email protected]> wrote:

> I also have some Spark cooccurrence analysis code lying around that
> might be a nice contribution.
>
> On 07.01.2014 23:44, Dmitriy Lyubimov wrote:
> > If you want to contribute to Mahout, obviously you want to speak to the
> > Mahout dev audience. Spark is not yet officially integrated into Mahout,
> > but we are actively contemplating it, and I have been doing some work
> > off SVN, e.g.
> > https://issues.apache.org/jira/browse/MAHOUT-1346,
> > https://issues.apache.org/jira/browse/MAHOUT-1365 and some other
> > algorithm ports.
> >
> >
> > On Tue, Jan 7, 2014 at 1:30 PM, Oleksandr Olgashko <[email protected]> wrote:
> >
> >> I haven't worked with Spark before (I've just read their overview page).
> >> Should I ask questions here as they arise, or would it be better to
> >> switch to Spark's mailing lists?
> >>
> >>
> >> 2014/1/7 Sebastian Schelter <[email protected]>
> >>
> >>> IIRC that paper talks about MapReduce on a shared-memory system, not
> >>> on a shared-nothing system such as the Hadoop implementation.
> >>>
> >>> As a rule of thumb, iterations in Hadoop are about 10x slower than in
> >>> systems such as Giraph, Spark or Stratosphere.
> >>>
> >>> --sebastian
> >>>
> >>> On 07.01.2014 22:01, Oleksandr Olgashko wrote:
> >>>> What can you say about
> >>>> http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf
> >>>> ?
> >>>>
> >>>>
> >>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
> >>>>
> >>>>> Yes. Create working notes on how exactly to do that. (Or, what I am
> >>>>> pushing you towards a bit, use Spark, since MR is not really an
> >>>>> iteration-friendly platform and it looks like iterations are needed
> >>>>> in fastICA.)
> >>>>>
> >>>>>
> >>>>> On Tue, Jan 7, 2014 at 12:38 PM, Oleksandr Olgashko <
> >>>>> [email protected]> wrote:
> >>>>>
> >>>>>> So the problem is to adapt ICA for MR, am I right?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
> >>>>>>
> >>>>>>> I already looked at fast ICA. While it claims to be parallel, this
> >>>>>>> work doesn't exactly map it onto the MapReduce (or Spark) paradigm
> >>>>>>> and, from what I can recollect, still implies outer iterations for
> >>>>>>> fitting principal component vectors one by one. Which means it is
> >>>>>>> probably already MR-unfriendly by construction; Spark may show far
> >>>>>>> better promise here, but a working notes document is still required
> >>>>>>> to show how exactly. That's what I mean.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Jan 7, 2014 at 1:35 AM, Oleksandr Olgashko <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Could you please take a look at this article?
> >>>>>>>> http://cran.r-project.org/web/packages/fastICA/fastICA.pdf
> >>>>>>>> I have learned that re-inventing the wheel is wrong for most
> >>>>>>>> problems, and that a better solution usually exists. However, it
> >>>>>>>> often needs some "grinding", so I may research those ways, if
> >>>>>>>> approved.
> >>>>>>>>
> >>>>>>>> About Scala: unfortunately, I have never worked with this language
> >>>>>>>> before, but wanted to. I'd like to fill that gap in my skills, but
> >>>>>>>> I don't know exactly where to start.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
> >>>>>>>>
> >>>>>>>>> ICA is a very useful technique for dimensionality reduction. I
> >>>>>>>>> believe Mahout would benefit from it; however, the challenges are
> >>>>>>>>> fairly significant in terms of a proven parallelization technique
> >>>>>>>>> and acceptable efficacy, which makes it hard to just "implement"
> >>>>>>>>> (I am not familiar at this point with any concrete work on
> >>>>>>>>> parallel ICA). So, like I said before, I am not very hopeful.
> >>>>>>>>> However, if one never tries, then nothing will ever get done.
> >>>>>>>>> Who knows.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Jan 6, 2014 at 2:18 PM, Isabel Drost-Fromm <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> On Mon, Jan 06, 2014 at 10:40:45PM +0200, Oleksandr Olgashko wrote:
> >>>>>>>>>>> Returning to the question I asked 2 months ago about a topic to
> >>>>>>>>>>> work on: what algorithm should I implement?
> >>>>>>>>>>
> >>>>>>>>>> To be quite frank with you: None. Personally I'd rather see
> >>>>>>>>>> improvements (in terms of documentation, integration,
> >>>>>>>>>> stabilisation, performance optimisation) of the existing Mahout
> >>>>>>>>>> source.
> >>>>>>>>>>
> >>>>>>>>>> Feel free to take a closer look at the thread concerning "getting
> >>>>>>>>>> involved" that we had around Christmas last year for inspiration.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Isabel
