I also have some Spark co-occurrence analysis code lying around that might be a nice contribution.
On 07.01.2014 23:44, Dmitriy Lyubimov wrote:
> if you want to contribute to Mahout, obviously you want to speak to Mahout
> dev audience. Spark is not yet officially integrated into Mahout, but we
> are actively contemplating it and I have been doing some work off SVN e.g.
> https://issues.apache.org/jira/browse/MAHOUT-1346,
> https://issues.apache.org/jira/browse/MAHOUT-1365 and some other algorithm
> ports.
>
>
> On Tue, Jan 7, 2014 at 1:30 PM, Oleksandr Olgashko <[email protected]> wrote:
>
>> Didn't work with Spark before (just read their overview page).
>> Should i ask arising questions here or better switch to Spark's mailing
>> lists?
>>
>>
>> 2014/1/7 Sebastian Schelter <[email protected]>
>>
>>> IIRC that papers talks about MapReduce on a shared-memory system, not on
>>> a shared-nothing system such as the Hadoop implementation.
>>>
>>> As a rule of thumb, iterations in Hadoop are about 10x slower than in
>>> systems such as Giraph, Spark or Stratosphere.
>>>
>>> --sebastian
>>>
>>> On 07.01.2014 22:01, Oleksandr Olgashko wrote:
>>>> What can you say about
>>>> http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf
>>>> ?
>>>>
>>>>
>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
>>>>
>>>>> yes. Create working notes how exactly to do that. (Or, what i am a bit
>>>>> pushing you towards, Spark, since MR is not really iteration friendly
>>>>> platform and it looks like iterations are needed in fastICA.).
>>>>>
>>>>>
>>>>> On Tue, Jan 7, 2014 at 12:38 PM, Oleksandr Olgashko <[email protected]> wrote:
>>>>>
>>>>>> So the problem is to adapt ICA for MR, am i right?
>>>>>>
>>>>>>
>>>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
>>>>>>
>>>>>>> i already looked at fast ICA. while it claims to be parallel, this work
>>>>>>> doesn't exactly map it into map reduce (or spark) paradigm and from what i
>>>>>>> can recollect still implies outer iterations for fitting principal
>>>>>>> component vectors one by one. Which means it probably already is
>>>>>>> MR-unfriendly by construction; Spark may show far better promise here but
>>>>>>> still a working notes document is required to show how exactly. that's what
>>>>>>> i mean.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 7, 2014 at 1:35 AM, Oleksandr Olgashko <[email protected]> wrote:
>>>>>>>
>>>>>>>> Could you please take a look on this article?
>>>>>>>> http://cran.r-project.org/web/packages/fastICA/fastICA.pdf
>>>>>>>> I have learned that re-inventing the wheel is wrong for most problems, and
>>>>>>>> usually exists a better solution. However, it often needs some "grinding",
>>>>>>>> so I may research those ways, in case of approval.
>>>>>>>>
>>>>>>>> About Scala: unfortunately, I have never worked with this language before,
>>>>>>>> but wanted to. I'd like to fill that gap in my skills, but I don't know
>>>>>>>> exactly where to start.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
>>>>>>>>
>>>>>>>>> ICA is a very useful technique for dimensionality reduction. I believe
>>>>>>>>> Mahout would benefit from it; however challenges are fairly significant in
>>>>>>>>> terms of proven parallelization technique and acceptable efficacy, which
>>>>>>>>> makes it hard to just "implement" (I am not familiar at this point with any
>>>>>>>>> concrete work on parallel ICA). So like i said before i am not very
>>>>>>>>> hopeful. However, if one never tries, then nothing will get ever done. who
>>>>>>>>> knows.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 6, 2014 at 2:18 PM, Isabel Drost-Fromm <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, Jan 06, 2014 at 10:40:45PM +0200, Oleksandr Olgashko wrote:
>>>>>>>>>>> Returning back to question about theme to work, asked 2 months ago.
>>>>>>>>>>> What algorithm should I implement?
>>>>>>>>>>
>>>>>>>>>> To be quite frank with you: None. Personally I'd rather see improvements
>>>>>>>>>> (in terms of documentation, integration, stableisation, performance
>>>>>>>>>> optimisation) of the existing Mahout source.
>>>>>>>>>>
>>>>>>>>>> Feel free to take a closer look at the thread concerning "getting
>>>>>>>>>> involved" that we had around Christmas last year for inspiration.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Isabel
