I also have some Spark cooccurrence analysis code lying around that
might be a nice contribution.

On 07.01.2014 23:44, Dmitriy Lyubimov wrote:
> If you want to contribute to Mahout, you obviously want to speak to the
> Mahout dev audience. Spark is not yet officially integrated into Mahout, but
> we are actively contemplating it, and I have been doing some work off SVN, e.g.
> https://issues.apache.org/jira/browse/MAHOUT-1346,
> https://issues.apache.org/jira/browse/MAHOUT-1365 and some other algorithm
> ports.
> 
> 
> On Tue, Jan 7, 2014 at 1:30 PM, Oleksandr Olgashko <[email protected]> wrote:
> 
>> I haven't worked with Spark before (I have just read their overview page).
>> Should I ask questions here as they arise, or is it better to switch to
>> Spark's mailing lists?
>>
>>
>> 2014/1/7 Sebastian Schelter <[email protected]>
>>
>>> IIRC that paper talks about MapReduce on a shared-memory system, not on
>>> a shared-nothing system such as the Hadoop implementation.
>>>
>>> As a rule of thumb, iterations in Hadoop are about 10x slower than in
>>> systems such as Giraph, Spark or Stratosphere.
>>>
>>> --sebastian
>>>
>>> On 07.01.2014 22:01, Oleksandr Olgashko wrote:
>>>> What can you say about
>>>> http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf ?
>>>>
>>>>
>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
>>>>
>>>>> Yes. Create working notes on how exactly to do that. (Or, what I am
>>>>> pushing you towards a bit: use Spark, since MR is not really an
>>>>> iteration-friendly platform and it looks like iterations are needed in
>>>>> fastICA.)
>>>>>
>>>>>
>>>>> On Tue, Jan 7, 2014 at 12:38 PM, Oleksandr Olgashko <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> So the problem is to adapt ICA for MR, am I right?
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
>>>>>>
>>>>>>> I already looked at fastICA. While it claims to be parallel, this work
>>>>>>> doesn't exactly map it onto the MapReduce (or Spark) paradigm and, from
>>>>>>> what I can recollect, it still implies outer iterations for fitting
>>>>>>> principal component vectors one by one. That means it is probably
>>>>>>> already MR-unfriendly by construction. Spark may show far better
>>>>>>> promise here, but a working notes document is still required to show
>>>>>>> how exactly. That's what I mean.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 7, 2014 at 1:35 AM, Oleksandr Olgashko <[email protected]> wrote:
>>>>>>>
>>>>>>>> Could you please take a look at this article?
>>>>>>>> http://cran.r-project.org/web/packages/fastICA/fastICA.pdf
>>>>>>>> I have learned that reinventing the wheel is wrong for most problems,
>>>>>>>> and that a better solution usually exists. However, it often needs
>>>>>>>> some "grinding", so I may research those approaches if approved.
>>>>>>>>
>>>>>>>> About Scala: unfortunately, I have never worked with this language
>>>>>>>> before, but I have wanted to. I'd like to fill that gap in my skills,
>>>>>>>> but I don't know exactly where to start.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
>>>>>>>>
>>>>>>>>> ICA is a very useful technique for dimensionality reduction. I
>>>>>>>>> believe Mahout would benefit from it; however, the challenges are
>>>>>>>>> fairly significant in terms of a proven parallelization technique
>>>>>>>>> and acceptable efficacy, which makes it hard to just "implement" (I
>>>>>>>>> am not familiar at this point with any concrete work on parallel
>>>>>>>>> ICA). So, like I said before, I am not very hopeful. However, if one
>>>>>>>>> never tries, then nothing will ever get done. Who knows.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 6, 2014 at 2:18 PM, Isabel Drost-Fromm <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, Jan 06, 2014 at 10:40:45PM +0200, Oleksandr Olgashko wrote:
>>>>>>>>>>> Returning to the question I asked 2 months ago about a topic to
>>>>>>>>>>> work on: what algorithm should I implement?
>>>>>>>>>>
>>>>>>>>>> To be quite frank with you: none. Personally, I'd rather see
>>>>>>>>>> improvements (in terms of documentation, integration, stabilisation,
>>>>>>>>>> performance optimisation) to the existing Mahout source.
>>>>>>>>>>
>>>>>>>>>> Feel free to take a closer look at the thread concerning "getting
>>>>>>>>>> involved" that we had around Christmas last year for inspiration.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Isabel
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
