On Wed, Nov 27, 2013 at 10:17 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> On Wed, Nov 27, 2013 at 9:09 AM, Oleksandr Olgashko <[email protected]> wrote:
>
>> Could you please formalize reqs for ICA? I mean, what actually should be
>> done.
>> Parallelization strategy is a bit general concept.
>
> No, it is not really. Not general enough so that you couldn't do it on
> your own.
>
> You can think of it as a fairly free-style TDD for how to do things on MR
> or Pregel so the majority of reviewers here could understand.

I guess i need to be a bit more specific: Hadoop MR or Spark/Bagel APIs. We
don't really pull in any other frameworks at the moment.

> Not ideal example but hope it helps -- look at the attachment for
> https://issues.apache.org/jira/browse/MAHOUT-1365
>
> -d
>
>> 2013/11/26 Dmitriy Lyubimov <[email protected]>
>>
>> > On Tue, Nov 26, 2013 at 1:11 PM, Олександр Ольгашко <[email protected]> wrote:
>> >
>> > > I may need unknown period of time to get familiar with Mahout project
>> > > structure.
>> > > I'd like to make some research about ICA's parallelization strategy, it
>> > > is quite interesting.
>> > > Not sure, if i can help somehow with MAHOUT-1346, never worked with such
>> > > things before.
>> > >
>> > > Should i use mail lists or something else for arising questions and other
>> > > communication?
>> >
>> > yes. there's probably no better place as far as Mahout is concerned.
>> >
>> > > 2013/11/26 Dmitriy Lyubimov <[email protected]>
>> > >
>> > > > Dimension reduction is addressed with PCA which is an option of SSVD
>> > > > method.
>> > > > However, if you can research/offer parallelization strategy for ICA,
>> > > > i'd be all ears.
>> > > >
>> > > > there's also ongoing push to create a DSL environment for mahout
>> > > > distributed matrices to Spark which i personally think is one of the
>> > > > most promising recent developments. It is also a treasure chest (or a
>> > > > can of worms depending on how you view it) for new people to chime in.
>> > > > DSL environment issue is MAHOUT-1346, with everything else pretty much
>> > > > dependent on it
>> > > >
>> > > > -d
>> > > >
>> > > > On Tue, Nov 26, 2013 at 9:19 AM, Олександр Ольгашко <[email protected]> wrote:
>> > > >
>> > > > > Hello,
>> > > > >
>> > > > > I am a student, interested in data analysis, also i have chosen this
>> > > > > theme for my diploma work. As mentioned here
>> > > > > https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms, there
>> > > > > are some open algorithms, for example, in Dimension reduction section.
>> > > > >
>> > > > > So, how can i start develop them? I have some theoretical background,
>> > > > > but i think, there may be some unknown problems. Maybe somebody is
>> > > > > working on these algorithms. Can you give some tips for start?
>> > > > >
>> > > > > I searched in JIRA for Independent Component Analysis, found nothing.
>> > > > >
>> > > > > Thanks in advance.
