2015-06-05 17:49 GMT+02:00 Behroz Sikander <[email protected]>:

> Hi,
>
> >> Please feel free to contribute documentation to the Apache Hama wiki[1]!
> OK. I am new to the open source world, so I am quite new to the procedure.
> Whenever I find something missing, I will edit it.
>
> >> We could also work together on it, but I have no idea yet. Custom
> >> “Modern” or “Classic” Style? Maven website again?
> OK. I do not quite understand what you mean by Modern or Classic style.
> Does Apache provide some kind of CMS to manage the hosted project websites?
>
> >> ADMM is quite interesting, and it looks like a better fit for BSP than
> >> MapReduce (even if HBase(?) or memory-based shared storage is used).
> Yes, ADMM seems to be a natural fit for the BSP model because ADMM
> algorithms are iterative. In each iteration, different machines process and
> exchange data, and the algorithm keeps running until a convergence
> criterion is met.
>
> Check out Chapter 10 (Page 78) of the following ADMM paper:
> https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
>
> It discusses the implementation details of ADMM on big data systems.
>
> >> But I don't fully understand
> My understanding is also limited, but if the cost function of an ML
> algorithm is convex, then the cost function can be converted to ADMM form.
> Once in ADMM form, we can run it on a distributed system like Hama.
>
> >> and so don't know whether it can be used as an abstraction layer for many
> >> ML algorithms. We'll need more investigation.
>
> Yes, more investigation is needed. Here are a few ML algorithms already in
> ADMM form (a, b, c).
>
> a) L1 Linear Regression -
>    https://www.dtc.umn.edu/s/resources/tsp2010oct-dlasso.pdf
> b) L2-Logistic Regression:
>    https://intentmedia.github.io/assets/2013-10-09-presenting-at-ieee-big-data/pld_js_ieee_bigdata_2013_admm.pdf
> c) SVM - http://www.jmlr.org/papers/volume11/forero10a/forero10a.pdf

I don't know ADMM myself, but what you say sounds pretty similar to how we
implemented gradient descent and linear / logistic regression [1] on top of
it. Any improvement there would of course be highly appreciated, so feel free
to open Jira issues and attach patches accordingly.

Regards,
Tommaso

[1] : https://github.com/apache/hama/tree/trunk/ml/src/main/java/org/apache/hama/ml/regression

> Regards,
> Behroz Sikander
>
> On Fri, Jun 5, 2015 at 3:19 AM, Edward J. Yoon <[email protected]> wrote:
>
> > Please feel free to contribute documentation to the Apache Hama wiki[1]!
> > Ultimately, I'm considering improving our official website[2] on
> > HAMA-960. We could also work together on it, but I have no idea yet.
> > Custom “Modern” or “Classic” Style? Maven website again?
> >
> > ADMM is quite interesting, and it looks like a better fit for BSP than
> > MapReduce (even if HBase(?) or memory-based shared storage is used). But
> > I don't fully understand and so don't know whether it can be used as an
> > abstraction layer for many ML algorithms. We'll need more investigation.
> >
> > 1. https://wiki.apache.org/hama
> > 2. https://hama.apache.org/
> >
> > --
> > Best Regards, Edward J. Yoon
> >
> > -----Original Message-----
> > From: Behroz Sikander [mailto:[email protected]]
> > Sent: Thursday, June 04, 2015 10:24 PM
> > To: [email protected]
> > Subject: Re: [DISCUSS] Things I'd like to focus on next
> >
> > Hi,
> > +1.
> > Yes, documentation needs improvement. I also saw that a book on Hama is
> > in progress. I can help with the documentation. I only found the
> > following open issue: https://issues.apache.org/jira/browse/HAMA-960
> >
> > Something like MLBase or Mahout on top of Hama would be really nice and
> > will boost the project. Regarding machine learning algorithms, can we use
> > ADMM (a) to implement the algorithms?
> > Like https://issues.apache.org/jira/browse/SPARK-1543
> >
> > a) https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
> >
> > Regards,
> > Behroz Sikander
> >
> > On Wed, Jun 3, 2015 at 9:48 AM, Edward J. Yoon <[email protected]> wrote:
> >
> > > Hey,
> > >
> > > Here are a few things I'd like to focus on next.
> > >
> > > 1. Add a stream input format for listening to messages coming from 3rd
> > >    party applications, and incremental learning algorithms.
> > > 2. Improve reliability of the system, e.g., fault tolerance, HA, etc.
> > > 3. More machine learning algorithms, such as ensemble classifier, SVM,
> > >    DNN, etc.
> > >
> > > Do you have any other suggestions?
> > >
> > > Thanks!
> > >
> > > --
> > > Best Regards, Edward J. Yoon
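
As an illustration of the iterative pattern described in this thread (each machine solves a local problem, the machines exchange results, and the loop repeats until a convergence criterion is met), here is a minimal single-process Python sketch of consensus ADMM in scaled form. It is not Hama code and uses no Hama API. The function name `consensus_admm`, the toy objective (each worker i holds one number a_i and the workers jointly minimize the sum of 0.5 * (x - a_i)^2), and the parameters `rho`, `tol`, and `max_iter` are all assumptions chosen for illustration. Each loop iteration plays the role of one BSP superstep.

```python
import math


def consensus_admm(local_data, rho=1.0, tol=1e-8, max_iter=1000):
    """Toy consensus ADMM: minimize sum_i 0.5 * (x - a_i)^2 over a shared x.

    Each entry of local_data stands for the data held by one worker.
    Returns the consensus value z and the number of iterations used.
    """
    n = len(local_data)
    x = [0.0] * n  # local estimates, one per worker
    u = [0.0] * n  # scaled dual variables, one per worker
    z = 0.0        # shared consensus variable

    for iteration in range(1, max_iter + 1):
        # Local step: each worker minimizes its own cost plus a penalty
        # for disagreeing with the current consensus value (closed form
        # for this quadratic toy objective).
        x = [(a + rho * (z - u_i)) / (1.0 + rho)
             for a, u_i in zip(local_data, u)]

        # Exchange step: workers send x_i + u_i and the average becomes
        # the new consensus value (message passing plus barrier in BSP).
        z_old = z
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n

        # Dual update: each worker accumulates its remaining disagreement.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]

        # Convergence check on primal and dual residuals.
        primal = math.sqrt(sum((x_i - z) ** 2 for x_i in x))
        dual = rho * math.sqrt(n) * abs(z - z_old)
        if primal < tol and dual < tol:
            return z, iteration

    return z, max_iter


if __name__ == "__main__":
    value, iterations = consensus_admm([1.0, 2.0, 3.0, 6.0])
    print(value, iterations)
```

For this toy objective the consensus value should approach the mean of the local data (3.0 for the inputs above). A real model would replace the closed-form local step with a solver for each worker's own convex cost.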
