[ https://issues.apache.org/jira/browse/MAHOUT-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855741#action_12855741 ]
Zoran Sevarac commented on MAHOUT-364:
--------------------------------------

Hi, I'm Zoran Sevarac, founder and developer of the Neuroph project (http://neuroph.sourceforge.net), and I must say I'm very happy to see this proposal. The proposal is very well written and I like the idea very much. Although I don't have any experience with Hadoop, I'll be very glad to help with the neural-network-related work. Both the Neuroph project and I personally will support and help with the development of this project if it gets accepted. I have already published a short article about it: http://netbeans.dzone.com/neuroph-hadoop-nb

> [GSOC] Proposal to implement Neural Network with backpropagation learning on Hadoop
> -----------------------------------------------------------------------------------
>
> Key: MAHOUT-364
> URL: https://issues.apache.org/jira/browse/MAHOUT-364
> Project: Mahout
> Issue Type: New Feature
> Reporter: Zaid Md. Abdul Wahab Sheikh
>
> Proposal Title: Implement Multi-Layer Perceptrons with backpropagation learning on Hadoop (addresses issue MAHOUT-342)
> Student Name: Zaid Md. Abdul Wahab Sheikh
> Student E-mail: (gmail id) sheikh.zaid
>
> h2. I. Brief Description
>
> A feedforward neural network (NN) exhibits several degrees of parallelism, such as weight parallelism, node parallelism, network parallelism, layer parallelism and training parallelism. However, network-based parallelism requires fine-grained synchronization and communication and is therefore not suitable for map/reduce-based algorithms. Training-set parallelism, on the other hand, is coarse-grained and can easily be exploited on Hadoop, which splits the input among different mappers. Each mapper then propagates its 'InputSplit' through its own copy of the complete neural network.
>
> The backpropagation algorithm will operate in batch mode, because updating a common set of parameters after each training example creates a bottleneck for parallelization. The overall error gradient vector can be computed in parallel by calculating the gradients for each training vector in the Mapper, combining them into partial batch gradients, and then adding those in a Reducer to get the overall batch gradient.
>
> In a similar manner, error function evaluations during line searches (for the conjugate gradient and quasi-Newton algorithms) can be parallelized efficiently.
>
> Lastly, to avoid local minima in the error function, we can take advantage of training-session parallelism to start multiple training sessions in parallel with different initial weights (simulated annealing).
>
> h2. II. Detailed Proposal
>
> The most important step is to design the base neural network classes in such a way that other NN architectures like Hopfield nets, Boltzmann machines, SOMs etc. can easily be implemented by deriving from these base classes. For that I propose to implement a set of core classes that correspond to basic neural network concepts like artificial neuron, neuron layer, neuron connections, weight, transfer function, input function, learning rule etc. This architecture is inspired by that of the open-source Neuroph neural network framework (http://imgur.com/gDIOe.jpg). This design of the base architecture allows for great flexibility in deriving newer NNs and learning rules.
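> For illustration, a minimal sketch of what these base classes might look like is given below. All class and method names in the sketch are assumptions made for discussion, not a finished API:
>
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> // Placeholder for the Layer/Neuron/Connection/Weight classes mentioned above.
> class Layer { /* neurons, input functions, transfer functions, weights ... */ }
>
> // Placeholder for a collection of training vectors.
> class TrainingSet { /* training vectors ... */ }
>
> // Base class that concrete networks (MLP, Hopfield, SOM, ...) derive from.
> abstract class NeuralNetwork {
>   protected final List<Layer> layers = new ArrayList<Layer>();
>   protected LearningRule learningRule;
>
>   // Subclasses build their own topology (layers and connections) here.
>   protected abstract void createNetwork();
>
>   // Training is delegated to whichever LearningRule was attached at creation time.
>   public void learn(TrainingSet trainingSet) {
>     learningRule.learn(this, trainingSet);
>   }
>
>   public void setLearningRule(LearningRule learningRule) {
>     this.learningRule = learningRule;
>   }
>
>   public List<Layer> getLayers() {
>     return layers;
>   }
> }
>
> // Base class for training algorithms (backpropagation, conjugate gradient, ...).
> abstract class LearningRule {
>   public abstract void learn(NeuralNetwork network, TrainingSet trainingSet);
> }
> {code}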
> All that needs to be done is to derive from the NeuralNetwork class, provide the method for network creation, create a new training method by deriving from LearningRule, and then add that learning rule to the network during creation. In addition, the API is very intuitive and easy to understand (in comparison to other NN frameworks like Encog and JOONE).
>
> h3. The approach to parallelization in Hadoop:
>
> In the Driver class:
> - The input parameters are read and the NeuralNetwork with a specified LearningRule (training algorithm) is created.
> - Initial weight values are randomly generated and written to the FileSystem. If a number of training sessions (for simulated annealing) is specified, multiple sets of initial weight values are generated.
> - Training is started by calling the NeuralNetwork's learn() method. In each iteration, every time the error gradient vector needs to be calculated, the method submits a Job whose input path points to the training-set vectors and whose key properties (like the path to the stored weight values) are set accordingly. The gradient vectors calculated by the Reducers are written back to an output path in the FileSystem.
> - After JobClient.runJob() returns, the gradient vectors are retrieved from the FileSystem and tested to see whether the stopping criterion is satisfied. The weights are then updated using the method implemented by the particular LearningRule. For line searches, each error function evaluation is again done by submitting a job.
> - The NN is trained iteratively until it converges.
>
> In the Mapper class:
> - Each Mapper is initialized in the configure() method: the weights are retrieved and the complete NeuralNetwork is created.
> - The map function then takes in the training vectors as key/value pairs (the key is ignored), runs them through the NN to calculate the outputs, and backpropagates the errors to obtain the error gradients. The error gradient vectors are then output as key/value pairs where all keys are set to a common value, such as the training-session number (for each training session, all keys in the outputs of all the mappers have to be identical).
>
> In the Combiner class:
> - Iterates through all the individual error gradient vectors output by the mapper (since they all have the same key) and adds them up to get a partial batch gradient.
>
> In the Reducer class:
> - A single reducer combines all the partial gradients from the Mappers to get the overall batch gradient.
> - The final error gradient vector is written back to the FileSystem.
>
> h3. I propose to complete all of the following sub-tasks during GSoC 2010:
>
> Implementation of the backpropagation algorithm:
> - Initialization of weights: use the Nguyen-Widrow algorithm to select the initial range of starting weight values.
> - Input, transfer and error functions: implement basic input functions like WeightedSum and transfer functions like sigmoid, Gaussian, tanh and linear. Implement the sum-of-squares error function.
> - Optimization methods to update the weights: (a) batch gradient descent with momentum and a variable learning rate [2] (a sketch of this update is given below), and (b) a conjugate gradient method with Brent's line search.
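> As a rough illustration of optimization method (a) above, here is a minimal sketch of one batch weight update with momentum. It assumes the weights and the summed batch gradient have been flattened into arrays; the class and method names are hypothetical:
>
> {code:java}
> // Hypothetical helper for sub-task (a): one batch gradient-descent step with momentum.
> final class BatchGradientDescentStep {
>
>   private BatchGradientDescentStep() {}
>
>   // previousDelta is updated in place so the next iteration can apply the momentum term.
>   static void updateWeights(double[] weights, double[] batchGradient,
>                             double[] previousDelta,
>                             double learningRate, double momentum) {
>     for (int i = 0; i < weights.length; i++) {
>       double delta = -learningRate * batchGradient[i] + momentum * previousDelta[i];
>       weights[i] += delta;
>       previousDelta[i] = delta;
>     }
>   }
> }
> {code}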
> Improving generalization:
> - Validating the network to test for overfitting (early stopping)
> - Regularization (weight decay)
>
> Create examples for:
> - Classification: the Abalone data set from the UCI Machine Learning Repository
> - Classification, regression: the Breast Cancer Wisconsin (Prognostic) data set
>
> If time permits, also implement:
> - Resilient backpropagation (RPROP)
>
> h2. III. Week Plan with list of deliverables
>
> * (Till May 23rd, community bonding period) Brainstorm with my mentor and the Apache Mahout community to come up with the best design for an extensible Neural Network framework. Code prototypes to identify potential problems and/or investigate new solutions. Deliverable: a detailed report or design document on how to implement the basic Neural Network framework and the learning algorithms.
> * (May 24th, coding starts) Week 1: Deliverables: basic neural network classes (Neuron, Connection, Weight, Layer, LearningRule, NeuralNetwork etc.) and the various input, transfer and error functions mentioned previously.
> * (May 31st) Weeks 2 and 3: Deliverable: Driver, Mapper, Combiner and Reducer classes with basic functionality to run a feedforward Neural Network on Hadoop (no training methods yet; weights are generated using the Nguyen-Widrow algorithm).
> * (June 14th) Week 4: Deliverable: backpropagation algorithm using standard batch gradient descent.
> * (June 21st) Week 5: Deliverables: variable learning rate and momentum during batch gradient descent. Validation test support. Do some big tests.
> * (June 28th) Week 6: Deliverable: support for early stopping and regularization (weight decay) during training.
> * (July 5th) Weeks 7 and 8: Deliverable: conjugate gradient method with Brent's line search algorithm.
> * (July 19th) Week 9: Deliverable: write unit tests. Do bigger-scale tests for both batch gradient descent and the conjugate gradient method.
> * (July 26th) Weeks 10 and 11: Deliverable: two examples of classification and regression on real-world datasets from the UCI Machine Learning Repository. More tests.
> * (August 9th, tentative 'pencils down' date) Week 12: Deliverable: wind up the work. Scrub code. Improve documentation and tutorials (on the wiki), etc.
> * (August 16th: Final evaluation)
>
> h2. IV. Additional Information
>
> I am a final-year Computer Science student at NIT Allahabad (India), graduating in May. For my final-year project/thesis, I am working on open-domain question answering. I participated in GSoC last year for the Apertium machine translation system (http://google-opensource.blogspot.com/2009/11/apertium-projects-first-google-summer.html). I am familiar with the three major open-source neural network frameworks in Java (JOONE, Encog and Neuroph), since I have used them in past projects on fingerprint recognition and face recognition (during a summer course on image and speech processing). My research interests are machine learning and statistical natural language processing, and I will be enrolling for a Ph.D. next semester (i.e. next fall) at the same institute.
> I have no specific time constraints during the GSoC period. I will devote a minimum of 6 hours every day to GSoC.
> Time offset: UTC+5:30 (IST)
>
> h2. V. References
>
> [1] S. McLoone and G. W. Irwin, "Fast parallel off-line training of multilayer perceptrons", IEEE Transactions on Neural Networks, 1997.
> [2] W. Schiffmann, M. Joost and R. Werner, "Optimization of the backpropagation algorithm for training multilayer perceptrons", 1994.
> [3] C. T. Chu, S. K. Kim, Y. A. Lin, et al., "Map-Reduce for Machine Learning on Multicore", NIPS, 2006.
> [4] C. M. Bishop, "Neural Networks for Pattern Recognition", 1995 (book).

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira