[ 
https://issues.apache.org/jira/browse/MAHOUT-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856698#action_12856698
 ] 

Ted Dunning commented on MAHOUT-364:
------------------------------------

{quote}
By the way is GPL3 Apache 2 compatible?
{quote}

No.

Apache licenses allow any kind of reuse with only the mildest of conditions.  
They are similar to Creative Commons licenses that allow re-use with 
attribution.

GPL licenses require that derived works also be GPL-licensed and that source 
code be provided for all derived works.

See here: http://www.opensource.org/licenses/gpl-3.0.html  and here: 
http://www.opensource.org/licenses/apache2.0.php

> [GSOC] Proposal to implement Neural Network with backpropagation learning on 
> Hadoop
> -----------------------------------------------------------------------------------
>
>                 Key: MAHOUT-364
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-364
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Zaid Md. Abdul Wahab Sheikh
>
> Proposal Title: Implement Multi-Layer Perceptrons with backpropagation 
> learning on Hadoop (addresses issue Mahout-342)
> Student Name: Zaid Md. Abdul Wahab Sheikh
> Student E-mail: (gmail id) sheikh.zaid
> h2. I. Brief Description
> A feedforward neural network (NN) exhibits several degrees of parallelism, 
> such as weight parallelism, node parallelism, network parallelism, layer 
> parallelism and training parallelism. However, network-based parallelism 
> requires fine-grained synchronization and communication and is therefore not 
> well suited to map/reduce-based algorithms. Training-set parallelism, on the 
> other hand, is coarse-grained and can easily be exploited on Hadoop, which 
> splits the input among different mappers. Each mapper then propagates its 
> 'InputSplit' through its own copy of the complete neural network.
> The backpropagation algorithm will operate in batch mode. This is because 
> updating a common set of parameters after each training example creates a 
> bottleneck for parallelization. The overall error gradient vector calculation 
> can be parallelized by calculating the gradients from each training vector in 
> the Mapper, combining them to get partial batch gradients and then adding 
> them in a reducer to get the overall batch gradient.
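> This works because the batch sum-of-squares error is a sum over the training 
> examples, so its gradient decomposes into per-example gradients that can be 
> summed in any order; stated explicitly:
> {noformat}
> E(w) = \sum_{n=1}^{N} E_n(w)   ==>   \nabla E(w) = \sum_{n=1}^{N} \nabla E_n(w)
> {noformat}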
> In a similar manner, error function evaluations during line searches (for 
> the conjugate gradient and quasi-Newton algorithms) can be efficiently 
> parallelized.
> Lastly, to avoid local minima in its error function, we can take advantage of 
> training session parallelism to start multiple training sessions in parallel 
> with different initial weights (simulated annealing).
> h2. II. Detailed Proposal
> The most important step is to design the base neural network classes in such 
> a way that other NN architectures like Hopfield nets, Boltzmann machines, SOMs 
> etc. can easily be implemented by deriving from these base classes. For that I 
> propose to implement a set of core classes that correspond to basic neural 
> network concepts like artificial neuron, neuron layer, neuron connections, 
> weight, transfer function, input function, learning rule etc. This 
> architecture is inspired by that of the open-source Neuroph neural network 
> framework (http://imgur.com/gDIOe.jpg). This design of the base architecture 
> allows for great flexibility in deriving new NNs and learning rules. All 
> that needs to be done is to derive from the NeuralNetwork class, provide the 
> method for network creation, create a new training method by deriving from 
> LearningRule, and then add that learning rule to the network during creation. 
> In addition, the API is very intuitive and easy to understand (in comparison 
> to other NN frameworks like Encog and JOONE).
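> As an illustration, here is a rough sketch of how these base classes might 
> fit together. All class and method names below are tentative and only meant 
> to convey the intended structure; none of this is existing Mahout or Neuroph 
> code.
> {code:java}
> // Tentative design sketch; all names are illustrative only.
> import java.util.ArrayList;
> import java.util.List;
>
> /** A layer that maps an input activation vector to an output vector. */
> interface Layer {
>   double[] computeOutput(double[] input);
> }
>
> /** One labelled training vector (input pattern plus desired output). */
> interface TrainingExample {
>   double[] getInput();
>   double[] getTarget();
> }
>
> /** Base class for training algorithms (backpropagation variants etc.). */
> abstract class LearningRule {
>   protected NeuralNetwork network;
>
>   void setNetwork(NeuralNetwork network) { this.network = network; }
>
>   abstract void learn(Iterable<TrainingExample> trainingSet);
> }
>
> /** Base class for network architectures (MLP, Hopfield net, SOM, ...). */
> abstract class NeuralNetwork {
>   protected final List<Layer> layers = new ArrayList<Layer>();
>   protected LearningRule learningRule;
>
>   /** Subclasses build their topology (layers, connections, weights) here. */
>   protected abstract void createNetwork();
>
>   void setLearningRule(LearningRule rule) {
>     this.learningRule = rule;
>     rule.setNetwork(this);
>   }
>
>   /** Training is delegated to the attached learning rule. */
>   void learn(Iterable<TrainingExample> trainingSet) {
>     learningRule.learn(trainingSet);
>   }
>
>   /** Forward-propagate an input vector through all layers. */
>   double[] feedForward(double[] input) {
>     double[] activation = input;
>     for (Layer layer : layers) {
>       activation = layer.computeOutput(activation);
>     }
>     return activation;
>   }
> }
> {code}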
> h3. The approach to parallelization in Hadoop:
> In the Driver class:
> - The input parameters are read, and a NeuralNetwork with the specified 
> LearningRule (training algorithm) is created.
> - Initial weight values are randomly generated and written to the FileSystem. 
> If a number of training sessions (for simulated annealing) is specified, 
> multiple sets of initial weight values are generated.
> - Training is started by calling the NeuralNetwork's learn() method. For each 
> iteration, every time the error gradient vector needs to be calculated, the 
> method submits a Job where the input path to the training-set vectors and 
> various key properties (like path to the stored weight values) are set. The 
> gradient vectors calculated by the Reducers are written back to an output 
> path in the FileSystem.
> - After JobClient.runJob() returns, the gradient vectors are retrieved from 
> the FileSystem and tested to see whether the stopping criterion is satisfied. 
> The weights are then updated, using the method implemented by the particular 
> LearningRule. For line searches, each error function evaluation is again done 
> by submitting a job.
> - The NN is trained in iterations until it converges, as sketched below.
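> A rough sketch of this outer loop, using the old-style Hadoop API 
> (JobConf/JobClient) referred to above. The GradientMapper and GradientReducer 
> classes are sketched under the Mapper and Combiner/Reducer sections below; the 
> configuration key and the weight/gradient I/O helpers are placeholders only.
> {code:java}
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.FileOutputFormat;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.mahout.math.VectorWritable;
>
> // Sketch only; GradientMapper/GradientReducer are sketched in later sections.
> public class BackpropDriver {
>
>   public void train(Path trainingData, Path weightsPath, Path gradientDir,
>                     double learningRate, int maxIterations, double tolerance)
>       throws Exception {
>     double[] weights = initialWeights();            // e.g. Nguyen-Widrow initialization
>     writeWeights(weights, weightsPath);             // store on the FileSystem
>
>     for (int iter = 0; iter < maxIterations; iter++) {
>       JobConf conf = new JobConf(BackpropDriver.class);
>       conf.setJobName("backprop-gradient-" + iter);
>       conf.set("mahout.nn.weights.path", weightsPath.toString()); // hypothetical property name
>       conf.setMapperClass(GradientMapper.class);
>       conf.setCombinerClass(GradientReducer.class); // partial sums on the map side
>       conf.setReducerClass(GradientReducer.class);
>       conf.setNumReduceTasks(1);                    // one reducer -> overall batch gradient
>       conf.setOutputKeyClass(IntWritable.class);
>       conf.setOutputValueClass(VectorWritable.class);
>       Path out = new Path(gradientDir, "iteration-" + iter);
>       FileInputFormat.setInputPaths(conf, trainingData);
>       FileOutputFormat.setOutputPath(conf, out);
>
>       JobClient.runJob(conf);                       // blocks until the job completes
>
>       double[] gradient = readGradient(out);        // read the batch gradient back
>       if (norm(gradient) < tolerance) {
>         break;                                      // stopping criterion satisfied
>       }
>       for (int i = 0; i < weights.length; i++) {
>         weights[i] -= learningRate * gradient[i];   // plain batch gradient descent step
>       }
>       writeWeights(weights, weightsPath);
>     }
>   }
>
>   private static double norm(double[] v) {
>     double sum = 0.0;
>     for (double x : v) { sum += x * x; }
>     return Math.sqrt(sum);
>   }
>
>   // Placeholders: the real versions would read/write vectors as SequenceFiles.
>   private double[] initialWeights() { throw new UnsupportedOperationException("sketch"); }
>   private void writeWeights(double[] w, Path p) { throw new UnsupportedOperationException("sketch"); }
>   private double[] readGradient(Path dir) { throw new UnsupportedOperationException("sketch"); }
> }
> {code}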
> In the Mapper class:
> - Each Mapper is initialized in the configure() method: the weights are 
> retrieved and the complete NeuralNetwork is created.
> - The map function then takes in the training vectors as key/value pairs (the 
> key is ignored), runs them through the NN to calculate the outputs and 
> backpropagates the errors to find out the error gradients. The error gradient 
> vectors are then output as key/value pairs where all the keys are set to a 
> common value, such as the training session number (for each training session, 
> all keys in the outputs of all the mappers have to be identical).
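> A sketch of this Mapper. The weight loading, record parsing and 
> backpropagation helpers are placeholders, and input records are assumed to be 
> one comma-separated training vector per line of text.
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.MapReduceBase;
> import org.apache.hadoop.mapred.Mapper;
> import org.apache.hadoop.mapred.OutputCollector;
> import org.apache.hadoop.mapred.Reporter;
> import org.apache.mahout.math.DenseVector;
> import org.apache.mahout.math.VectorWritable;
>
> // Sketch only; the private helpers below are hypothetical placeholders.
> public class GradientMapper extends MapReduceBase
>     implements Mapper<LongWritable, Text, IntWritable, VectorWritable> {
>
>   private static final IntWritable SESSION_KEY = new IntWritable(0);
>   private double[] weights;
>
>   @Override
>   public void configure(JobConf conf) {
>     // Rebuild the complete network in every mapper from the stored weights.
>     weights = loadWeights(conf.get("mahout.nn.weights.path"), conf); // hypothetical property name
>   }
>
>   @Override
>   public void map(LongWritable offset, Text line,
>                   OutputCollector<IntWritable, VectorWritable> output,
>                   Reporter reporter) throws IOException {
>     double[] example = parse(line.toString());  // one training vector; key ignored
>     // Forward pass plus backpropagation for this single example: returns the
>     // per-example error gradient with respect to every weight in the network.
>     double[] gradient = backpropagate(weights, example);
>     // A common key (e.g. the training session number) so that all per-example
>     // gradients of one session meet in the same combiner/reducer calls.
>     output.collect(SESSION_KEY, new VectorWritable(new DenseVector(gradient)));
>   }
>
>   // Placeholders for the real implementations.
>   private double[] loadWeights(String path, JobConf conf) { throw new UnsupportedOperationException("sketch"); }
>   private double[] parse(String line) { throw new UnsupportedOperationException("sketch"); }
>   private double[] backpropagate(double[] w, double[] x) { throw new UnsupportedOperationException("sketch"); }
> }
> {code}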
> In the Combiner class:
> - Iterates through all the individual error gradient vectors output by the 
> mapper (since they all have the same key) and adds them up to get a partial 
> batch gradient.
> In the Reducer class:
> - There is a single Reducer that combines all the partial gradients from the 
> Mappers to get the overall batch gradient.
> - The final error gradient vector is written back to the FileSystem.
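> Because vector addition is associative, one summing class can act both as the 
> Combiner (partial batch gradients on each map node) and as the single Reducer 
> (overall batch gradient). An illustrative sketch, again using Mahout's 
> VectorWritable as the value type:
> {code:java}
> import java.io.IOException;
> import java.util.Iterator;
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.mapred.MapReduceBase;
> import org.apache.hadoop.mapred.OutputCollector;
> import org.apache.hadoop.mapred.Reducer;
> import org.apache.hadoop.mapred.Reporter;
> import org.apache.mahout.math.DenseVector;
> import org.apache.mahout.math.Vector;
> import org.apache.mahout.math.VectorWritable;
>
> // Sketch only; used as both Combiner and Reducer in the driver sketch above.
> public class GradientReducer extends MapReduceBase
>     implements Reducer<IntWritable, VectorWritable, IntWritable, VectorWritable> {
>
>   @Override
>   public void reduce(IntWritable sessionKey, Iterator<VectorWritable> gradients,
>                      OutputCollector<IntWritable, VectorWritable> output,
>                      Reporter reporter) throws IOException {
>     Vector sum = null;
>     while (gradients.hasNext()) {
>       Vector g = gradients.next().get();
>       if (sum == null) {
>         sum = new DenseVector(g.size()); // start from a zero vector of matching length
>       }
>       sum = sum.plus(g);                 // element-wise accumulation (plus() copies)
>     }
>     if (sum != null) {
>       // As a combiner this emits a partial batch gradient; as the single
>       // reducer it emits the overall batch gradient, written to the FileSystem.
>       output.collect(sessionKey, new VectorWritable(sum));
>     }
>   }
> }
> {code}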
> h3. I propose to complete all of the following sub-tasks during GSoC 2010:
> Implementation of the Backpropagation algorithm:
> - Initialization of weights: using the Nguyen-Widrow algorithm to select the 
> initial range of starting weight values (see the sketch after this list).
> - Input, transfer and error functions: implement basic input functions like 
> WeightedSum and transfer functions like Sigmoid, Gaussian, tanh and linear. 
> Implement the sum-of-squares error function.
> - Optimization methods to update the weights: (a) batch gradient descent, 
> with momentum and a variable learning rate method [2]; (b) a conjugate 
> gradient method with Brent's line search. 
> Improving generalization:
> - Validating the network to test for overfitting (Early stopping method)
> - Regularization (weight decay method)
> Create examples for:
> - Classification: using the Abalone Data Set from the UCI Machine Learning 
> Repository
> - Classification, Regression: Breast Cancer Wisconsin (Prognostic) Data Set
> If time permits, also implement:
> - Resilient Backpropagation (RPROP)
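> As referenced above, an illustrative sketch of the Nguyen-Widrow weight 
> initialization for a single layer, as I understand the algorithm (scale factor 
> beta = 0.7 * h^(1/n) for h neurons with n inputs each; the exact constants and 
> variant may still be adjusted during implementation):
> {code:java}
> import java.util.Random;
>
> // Sketch; constants follow my reading of the Nguyen-Widrow scheme and may change.
> public final class NguyenWidrow {
>
>   /** Returns weights[neuron][input] for a layer with the given dimensions. */
>   public static double[][] initialize(int numInputs, int numNeurons, Random random) {
>     double beta = 0.7 * Math.pow(numNeurons, 1.0 / numInputs);
>     double[][] weights = new double[numNeurons][numInputs];
>     for (int i = 0; i < numNeurons; i++) {
>       double norm = 0.0;
>       for (int j = 0; j < numInputs; j++) {
>         weights[i][j] = random.nextDouble() - 0.5;   // uniform in [-0.5, 0.5)
>         norm += weights[i][j] * weights[i][j];
>       }
>       norm = Math.sqrt(norm);
>       for (int j = 0; j < numInputs; j++) {
>         weights[i][j] = beta * weights[i][j] / norm; // rescale to length beta
>       }
>     }
>     return weights;
>   }
>
>   private NguyenWidrow() { }
> }
> {code}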
> h2. III. Week Plan with list of deliverables
> * (Till May 23rd, community bonding period)
> Brainstorm with my mentor and the Apache Mahout community to come up with the 
> best design for an extensible Neural Network framework. Build code prototypes 
> to identify potential problems and/or investigate new solutions.
> Deliverable: A detailed report or design document on how to implement the 
> basic Neural Network framework and the learning algorithms.
> * (May 24th, coding starts) Week 1: 
> Deliverables: Basic Neural network classes (Neuron, Connection, Weight, 
> Layer, LearningRule, NeuralNetwork etc) and the various input, transfer and 
> error functions mentioned previously.
> * (May 31st) Week 2 and Week 3: 
> Deliverable: Driver, Mapper, Combiner and Reducer classes with basic 
> functionality to run a feedforward Neural Network on Hadoop (no training 
> methods yet; weights are generated using the Nguyen-Widrow algorithm). 
> * (June 14th) Week 4: 
> Deliverable: Backpropagation algorithm using standard Batch Gradient descent.
>     
> * (June 21st) Week 5: 
> Deliverables: Variable learning rate and momentum during batch gradient 
> descent. Support for validation tests. Run some larger-scale tests.
> * (June 28th) Week 6: 
> Deliverable: Support for Early stopping and Regularization (weight decay) 
> during training.
> * (July 5th) Week 7 and Week 8: 
> Deliverable: Conjugate gradient method with Brent's line search algorithm. 
> * (July 19th) Week 9: 
> Deliverable: Write unit tests. Run larger-scale tests for both batch gradient 
> descent and the conjugate gradient method.
> * (July 26th) Week 10 and Week 11:
> Deliverable: 2 examples of classification and regression on real-world 
> datasets from UCI Machine Learning Repository. More tests.
> * (August 9th, tentative 'pencils down' date) Week 12: 
> Deliverable: Wind up the work, scrub the code, and improve the documentation 
> and tutorials (on the wiki).
> * (August 16: Final evaluation)
> h2. IV. Additional Information
> I am a final year Computer Science student at NIT Allahabad (India) 
> graduating in May. For my final year project/thesis, I am working on Open 
> Domain Question Answering. I participated in GSoC last year for the Apertium 
> machine translation system 
> (http://google-opensource.blogspot.com/2009/11/apertium-projects-first-google-summer.html).
> I am familiar with the three major open-source Neural Network frameworks in 
> Java (JOONE, Encog and Neuroph), since I have used them in past projects on 
> fingerprint recognition and face recognition (during a summer course on image 
> and speech processing). My research interests are machine learning and 
> statistical natural language processing, and I will be enrolling for a Ph.D. 
> next semester (i.e. next fall) at the same institute. 
> I have no specific time constraints throughout the GSoC period. I will devote 
> a minimum of 6 hours everyday to GSoC.
> Time offset: UTC+5:30 (IST)
> h2. V. References
> [1] S. McLoone and G. W. Irwin, "Fast parallel off-line training of multilayer 
> perceptrons," IEEE Transactions on Neural Networks, 1997
> [2] W. Schiffmann, M. Joost and R. Werner, "Optimization of the backpropagation 
> algorithm for training multilayer perceptrons," 1994
> [3] C. T. Chu, S. K. Kim, Y. A. Lin et al., "Map-Reduce for Machine Learning 
> on Multicore," in NIPS, 2006
> [4] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University 
> Press, 1995 (book) 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

