Hi all,

I just submitted a GSoC proposal for implementing Neural Network with
backpropagation on Hadoop.

Jira issue: http://issues.apache.org/jira/browse/MAHOUT-364

I would appreciate your feedback and comments on the proposal and on
the implementation plan.

-------------------------------------------------------------------------------------------------------


I. Brief Description

A feedforward neural network (NN) exhibits several forms of
parallelism, such as weight parallelism, node parallelism, network
parallelism, layer parallelism and training parallelism. However,
network-based parallelism requires fine-grained synchronization and
communication, and is therefore not well suited to map/reduce-based
algorithms. Training-set parallelism, on the other hand, is coarse
grained and can easily be exploited on Hadoop, which splits the input
among different mappers. Each mapper then propagates its 'InputSplit'
through its own copy of the complete neural network.
The backpropagation algorithm will operate in batch mode, because
updating a common set of parameters after every single training
example creates a bottleneck for parallelization. The calculation of
the overall error gradient vector can be parallelized by computing the
gradient for each training vector in the mappers, combining these into
partial batch gradients, and then adding the partial gradients in a
reducer to obtain the overall batch gradient.
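Concretely, for the batch sum-of-squares error this decomposition is
exact, since the batch error is just the sum of the per-example errors
(standard result, cf. Bishop [4]):

    E(w) = \sum_{n=1}^{N} E_n(w),   so   \nabla E(w) = \sum_{n=1}^{N} \nabla E_n(w)

Each mapper/combiner therefore contributes the partial sum of
\nabla E_n(w) over its own input split, and the reducer simply adds
the partial sums.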
In a similar manner, error function evaluations during line searches
(for the conjugate gradient and quasi-Newton algorithms) can be
parallelized efficiently.
Lastly, to reduce the chance of getting trapped in local minima of the
error function, we can take advantage of training-session parallelism
and start multiple training sessions in parallel with different
initial weights (simulated annealing).



II. Detailed Proposal

The most important step is to design the base neural network classes
in such a way that other NN architectures like Hopfield nets,
Boltzmann machines, SOMs, etc. can easily be implemented by deriving
from these base classes. To that end, I propose to implement a set of
core classes that correspond to basic neural network concepts such as
artificial neuron, neuron layer, neuron connection, weight, transfer
function, input function and learning rule. This architecture is
inspired by that of the open-source Neuroph neural network framework
(http://imgur.com/gDIOe.jpg). This design of the base architecture
allows for great flexibility in deriving new NNs and learning rules:
all that needs to be done is to derive from the NeuralNetwork class,
provide the method for network creation, create a new training method
by deriving from LearningRule, and then add that learning rule to the
network during creation. In addition, the API is very intuitive and
easy to understand (in comparison to other NN frameworks like Encog
and JOONE).
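To make the intended extension pattern concrete, a derived network
might look roughly like the sketch below. All class and method names
here (addLayer, connectLayersFully, setLearningRule, ...) are
tentative placeholders for the proposed base classes, not a finalized
API:

// Sketch only: NeuralNetwork, Layer and LearningRule are the proposed base
// classes; the helper methods are placeholders, not a finalized API.
public class MultiLayerPerceptron extends NeuralNetwork {

  public MultiLayerPerceptron(int... neuronsPerLayer) {
    // A derived network only has to say how it is built ...
    for (int size : neuronsPerLayer) {
      addLayer(new Layer(size, new WeightedSum(), new SigmoidTransfer()));
    }
    connectLayersFully();
    // ... and which learning rule trains it.
    setLearningRule(new BackpropagationRule());
  }
}

// A new training method is created by deriving from LearningRule.
class BackpropagationRule extends LearningRule {
  @Override
  public void learn(TrainingSet trainingSet) {
    // iterate: forward pass, backpropagate errors, batch-update the weights
  }
}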


** The approach to parallelization:

In the Driver class:
- The input parameters are read and the NeuralNetwork with the
specified LearningRule (training algorithm) is created. Initial weight
values are randomly generated and written to the FileSystem. If a
number of training sessions (for simulated annealing) is specified,
multiple sets of initial weight values are generated.
- Training is started by calling the NeuralNetwork's learn() method.
For each iteration, every time the error gradient vector needs to be
calculated, the method submits a Job where the input path to the
training-set vectors and various key properties (like path to the
stored weight values) are set. The gradient vectors calculated by the
Reducers are written back to an output path in the FileSystem.
After JobClient.runJob() returns, the gradient vectors are retrieved
from the FileSystem and tested to see whether the stopping criterion
is satisfied. The weights are then updated using the method
implemented by the particular LearningRule. For line searches, each
error function evaluation is again done by submitting a job.
- The NN is trained iteratively until it converges (a rough sketch of
this per-iteration job submission is given below).
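The sketch below shows what one gradient job submission in the driver
could look like, using the old 'mapred' API (consistent with the
JobClient.runJob() call above). The class names (BackpropDriver,
GradientMapper, GradientSumReducer), the configuration key and the
path arguments are placeholders of my own, and Mahout's VectorWritable
is assumed as the gradient value type:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.mahout.math.VectorWritable;

public class BackpropDriver {

  // Runs one MapReduce job that computes the batch error gradient for the
  // weights currently stored at weightsPath; called by learn() every time a
  // gradient (or a line-search error evaluation) is needed.
  public static void runGradientJob(String trainingPath, String weightsPath,
                                    String gradientOutputPath) throws Exception {
    JobConf conf = new JobConf(BackpropDriver.class);
    conf.setJobName("backprop-batch-gradient");

    // Tell the mappers where the current weight values live on the FileSystem.
    conf.set("mahout.nn.weights.path", weightsPath);

    conf.setMapperClass(GradientMapper.class);
    conf.setCombinerClass(GradientSumReducer.class);  // partial batch gradients
    conf.setReducerClass(GradientSumReducer.class);   // overall batch gradient
    conf.setNumReduceTasks(1);

    conf.setInputFormat(SequenceFileInputFormat.class);
    conf.setOutputFormat(SequenceFileOutputFormat.class);
    conf.setOutputKeyClass(IntWritable.class);
    conf.setOutputValueClass(VectorWritable.class);

    FileInputFormat.setInputPaths(conf, new Path(trainingPath));
    FileOutputFormat.setOutputPath(conf, new Path(gradientOutputPath));

    // Blocks until the job is done; the driver then reads the gradient back,
    // checks the stopping criterion and lets the LearningRule update the weights.
    JobClient.runJob(conf);
  }
}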

In the Mapper class:
- Each Mapper is initialized in its configure() method: the weights
are retrieved and the complete NeuralNetwork is created.
- The map function then takes in the training vectors as key/value
pairs (the key is ignored), runs them through the NN to calculate the
outputs and backpropagates the errors to compute the error gradients.
The error gradient vectors are then output as key/value pairs in which
all keys are set to a common value, such as the training session
number (for each training session, all keys in the outputs of all the
mappers have to be identical). A sketch of such a mapper is given
below.
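The following sketch assumes the training set is stored as a
SequenceFile of (LongWritable, VectorWritable) pairs and that the
proposed NeuralNetwork class exposes (placeholder) methods for loading
itself from the stored weights and for computing the error gradient of
a single example:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class GradientMapper extends MapReduceBase
    implements Mapper<LongWritable, VectorWritable, IntWritable, VectorWritable> {

  private NeuralNetwork network;                  // proposed base class
  private final IntWritable sessionKey = new IntWritable(0);

  @Override
  public void configure(JobConf conf) {
    // Rebuild the complete network from the weights stored on the FileSystem
    // (path set by the driver); loadFrom() is a placeholder for that step.
    network = NeuralNetwork.loadFrom(conf, conf.get("mahout.nn.weights.path"));
    // All outputs of one training session share the same key.
    sessionKey.set(conf.getInt("mahout.nn.training.session", 0));
  }

  @Override
  public void map(LongWritable key, VectorWritable value,
                  OutputCollector<IntWritable, VectorWritable> output,
                  Reporter reporter) throws IOException {
    // Forward pass plus backpropagation for one training vector; the result
    // holds dE_n/dw for every weight in the network (placeholder method).
    Vector gradient = network.errorGradient(value.get());
    output.collect(sessionKey, new VectorWritable(gradient));
  }
}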

In the Combiner class:
- Iterates through all the individual error gradient vectors output by
a mapper (they all have the same key) and adds them up to get a
partial batch gradient.

In the Reducer class:
- A single reducer combines all the partial gradients produced on the
map side to get the overall batch gradient.
- The final error gradient vector is written back to the FileSystem (a
combined sketch is given below).
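Since the combiner and the reducer do exactly the same thing, namely
add up all gradient vectors that share a key, a single class can serve
for both (same placeholder naming as in the driver sketch above;
Mahout's Vector and VectorWritable are assumed):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class GradientSumReducer extends MapReduceBase
    implements Reducer<IntWritable, VectorWritable, IntWritable, VectorWritable> {

  @Override
  public void reduce(IntWritable key, Iterator<VectorWritable> values,
                     OutputCollector<IntWritable, VectorWritable> output,
                     Reporter reporter) throws IOException {
    // Element-wise sum of all gradient vectors with this key. As a combiner
    // this yields a partial batch gradient per map task; as the single
    // reducer it yields the overall batch gradient.
    Vector sum = null;
    while (values.hasNext()) {
      Vector v = values.next().get();
      sum = (sum == null) ? v.clone() : sum.plus(v);
    }
    if (sum != null) {
      output.collect(key, new VectorWritable(sum));
    }
  }
}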


** I propose to complete all of the following sub-tasks during GSoC 2010:

Implementation of the Backpropagation algorithm:
- Initialization of weights: using the Nguyen-Widrow algorithm to
select the initial range of starting weight values (see the sketch
after this list).
- Input, transfer and error functions: implement basic input functions
like WeightedSum and transfer functions like Sigmoid, Gaussian, tanh
and linear. Implement the sum-of-squares error function.
- Optimization methods to update the weights: (a) batch gradient
descent with momentum and a variable learning rate [2]; (b) a
conjugate gradient method with Brent's line search.
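For the weight-initialization item, the sketch below shows one common
formulation of the Nguyen-Widrow rule for a layer with n inputs and h
hidden neurons: weights are drawn uniformly and each neuron's weight
vector is then rescaled to length beta = 0.7 * h^(1/n). The class is
purely illustrative, not a proposed API:

import java.util.Random;

public final class NguyenWidrowInit {

  // Returns weights[hidden][inputs] initialized with the Nguyen-Widrow rule:
  // draw uniformly from [-0.5, 0.5), then rescale each neuron's weight vector
  // to length beta = 0.7 * hidden^(1/inputs). Bias values (not shown) would be
  // drawn uniformly from [-beta, beta].
  public static double[][] initialize(int inputs, int hidden, Random rng) {
    double beta = 0.7 * Math.pow(hidden, 1.0 / inputs);
    double[][] w = new double[hidden][inputs];
    for (int j = 0; j < hidden; j++) {
      double norm = 0.0;
      for (int i = 0; i < inputs; i++) {
        w[j][i] = rng.nextDouble() - 0.5;
        norm += w[j][i] * w[j][i];
      }
      norm = Math.sqrt(norm);
      for (int i = 0; i < inputs; i++) {
        w[j][i] = beta * w[j][i] / norm;
      }
    }
    return w;
  }

  private NguyenWidrowInit() { }
}

For optimization item (a), the standard batch update with momentum is
\Delta w^{(t)} = -\eta \nabla E(w^{(t)}) + \alpha \Delta w^{(t-1)},
where \eta is the (variable) learning rate and \alpha the momentum
coefficient.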

Improving generalization:
- Validating the network to test for overfitting (early stopping)
- Regularization (weight decay; see the formula below)
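For reference, weight decay here means minimizing the standard
regularized error (cf. Bishop [4]):

    \tilde{E}(w) = E(w) + \frac{\lambda}{2} \sum_i w_i^2

Its gradient only adds the extra term \lambda w, which the driver can
add to the summed batch gradient locally, without changing the
map/reduce jobs.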

Create examples for:
- Classification: using the Abalone Data Set from the UCI Machine
Learning Repository
- Classification, Regression: Breast Cancer Wisconsin (Prognostic) Data Set

If time permits, also implement:
- Resilient Backpropagation (RPROP)


III. Week Plan with list of deliverables


(Till May 23rd, community bonding period)
Brainstorm with my mentor and the Apache Mahout community to come up
with the best design for an extensible Neural Network framework. Code
prototypes to identify potential problems and/or investigate new
solutions.
Deliverable: A detailed report or design document on how to implement
the basic Neural Network framework and the learning algorithms.

(May 24th, coding starts) Week 1:
Deliverables: Basic Neural network classes (Neuron, Connection,
Weight, Layer, LearningRule, NeuralNetwork etc) and the various input,
transfer and error functions mentioned previously.

(May 31st) Week 2 and Week 3:
Deliverable: Driver, Mapper, Combiner and Reducer classes with basic
functionality to run a feedforward Neural Network on Hadoop (no
training methods yet; weights are generated using the Nguyen-Widrow
algorithm).

(June 14th) Week 4:
Deliverable: Backpropagation algorithm using standard Batch Gradient descent.

(June 21st) Week 5:
Deliverables: Variable learning rate and momentum for Batch Gradient
descent. Support for validation tests. Run some larger tests.

(June 28th) Week 6:
Deliverable: Support for Early stopping and Regularization (weight
decay) during training.

(July 5th) Week 7 and Week 8:
Deliverable: Conjugate gradient method with Brent's line search algorithm.

(July 19th) Week 9:
Deliverable: Write unit tests. Run larger-scale tests for both batch
gradient descent and the conjugate gradient method.

(July 26th) Week 10 and Week 11:
Deliverable: 2 examples of classification and regression on real-world
datasets from the UCI Machine Learning Repository. More tests.

(August 9th, tentative 'pencils down' date) Week 12:
Deliverable: Wrap up the work, clean up the code, and improve the
documentation and tutorials (on the wiki).

(August 16: Final evaluation)


IV. Additional Information

I am a final-year Computer Science student at NIT Allahabad (India),
graduating in May. For my final-year project/thesis, I am working on
Open Domain Question Answering. I participated in GSoC last year with
the Apertium machine translation system
(http://google-opensource.blogspot.com/2009/11/apertium-projects-first-google-summer.html).
I am familiar with the three major open-source neural network
frameworks in Java (JOONE, Encog and Neuroph), since I have used them
in past projects on fingerprint recognition and face recognition
(during a summer course on image and speech processing). My research
interests are machine learning and statistical natural language
processing, and I will be enrolling in a Ph.D. next semester (i.e.,
next fall) at the same institute.

I have no specific time constraints during the GSoC period. I will
devote a minimum of 6 hours every day to GSoC.
Time offset: UTC+5:30 (IST)


V. References

[1] S. McLoone and G. W. Irwin, "Fast parallel off-line training of
multilayer perceptrons," IEEE Transactions on Neural Networks, 1997.
[2] W. Schiffmann, M. Joost and R. Werner, "Optimization of the
backpropagation algorithm for training multilayer perceptrons," 1994.
[3] C. T. Chu, S. K. Kim, Y. A. Lin, et al., "Map-Reduce for Machine
Learning on Multicore," in NIPS, 2006.
[4] C. M. Bishop, Neural Networks for Pattern Recognition, 1995 [book].
