GitHub user bgreeven commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1290#discussion_r21603916
  
    --- Diff: docs/mllib-ann.md ---
    @@ -0,0 +1,239 @@
    +---
    +layout: global
    +title: Artificial Neural Networks - MLlib
    +displayTitle: <a href="mllib-guide.html">MLlib</a> - Artificial Neural Networks
    +---
    +
    +# Introduction
    +
    +This document describes MLlib's Artificial Neural Network (ANN) implementation.
    +
    +The implementation currently consists of the following files:
    +
    +* `ArtificialNeuralNetwork.scala`: implements the ANN
    +* `ANNSuite`: implements automated tests for the ANN and its gradient
    +* `ANNDemo`: a demo that approximates three functions and shows a graphical representation
    +of the result
    +
    +# Summary of usage
    +
    +The "ArtificialNeuralNetwork" object is used as an interface to the neural 
network. It is
    +called as follows:
    +
    +```
    +val annModel = ArtificialNeuralNetwork.train(rdd, hiddenLayersTopology, maxNumIterations)
    +```
    +
    +where
    +
    +* `rdd` is an RDD of type (Vector,Vector), the first element containing the input vector and
    +the second the associated output vector.
    +* `hiddenLayersTopology` is an array of integers (Array[Int]) containing the number of nodes
    +per hidden layer, starting with the layer that takes inputs from the input layer and
    +finishing with the layer that outputs to the output layer. The bias nodes are not counted.
    +* `maxNumIterations` is an upper bound on the number of iterations to be performed.
    +* `annModel` contains the trained ANN parameters, and can be used to calculate the ANN's
    +approximation to arbitrary input values.
    +
    +The approximations can be calculated as follows:
    +
    +```
    +val v_out = annModel.predict(v_in)
    +```
    +
    +where `v_in` is either a Vector or an RDD of Vectors, and `v_out` is, respectively, a Vector
    +or an RDD of (Vector,Vector) pairs, corresponding to input and output values.
    +
    +Further details and other calling options will be elaborated upon below.
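    +
    +For illustration, here is a minimal end-to-end sketch, assuming a SparkContext `sc` and the
    +API described above; the XOR data and parameter values are arbitrary examples:
    +
    +```
    +import org.apache.spark.mllib.linalg.Vectors
    +
    +// Toy dataset for illustration: learn XOR. Inputs and targets are both Vectors.
    +val data = sc.parallelize(Seq(
    +  (Vectors.dense(0.0, 0.0), Vectors.dense(0.0)),
    +  (Vectors.dense(0.0, 1.0), Vectors.dense(1.0)),
    +  (Vectors.dense(1.0, 0.0), Vectors.dense(1.0)),
    +  (Vectors.dense(1.0, 1.0), Vectors.dense(0.0))))
    +
    +// One hidden layer with 5 nodes; at most 1000 iterations.
    +val annModel = ArtificialNeuralNetwork.train(data, Array[Int](5), 1000)
    +
    +// Approximate the output for a single input vector.
    +val v_out = annModel.predict(Vectors.dense(1.0, 0.0))
    +```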
    +
    +# Architecture and Notation
    +
    +The file ArtificialNeuralNetwork.scala implements the ANN. The following picture shows the
    +architecture of a 3-layer ANN:
    +
    +```
    + +-------+
    + |       |
    + | N_0,0 |
    + |       | 
    + +-------+        +-------+
    +                  |       |
    + +-------+        | N_0,1 |       +-------+
    + |       |        |       |       |       |
    + | N_1,0 |-       +-------+     ->| N_0,2 |
    + |       | \ Wij1              /  |       |
    + +-------+  --    +-------+  --   +-------+
    +               \  |       | / Wjk2
    +     :          ->| N_1,1 |-      +-------+
    +     :            |       |       |       |
    +     :            +-------+       | N_1,2 |
    +     :                            |       |
    +     :                :           +-------+
    +     :                :
    +     :                :                :
    +     :                : 
    +     :                :           +-------+
    +     :                :           |       |
    +     :                :           |N_K-1,2|
    +     :                            |       |
    +     :            +-------+       +-------+
    +     :            |       |
    +     :            |N_J-1,1|
    +                  |       |
    + +-------+        +-------+
    + |       | 
    + |N_I-1,0|  
    + |       |
    + +-------+
    +
    + +-------+        +--------+
    + |       |        |        |
    + |   -1  |        |   -1   |
    + |       |        |        |
    + +-------+        +--------+
    +
    +INPUT LAYER      HIDDEN LAYER    OUTPUT LAYER
    +```
    +
    +The i-th node in layer l is denoted by N_{i,l}, with both i and l starting from 0. The weight
    +between node i in layer l-1 and node j in layer l is denoted by W_{i,j,l}. Layer 0 is the
    +input layer, whereas layer L is the output layer.
    +
    +The ANN also implements bias units. These are nodes that always output the value -1. Bias
    +units are present in all layers except the output layer. They act like other nodes, but have
    +no input.
    +
    +The value of node N_{j,l} is calculated as follows:
    +
    +`$N_{j,l} = g\left( \sum_{i=0}^{topology_{l-1}} W_{i,j,l} N_{i,l-1} \right)$`
    +
    +where g is the sigmoid function
    +
    +`$g(t) = \frac{e^{\beta t}}{1+e^{\beta t}}$`
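    +
    +As an illustration, a minimal sketch of this computation in plain Scala; the array layout and
    +the value of beta are assumptions for the example, not the implementation's internal
    +representation:
    +
    +```
    +val beta = 1.0
    +def g(t: Double): Double = math.exp(beta * t) / (1.0 + math.exp(beta * t))
    +
    +// inputs: the values of layer l-1, with the bias value -1 appended;
    +// weights: W_{i,j,l} for fixed j and l, in the same order as inputs.
    +def nodeValue(inputs: Array[Double], weights: Array[Double]): Double =
    +  g(inputs.zip(weights).map { case (x, w) => x * w }.sum)
    +```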
    +
    +# LBFGS
    +
    +MLlib's ANN implementation uses the LBFGS optimisation algorithm for training. It minimises
    +the following error function:
    +
    +`$E = \sum_{k=0}^{K-1} (N_{k,L} - Y_k)^2$`
    +
    +where Y_k is the target output given inputs N_{0,0} ... N_{I-1,0}.
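    +
    +For a single sample, this error could be computed as in the following sketch (illustrative
    +only, not the implementation's code):
    +
    +```
    +import org.apache.spark.mllib.linalg.Vector
    +
    +// Sum of squared differences between network outputs N_{k,L} and targets Y_k.
    +def sampleError(output: Vector, target: Vector): Double =
    +  (0 until output.size).map(k => math.pow(output(k) - target(k), 2)).sum
    +```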
    +
    +# Implementation Details
    +
    +## The "ArtificialNeuralNetwork" class
    +
    +The "ArtificialNeuralNetwork" class has the following constructor:
    +
    +```
    +class ArtificialNeuralNetwork private(topology: Array[Int], maxNumIterations: Int,
    +    convergenceTol: Double)
    +```
    +
    +* `topology` is an array of integers indicating the number of nodes per layer. For example,
    +if `topology` holds (3, 5, 1), there are three input nodes, five nodes in a single hidden
    +layer and one output node.
    +* `maxNumIterations` is the maximum number of iterations the LBFGS algorithm will perform
    +before stopping.
    +* `convergenceTol` indicates the acceptable error; once it is reached, the LBFGS algorithm
    +stops. A lower value of `convergenceTol` gives a higher precision.
    +
    +## The "ArtificialNeuralNetwork" object
    +
    +The object "ArtificialNeuralNetwork" is the interface to the "ArtificialNeuralNetwork" class.
    +The object contains the training function, which comes in six overloaded variants, each
    +taking different parameters. All take as their first parameter the RDD "input", which
    +contains pairs of input and output vectors.
    +
    +In addition, there are three functions for generating random weights. Two take a fixed seed,
    +which is useful for testing when one wants to start with the same weights in every test. A
    +short usage sketch follows the list below.
    +
    +* `def train(trainingRDD: RDD[(Vector, Vector)], hiddenLayersTopology: Array[Int],
    +maxNumIterations: Int): ArtificialNeuralNetworkModel`: starts training with random initial
    +weights and a default convergenceTol=1e-4.
    +* `def train(trainingRDD: RDD[(Vector, Vector)], model: ArtificialNeuralNetworkModel,
    +maxNumIterations: Int): ArtificialNeuralNetworkModel`: resumes training given an earlier
    +calculated model, with a default convergenceTol=1e-4.
    +* `def train(trainingRDD: RDD[(Vector, Vector)], hiddenLayersTopology: Array[Int],
    +initialWeights: Vector, maxNumIterations: Int): ArtificialNeuralNetworkModel`: trains an ANN
    +with given initial weights and a default convergenceTol=1e-4.
    +* `def train(trainingRDD: RDD[(Vector, Vector)], hiddenLayersTopology: Array[Int],
    +maxNumIterations: Int, convergenceTol: Double): ArtificialNeuralNetworkModel`: starts
    +training with random initial weights, allowing a customised "convergenceTol".
    +* `def train(trainingRDD: RDD[(Vector, Vector)], model: ArtificialNeuralNetworkModel,
    +maxNumIterations: Int, convergenceTol: Double): ArtificialNeuralNetworkModel`: resumes
    +training given an earlier calculated model, allowing a customised "convergenceTol".
    +* `def train(trainingRDD: RDD[(Vector, Vector)], hiddenLayersTopology: Array[Int],
    +initialWeights: Vector, maxNumIterations: Int, convergenceTol: Double):
    +ArtificialNeuralNetworkModel`: trains an ANN with given initial weights, allowing a
    +customised "convergenceTol".
    +* `def randomWeights(trainingRDD: RDD[(Vector, Vector)], hiddenLayersTopology: Array[Int]):
    +Vector`: generates a random weights vector.
    +* `def randomWeights(trainingRDD: RDD[(Vector, Vector)], hiddenLayersTopology: Array[Int],
    +seed: Int): Vector`: generates a random weights vector with a given seed.
    +* `def randomWeights(inputLayerSize: Int, outputLayerSize: Int, hiddenLayersTopology:
    +Array[Int], seed: Int): Vector`: generates a random weights vector using the given random
    +seed, input layer size, hidden layers topology and output layer size.
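    +
    +As a usage sketch of these variants (parameter values and variable names below are arbitrary
    +examples):
    +
    +```
    +// Generate reproducible initial weights, train with a custom convergence
    +// tolerance, then resume training from the resulting model.
    +val initialWeights =
    +  ArtificialNeuralNetwork.randomWeights(trainingRDD, Array[Int](5, 7), 1234)
    +val model =
    +  ArtificialNeuralNetwork.train(trainingRDD, Array[Int](5, 7), initialWeights, 500, 1e-5)
    +val refined = ArtificialNeuralNetwork.train(trainingRDD, model, 500, 1e-5)
    +```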
    +
    +Notice that "hiddenLayersTopology" differs from the "topology" array: "hiddenLayersTopology"
    +does not include the number of nodes in the input and output layers. The number of nodes in
    +the input and output layers is calculated from the first element of the training RDD. For
    +example, the "topology" array (3, 5, 7, 1) would have a "hiddenLayersTopology" of (5, 7);
    +the values 3 and 1 are deduced from the training data. The rationale for having these
    +different arrays is that future methods may have a different mapping between input values
    +and input nodes, or between output values and output nodes.
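    +
    +The relationship can be sketched as follows (the variable names are illustrative):
    +
    +```
    +val hiddenLayersTopology = Array[Int](5, 7)
    +val (firstInput, firstOutput) = trainingRDD.first()
    +// Gives topology = (3, 5, 7, 1) when the training vectors have sizes 3 and 1.
    +val topology = (firstInput.size +: hiddenLayersTopology) :+ firstOutput.size
    +```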
    +
    +## The "ArtificialNeuralNetworkModel" class
    +
    +All training functions return the trained ANN using the class "ArtificialNeuralNetworkModel".
    +This class has the following functions:
    +
    +* `predict(testData: Vector): Vector`: calculates the output vector given the input vector
    +"testData".
    +* `predict(testData: RDD[Vector]): RDD[(Vector,Vector)]`: returns (input, output) vector
    +pairs, computed from the input vectors in "testData".
    +
    +The weights used by "predict" come from the model.
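    +
    +For example, the RDD variant could be used as in this sketch (variable names are
    +illustrative):
    +
    +```
    +// Batch prediction over an RDD of input vectors.
    +val inputs = trainingRDD.map(_._1)
    +val predictions = annModel.predict(inputs)  // RDD of (input, predicted output) pairs
    +```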
    +
    +## Training
    +
    +We have chosen to implement the ANN with LBFGS as optimiser function. We compared it with
    +Statistical Gradient Descent. LBFGS was much faster, but in accordance is also earlier with
    --- End diff --
    
    Stochastic of course. Changed it. Thanks!

