[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-12-19 Thread Yexi Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yexi Jiang updated MAHOUT-1265:
---

Attachment: Mahout-1265-17.patch

Version 17 of the patch.

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
  Labels: machine_learning, neural_network
 Attachments: MAHOUT-1265.patch, Mahout-1265-13.patch, 
 Mahout-1265-17.patch



[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-12-19 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1265:
--

Attachment: (was: Mahout-1265-13.patch)

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
  Labels: machine_learning, neural_network
 Attachments: MAHOUT-1265.patch, Mahout-1265-17.patch



[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-12-19 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1265:
--

   Resolution: Fixed
Fix Version/s: 0.9
 Assignee: Suneel Marthi
   Status: Resolved  (was: Patch Available)

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
Assignee: Suneel Marthi
  Labels: machine_learning, neural_network
 Fix For: 0.9

 Attachments: MAHOUT-1265.patch, Mahout-1265-17.patch



[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-12-17 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1265:
--

Attachment: (was: Mahout-1265-6.patch)

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
  Labels: machine_learning, neural_network
 Attachments: MAHOUT-1265.patch, Mahout-1265-13.patch



[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-12-17 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1265:
--

Attachment: (was: Mahout-1265-11.patch)

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
  Labels: machine_learning, neural_network
 Attachments: MAHOUT-1265.patch, Mahout-1265-13.patch



[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-12-17 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1265:
--

Attachment: (was: MAHOUT-1265.patch)

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
  Labels: machine_learning, neural_network
 Attachments: MAHOUT-1265.patch, Mahout-1265-13.patch



[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-12-17 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1265:
--

Attachment: MAHOUT-1265.patch

Updated patch, fixed styling issues.

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
  Labels: machine_learning, neural_network
 Attachments: MAHOUT-1265.patch, Mahout-1265-13.patch



[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-12-09 Thread Yexi Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yexi Jiang updated MAHOUT-1265:
---

Attachment: Mahout-1265-11.patch

This is the final version of the patch. It has been reviewed by [~smarthi].

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
  Labels: machine_learning, neural_network
 Attachments: Mahout-1265-11.patch, Mahout-1265-6.patch, 
 mahout-1265.patch



[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-12-07 Thread Yexi Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yexi Jiang updated MAHOUT-1265:
---

Attachment: Mahout-1265-6.patch

This is the final version of the patch. It has been reviewed by [~smarthi].

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
  Labels: machine_learning, neural_network
 Attachments: Mahout-1265-6.patch, mahout-1265.patch



[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-08-03 Thread Yexi Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yexi Jiang updated MAHOUT-1265:
---

Attachment: mahout-1265.patch

[~tdunning] I have finished a workable single-machine version of 
MultilayerPerceptron (based on NeuralNetwork). It supports the requirements you 
mentioned above. It allows users to customize each layer, including its size 
and its squashing function. It also allows users to specify different loss 
functions for the model. Moreover, it allows users to store the trained model 
and reload it for later use. Finally, it allows users to extract the weights of 
each layer from a trained model, so a deep neural network can be trained and 
stacked layer by layer. If this single-machine version passes review, I will 
begin work on the map-reduce version based on it.
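
To make the described capabilities concrete, here is a hypothetical usage
sketch of such a layer-by-layer API. The method names (addLayer,
setCostFunction, writeModelToFile, getWeightsByLayer) and the Matrix return
type are illustrative assumptions only, not the actual API of the attached patch.
-
//  hypothetical sketch -- method names are assumptions for illustration only
NeuralNetwork ann = new NeuralNetwork();
ann.addLayer(4, "Sigmoid");           //  input layer: 4 neurons, sigmoid squashing
ann.addLayer(8, "Sigmoid");           //  hidden layer, independently configurable
ann.addLayer(3, "Sigmoid");           //  output layer
ann.setCostFunction("SquaredError");  //  pluggable loss function

//  train on a data set, persist the model, and reload it later
ann.train(trainingDataLocation);
ann.writeModelToFile(modelLocation);
NeuralNetwork reloaded = new NeuralNetwork(modelLocation);

//  extract the weights of a given layer, e.g. to initialize the next layer
//  of a network that is trained and stacked layer by layer
Matrix firstLayerWeights = reloaded.getWeightsByLayer(0);
-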

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
  Labels: machine_learning, neural_network
 Attachments: mahout-1265.patch



[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-08-03 Thread Yexi Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yexi Jiang updated MAHOUT-1265:
---

Status: Patch Available  (was: Open)

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
  Labels: machine_learning, neural_network
 Attachments: mahout-1265.patch



[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-06-18 Thread Yexi Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yexi Jiang updated MAHOUT-1265:
---

Description: 
Design of multilayer perceptron


1. Motivation
A multilayer perceptron (MLP) is a kind of feed-forward artificial neural 
network, a mathematical model inspired by biological neural networks. The 
multilayer perceptron can be used for various machine learning tasks such as 
classification and regression, so it would be useful to include it in Mahout.

2. API

The design goal of the API is to make the MLP easy to use and to keep the 
implementation details transparent to the user.

The following example code shows how a user would use the MLP.
-
//  set the parameters
double learningRate = 0.5;
double momentum = 0.1;
int[] layerSizeArray = new int[] {2, 5, 1};
String costFuncName = "SquaredError";
String squashingFuncName = "Sigmoid";

//  the location to store the model; if there is already an existing model
//  at the specified location, the MLP will throw an exception
URI modelLocation = ...
MultilayerPerceptron mlp = new MultilayerPerceptron(layerSizeArray, modelLocation);
mlp.setLearningRate(learningRate).setMomentum(momentum).setRegularization(...)
   .setCostFunction(...).setSquashingFunction(...);

//  the user can also load an existing model from a given URI and update the
//  model with new training data; if there is no existing model at the
//  specified location, an exception will be thrown
/*
MultilayerPerceptron mlp = new MultilayerPerceptron(learningRate,
    regularization, momentum, squashingFuncName, costFuncName, modelLocation);
*/

URI trainingDataLocation = ...
//  the details of training are transparent to the user; training may run on
//  a single machine or in a distributed environment
mlp.train(trainingDataLocation);

//  the user can also train the model one training instance at a time, in a
//  stochastic gradient descent fashion
Vector trainingInstance = ...
mlp.train(trainingInstance);

//  prepare the input feature
Vector inputFeature = ...
//  the semantic meaning of the output is defined by the user
//  in the general case, the dimension of the output vector is 1 for
//  regression and two-class classification
//  the dimension of the output vector is n for n-class classification (n > 2)
Vector outputVector = mlp.output(inputFeature);
-
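
For reference, the "Sigmoid" squashing function and "SquaredError" cost function 
named in the example correspond to the formulas below. This is a small, 
self-contained illustration of those formulas, not the classes shipped in the patch.
-
//  illustrative stand-alone helpers, not the patch's implementation classes
public final class MlpFunctions {

  private MlpFunctions() {}

  //  squashing (activation) function: sigmoid(x) = 1 / (1 + e^{-x})
  public static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
  }

  //  derivative of the sigmoid expressed through its output o: o * (1 - o)
  public static double sigmoidDerivative(double output) {
    return output * (1.0 - output);
  }

  //  squared-error cost for one output dimension: 1/2 * (t - o)^2
  public static double squaredError(double target, double output) {
    double diff = target - output;
    return 0.5 * diff * diff;
  }
}
-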


3. Methodology

The output calculation can be implemented with a straightforward feed-forward 
pass, and single-machine training is likewise straightforward. The following 
describes how to train the MLP in a distributed way with batch gradient 
descent. The workflow is illustrated in the figure below.


https://docs.google.com/drawings/d/1s8hiYKpdrP3epe1BzkrddIfShkxPrqSuQBH0NAawEM4/pub?w=960&h=720
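
As a concrete illustration of the feed-forward output calculation mentioned 
above, here is a minimal sketch. It assumes sigmoid squashing in every layer 
and one weight matrix per layer transition, with the bias folded in as an extra 
weight per neuron; it is illustrative only and not the code from the attached patch.
-
//  minimal feed-forward sketch (assumptions: sigmoid squashing everywhere,
//  bias handled as the first weight of each neuron, fed a constant 1.0)
public class FeedForwardSketch {

  //  weights[l][j] holds the weights of neuron j in layer l+1:
  //  index 0 is the bias weight, index i+1 is the weight from neuron i of layer l
  private final double[][][] weights;

  public FeedForwardSketch(double[][][] weights) {
    this.weights = weights;
  }

  public double[] output(double[] inputFeature) {
    double[] activation = inputFeature;
    for (double[][] layerWeights : weights) {
      double[] next = new double[layerWeights.length];
      for (int j = 0; j < layerWeights.length; j++) {
        double sum = layerWeights[j][0];               //  bias weight times constant 1.0
        for (int i = 0; i < activation.length; i++) {
          sum += layerWeights[j][i + 1] * activation[i];
        }
        next[j] = 1.0 / (1.0 + Math.exp(-sum));        //  sigmoid squashing
      }
      activation = next;
    }
    return activation;
  }
}
-
With layerSizeArray = {2, 5, 1} as in the example, weights would hold two 
matrices of shapes 5x3 and 1x6 (each row carrying one bias weight plus the 
incoming weights).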

For distributed training, each training iteration is divided into two steps: 
the weight update calculation step and the weight update step. The distributed 
MLP can only be trained in a batch-update fashion.


3.1 The partial weight update calculation step:
This step trains the MLP in a distributed manner. Each task gets a copy of the 
MLP model and calculates the weight updates from its own partition of the data.

Suppose the training error is E(w) = \frac{1}{2} \sum_{d \in D} cost(t_d, y_d), 
where D denotes the training set, d denotes a training instance, t_d denotes 
the class label, and y_d denotes the output of the MLP. Suppose further that
the sigmoid function is used as the squashing function,
squared error is used as the cost function,
t_i denotes the target value for the ith dimension of the output layer,
o_i denotes the actual output for the ith dimension of the output layer,
l denotes the learning rate, and
w_{ij} denotes the weight between the jth neuron in the previous layer and the 
ith neuron in the next layer.

The weight of each edge is updated as

\Delta w_{ij} = l \cdot \frac{1}{m} \cdot \delta_j \cdot o_i,

where, for the output layer,

\delta_j = - \sum_{m} o_j^{(m)} (1 - o_j^{(m)}) (t_j^{(m)} - o_j^{(m)}),

and, for a hidden layer,

\delta_j = - \sum_{m} o_j^{(m)} (1 - o_j^{(m)}) \sum_k \delta_k w_{jk}.

It is easy to see that the output-layer \delta_j can be rewritten as

\delta_j = - \sum_{i = 1}^{k} \sum_{m_i} o_j^{(m_i)} (1 - o_j^{(m_i)}) (t_j^{(m_i)} - o_j^{(m_i)})

The above equation indicates that \delta_j can be divided into k parts, one per 
data partition. For the implementation, each mapper can therefore calculate its 
part of \delta_j from its given partition of the data and then store the result 
at a specified location, as sketched below.
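
A minimal sketch of the partial computation each mapper would perform for the 
output layer, assuming sigmoid squashing and squared error as above (the class 
and method names are illustrative, not the patch's mapper). Here outputs[m] is 
the feed-forward output o^{(m)} of the current model for the m-th instance of 
this mapper's partition, and targets[m] is the corresponding target vector t^{(m)}.
-
//  illustrative only: the partial sum of \delta_j over one data partition
//  (output layer, sigmoid squashing, squared-error cost)
public final class PartialDeltaSketch {

  private PartialDeltaSketch() {}

  //  partialDelta[j] = - sum_m o_j^{(m)} * (1 - o_j^{(m)}) * (t_j^{(m)} - o_j^{(m)}),
  //  where m ranges over the instances of this mapper's partition
  public static double[] partialOutputDeltas(double[][] outputs, double[][] targets) {
    int outputDim = targets[0].length;
    double[] partialDelta = new double[outputDim];
    for (int m = 0; m < outputs.length; m++) {
      for (int j = 0; j < outputDim; j++) {
        partialDelta[j] -= outputs[m][j] * (1.0 - outputs[m][j])
            * (targets[m][j] - outputs[m][j]);
      }
    }
    return partialDelta;
  }
}
-
Each mapper would write its partialDelta vector to the specified location; a 
hidden layer would use the back-propagated factor \sum_k \delta_k w_{jk} in 
place of (t_j - o_j).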


3.2 The model update step:

After the k parts of \delta_j have been calculated, a separate program merges 
them into the full \delta_j and uses it to update the weight matrices.

This program loads the results produced in the weight update calculation step 
and applies them to the weight matrices, as sketched below.
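
A minimal sketch of that merge-and-update step for one layer, under the same 
assumptions as above (the names are illustrative; totalInstances is m, the 
total number of training instances across all partitions). Note that with the 
sign convention used for \delta_j above, descending the squared error means 
subtracting l * (1/m) * \delta_j * o_i from each weight.
-
//  illustrative only: merge the k partial \delta_j sums and update one
//  layer's weight matrix by batch gradient descent
public final class WeightUpdateSketch {

  private WeightUpdateSketch() {}

  //  weights[j][i]: weight from neuron i of the previous layer to neuron j;
  //  partialDeltas: k rows, one partial \delta sum per mapper/partition;
  //  previousLayerOutput: the o_i values feeding this layer (same length as weights[j])
  public static void mergeAndUpdate(double[][] weights,
                                    double[][] partialDeltas,
                                    double[] previousLayerOutput,
                                    double learningRate,
                                    int totalInstances) {
    int outputDim = weights.length;
    double[] delta = new double[outputDim];
    //  merge the k parts into the full \delta_j
    for (double[] part : partialDeltas) {
      for (int j = 0; j < outputDim; j++) {
        delta[j] += part[j];
      }
    }
    //  gradient descent step: w_{ij} <- w_{ij} - l * (1/m) * \delta_j * o_i
    for (int j = 0; j < outputDim; j++) {
      for (int i = 0; i < weights[j].length; i++) {
        weights[j][i] -= learningRate / totalInstances * delta[j] * previousLayerOutput[i];
      }
    }
  }
}
-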

