[jira] [Commented] (MAHOUT-1300) Support for easy functional matrix views and some of their derivatives

2013-08-07 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733190#comment-13733190
 ] 

Dmitriy Lyubimov commented on MAHOUT-1300:
--

I tried to create a review request for this, but it doesn't seem to work 
(breaks down) even for mahout-git, with an error message I don't understand. So 
anyone interested, please review the patches directly.

In particular, I am dubious about the names for the random matrix views in the 
Matrices class. I am not sure they are good enough, but something like 
"uniformSymmetricRandomMatrixView()" is probably too long.

transposedView is the main feature of this (per discussion with Ted). It 
eliminates overhead for things like A.t %*% A (also, changes to A would 
propagate to A.t).
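
For illustration, here is a minimal sketch of how such views might be used, 
assuming the API in the attached patches (Matrices.functionalMatrixView, 
Matrices.transposedView, org.apache.mahout.math.function.IntIntFunction); the 
final names may differ:

-
// Minimal sketch only; method names follow the attached patches and may change.
import org.apache.mahout.math.DenseMatrix;
import org.apache.mahout.math.Matrices;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.function.IntIntFunction;

public class MatrixViewSketch {
  public static void main(String[] args) {
    // Functional view: cells are computed on demand from (row, col),
    // nothing is materialized.
    Matrix hilbert = Matrices.functionalMatrixView(5, 5, new IntIntFunction() {
      @Override
      public double apply(int row, int col) {
        return 1.0 / (row + col + 1);
      }
    });
    System.out.println(hilbert.get(2, 3));

    // Transposed view: no copy of A is made, and writes to A are visible
    // through the view as well.
    Matrix a = new DenseMatrix(100, 10).assign(1.0);
    Matrix at = Matrices.transposedView(a);

    // A.t %*% A without ever materializing the transpose.
    Matrix gram = at.times(a);
    System.out.println(gram.rowSize() + " x " + gram.columnSize());
  }
}
-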

> Support for easy functional matrix views and some of their derivatives
> --
>
> Key: MAHOUT-1300
> URL: https://issues.apache.org/jira/browse/MAHOUT-1300
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Dmitriy Lyubimov
>Assignee: Dmitriy Lyubimov
> Fix For: 0.9
>
> Attachments: MAHOUT-1300.patch, MAHOUT-1300.patch.1, 
> MAHOUT-1300.patch.2
>
>
> Support for easy matrix views based on (Int,Int)=>Double function. 
> Current derived views: 
> (1) general functional view
> (2) transposed matrix view
> (3) uniform matrix view (based on function composition over symmetric uniform)
> (4) symmetric uniform matrix view (based on murmur64)
> (5) random gaussian matrix view.
> I know there is also a trinary random matrix which could probably be expressed 
> as a view (methinks), and the Omega matrix in distributed SSVD could perhaps 
> be replaced by a symmetric uniform view as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAHOUT-1306) SSVD-PCA results mangled if -ow (overwrite) is requested

2013-08-07 Thread Dmitriy Lyubimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Lyubimov updated MAHOUT-1306:
-

Attachment: MAHOUT-1306.patch

> SSVD-PCA results mangled if -ow (overwrite) is requested
> 
>
> Key: MAHOUT-1306
> URL: https://issues.apache.org/jira/browse/MAHOUT-1306
> Project: Mahout
>  Issue Type: Bug
>Reporter: Dmitriy Lyubimov
>Assignee: Dmitriy Lyubimov
> Attachments: MAHOUT-1306.patch
>
>
> It seems some PCA-related vectors are wiped by an incorrect application of 
> directory cleanup when -ow is given. 
> I will also take this opportunity to do some architectural and test cleanup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAHOUT-1306) SSVD-PCA results mangled if -ow (overwrite) is requested

2013-08-07 Thread Dmitriy Lyubimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Lyubimov updated MAHOUT-1306:
-

Affects Version/s: 0.8
Fix Version/s: 0.9

> SSVD-PCA results mangled if -ow (overwrite) is requested
> 
>
> Key: MAHOUT-1306
> URL: https://issues.apache.org/jira/browse/MAHOUT-1306
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Dmitriy Lyubimov
>Assignee: Dmitriy Lyubimov
> Fix For: 0.9
>
> Attachments: MAHOUT-1306.patch
>
>
> It seems some PCA-related vectors are wiped by an incorrect application of 
> directory cleanup when -ow is given. 
> I will also take this opportunity to do some architectural and test cleanup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAHOUT-1306) SSVD-PCA results mangled if -ow (overwrite) is requested

2013-08-07 Thread Dmitriy Lyubimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Lyubimov updated MAHOUT-1306:
-

Status: Patch Available  (was: Open)

> SSVD-PCA results mangled if -ow (overwrite) is requested
> 
>
> Key: MAHOUT-1306
> URL: https://issues.apache.org/jira/browse/MAHOUT-1306
> Project: Mahout
>  Issue Type: Bug
>Reporter: Dmitriy Lyubimov
>Assignee: Dmitriy Lyubimov
> Attachments: MAHOUT-1306.patch
>
>
> It seems some PCA-related vectors are wiped by an incorrect application of 
> directory cleanup when -ow is given. 
> I will also take this opportunity to do some architectural and test cleanup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAHOUT-1306) SSVD-PCA results mangled if -ow (overwrite) is requested

2013-08-07 Thread Dmitriy Lyubimov (JIRA)
Dmitriy Lyubimov created MAHOUT-1306:


 Summary: SSVD-PCA results mangled if -ow (overwrite) is requested
 Key: MAHOUT-1306
 URL: https://issues.apache.org/jira/browse/MAHOUT-1306
 Project: Mahout
  Issue Type: Bug
Reporter: Dmitriy Lyubimov
Assignee: Dmitriy Lyubimov


It seems some PCA-related vectors are wiped by an incorrect application of 
directory cleanup when -ow is given. 

I will also take this opportunity to do some architectural and test cleanup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAHOUT-1300) Support for easy functional matrix views and some of their derivatives

2013-08-07 Thread Dmitriy Lyubimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Lyubimov updated MAHOUT-1300:
-

Status: Patch Available  (was: Open)

> Support for easy functional matrix views and some of their derivatives
> --
>
> Key: MAHOUT-1300
> URL: https://issues.apache.org/jira/browse/MAHOUT-1300
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Dmitriy Lyubimov
>Assignee: Dmitriy Lyubimov
> Fix For: 0.9
>
> Attachments: MAHOUT-1300.patch, MAHOUT-1300.patch.1, 
> MAHOUT-1300.patch.2
>
>
> Support for easy matrix views based on (Int,Int)=>Double function. 
> Current derived views: 
> (1) general functional view
> (2) transposed matrix view
> (3) uniform matrix view (based on function composition over symmetric uniform)
> (4) symmetric uniform matrix view (based on murmur64)
> (5) random gaussian matrix view.
> I know there is also a trinary random matrix which could probably be expressed 
> as a view (methinks), and the Omega matrix in distributed SSVD could perhaps 
> be replaced by a symmetric uniform view as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAHOUT-1265) Add Multilayer Perceptron

2013-08-07 Thread Yexi Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733110#comment-13733110
 ] 

Yexi Jiang commented on MAHOUT-1265:


[~smarthi] Done, please refer to https://reviews.apache.org/r/13406/. Thank you.

> Add Multilayer Perceptron 
> --
>
> Key: MAHOUT-1265
> URL: https://issues.apache.org/jira/browse/MAHOUT-1265
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Yexi Jiang
>  Labels: machine_learning, neural_network
> Attachments: mahout-1265.patch
>
>
> Design of multilayer perceptron
> 1. Motivation
> A multilayer perceptron (MLP) is a kind of feed-forward artificial neural 
> network, a mathematical model inspired by biological neural networks. The 
> multilayer perceptron can be used for various machine learning tasks such as 
> classification and regression. It would be helpful to have it included in 
> Mahout.
> 2. API
> The design goal of the API is to make the MLP easy to use and to keep the 
> implementation details transparent to the user.
> The following is example code showing how a user would use the MLP.
> -
> //  set the parameters
> double learningRate = 0.5;
> double momentum = 0.1;
> int[] layerSizeArray = new int[] {2, 5, 1};
> String costFuncName = "SquaredError";
> String squashingFuncName = "Sigmoid";
> //  the location to store the model; if there is already an existing model at 
> //  the specified location, the MLP will throw an exception
> URI modelLocation = ...
> MultilayerPerceptron mlp = new MultilayerPerceptron(layerSizeArray, 
> modelLocation);
> mlp.setLearningRate(learningRate).setMomentum(momentum).setRegularization(...).setCostFunction(...).setSquashingFunction(...);
> //  the user can also load an existing model from a given URI and update the 
> //  model with new training data; if there is no existing model at the 
> //  specified location, an exception will be thrown
> /*
> MultilayerPerceptron mlp = new MultilayerPerceptron(learningRate, 
> regularization, momentum, squashingFuncName, costFuncName, modelLocation);
> */
> URI trainingDataLocation = …
> //  the details of training are transparent to the user; it may run on a 
> //  single machine or in a distributed environment
> mlp.train(trainingDataLocation);
> //  the user can also train the model with one training instance at a time, 
> //  in stochastic gradient descent fashion
> Vector trainingInstance = ...
> mlp.train(trainingInstance);
> //  prepare the input feature
> Vector inputFeature = …
> //  the semantic meaning of the output result is defined by the user;
> //  in the general case, the dimension of the output vector is 1 for 
> //  regression and two-class classification, and n for n-class 
> //  classification (n > 2)
> Vector outputVector = mlp.output(inputFeature);
> -
> 3. Methodology
> The output calculation can easily be implemented with a feed-forward approach, 
> and single-machine training is straightforward. The following describes how to 
> train the MLP in a distributed way with batch gradient descent. The workflow 
> is illustrated in the figure below.
> https://docs.google.com/drawings/d/1s8hiYKpdrP3epe1BzkrddIfShkxPrqSuQBH0NAawEM4/pub?w=960&h=720
> For distributed training, each training iteration is divided into two steps: 
> the weight update calculation step and the weight update step. The distributed 
> MLP can only be trained in a batch-update approach.
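
To make the two-step structure concrete before the details below, here is a 
purely editorial Java sketch (not code from the attached patch; names such as 
computePartialUpdate are hypothetical):

-
import java.util.List;

class DistributedBatchTrainingSketch {
  // One iteration: each task computes a partial weight update from its data
  // partition (step 1); the partials are then combined and applied (step 2).
  static double[][] trainOneIteration(double[][] weights,
                                      List<double[][]> partitions,
                                      double learningRate) {
    double[][] accumulated = new double[weights.length][weights[0].length];

    // Step 1: weight update calculation, one task per data partition.
    for (double[][] partition : partitions) {
      double[][] partial = computePartialUpdate(weights, partition);
      for (int i = 0; i < accumulated.length; i++) {
        for (int j = 0; j < accumulated[i].length; j++) {
          accumulated[i][j] += partial[i][j];
        }
      }
    }

    // Step 2: weight update. Here the partials are simply averaged over the
    // partitions; the real code would scale by the number of instances m.
    for (int i = 0; i < weights.length; i++) {
      for (int j = 0; j < weights[i].length; j++) {
        weights[i][j] -= learningRate * accumulated[i][j] / partitions.size();
      }
    }
    return weights;
  }

  // Placeholder: back-propagation over one partition would go here.
  static double[][] computePartialUpdate(double[][] weights,
                                         double[][] partition) {
    return new double[weights.length][weights[0].length];
  }
}
-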
> 3.1 The partial weight update calculation step:
> This step computes partial weight updates in a distributed way. Each task gets 
> a copy of the MLP model and calculates the weight update from a partition of 
> the data.
> Suppose the training error is E(w) = \frac{1}{2} \sum_{d \in D} cost(t_d, y_d), 
> where D denotes the training set, d denotes a training instance, t_d denotes 
> the class label and y_d denotes the output of the MLP. Also suppose that the 
> sigmoid function is used as the squashing function, 
> squared error is used as the cost function, 
> t_i denotes the target value for the ith dimension of the output layer, 
> o_i denotes the actual output for the ith dimension of the output layer, 
> l denotes the learning rate, and 
> w_{ij} denotes the weight between the jth neuron in the previous layer and the 
> ith neuron in the next layer. 
> The weight of each edge is updated as 
> \Delta w_{ij} = \frac{l}{m} \delta_j o_i, 
> where \delta_j = -\sum_{m} o_j^{(m)} (1 - o_j^{(m)}) (t_j^{(m)} - o_j^{(m)}) 
> for the output layer, and 
> \delta_j = -\sum_{m} o_j^{(m)} (1 - o_j^{(m)}) \sum_k \delta_k w_{jk} 
> for a hidden layer. 
> It is easy to see that \delta_j can be rewritten as 
> \delta_j = -\sum_{i=1}^{k} \sum_{m_i} o_j^{(m_i)} (1 - o_j^{(m_i)}) 
> (t_j^{(m_i)} - o_j^{(m_i)})
> The above equation indicates that the \delta_j can be divided into k 

[jira] [Commented] (MAHOUT-1265) Add Multilayer Perceptron

2013-08-07 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733101#comment-13733101
 ] 

Suneel Marthi commented on MAHOUT-1265:
---

[~yxjiang] Could you upload this to ReviewBoard? It's easier to review and 
comment on the code that way.

https://reviews.apache.org



> Add Multilayer Perceptron 
> --
>
> Key: MAHOUT-1265
> URL: https://issues.apache.org/jira/browse/MAHOUT-1265
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Yexi Jiang
>  Labels: machine_learning, neural_network
> Attachments: mahout-1265.patch
>
>
> Design of multilayer perceptron
> 1. Motivation
> A multilayer perceptron (MLP) is a kind of feed-forward artificial neural 
> network, a mathematical model inspired by biological neural networks. The 
> multilayer perceptron can be used for various machine learning tasks such as 
> classification and regression. It would be helpful to have it included in 
> Mahout.
> 2. API
> The design goal of the API is to make the MLP easy to use and to keep the 
> implementation details transparent to the user.
> The following is example code showing how a user would use the MLP.
> -
> //  set the parameters
> double learningRate = 0.5;
> double momentum = 0.1;
> int[] layerSizeArray = new int[] {2, 5, 1};
> String costFuncName = "SquaredError";
> String squashingFuncName = "Sigmoid";
> //  the location to store the model; if there is already an existing model at 
> //  the specified location, the MLP will throw an exception
> URI modelLocation = ...
> MultilayerPerceptron mlp = new MultilayerPerceptron(layerSizeArray, 
> modelLocation);
> mlp.setLearningRate(learningRate).setMomentum(momentum).setRegularization(...).setCostFunction(...).setSquashingFunction(...);
> //  the user can also load an existing model from a given URI and update the 
> //  model with new training data; if there is no existing model at the 
> //  specified location, an exception will be thrown
> /*
> MultilayerPerceptron mlp = new MultilayerPerceptron(learningRate, 
> regularization, momentum, squashingFuncName, costFuncName, modelLocation);
> */
> URI trainingDataLocation = …
> //  the details of training are transparent to the user; it may run on a 
> //  single machine or in a distributed environment
> mlp.train(trainingDataLocation);
> //  the user can also train the model with one training instance at a time, 
> //  in stochastic gradient descent fashion
> Vector trainingInstance = ...
> mlp.train(trainingInstance);
> //  prepare the input feature
> Vector inputFeature = …
> //  the semantic meaning of the output result is defined by the user;
> //  in the general case, the dimension of the output vector is 1 for 
> //  regression and two-class classification, and n for n-class 
> //  classification (n > 2)
> Vector outputVector = mlp.output(inputFeature);
> -
> 3. Methodology
> The output calculation can easily be implemented with a feed-forward approach, 
> and single-machine training is straightforward. The following describes how to 
> train the MLP in a distributed way with batch gradient descent. The workflow 
> is illustrated in the figure below.
> https://docs.google.com/drawings/d/1s8hiYKpdrP3epe1BzkrddIfShkxPrqSuQBH0NAawEM4/pub?w=960&h=720
> For distributed training, each training iteration is divided into two steps: 
> the weight update calculation step and the weight update step. The distributed 
> MLP can only be trained in a batch-update approach.
> 3.1 The partial weight update calculation step:
> This step computes partial weight updates in a distributed way. Each task gets 
> a copy of the MLP model and calculates the weight update from a partition of 
> the data.
> Suppose the training error is E(w) = \frac{1}{2} \sum_{d \in D} cost(t_d, y_d), 
> where D denotes the training set, d denotes a training instance, t_d denotes 
> the class label and y_d denotes the output of the MLP. Also suppose that the 
> sigmoid function is used as the squashing function, 
> squared error is used as the cost function, 
> t_i denotes the target value for the ith dimension of the output layer, 
> o_i denotes the actual output for the ith dimension of the output layer, 
> l denotes the learning rate, and 
> w_{ij} denotes the weight between the jth neuron in the previous layer and the 
> ith neuron in the next layer. 
> The weight of each edge is updated as 
> \Delta w_{ij} = \frac{l}{m} \delta_j o_i, 
> where \delta_j = -\sum_{m} o_j^{(m)} (1 - o_j^{(m)}) (t_j^{(m)} - o_j^{(m)}) 
> for the output layer, and 
> \delta_j = -\sum_{m} o_j^{(m)} (1 - o_j^{(m)}) \sum_k \delta_k w_{jk} 
> for a hidden layer. 
> It is easy to see that \delta_j can be rewritten as 
> \delta_j = -\sum_{i=1}^{k} \sum_{m_i} o_j^{(m_i)} (1 - o_j^{(m_i)}) 
> (t_j^{(m_i)} - o_j^{(m_i)})
> The above

Build failed in Jenkins: mahout-nightly #1315

2013-08-07 Thread Apache Jenkins Server
See 

--
[...truncated 776 lines...]
Running org.apache.mahout.math.jet.random.NormalTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.203 sec - in 
org.apache.mahout.math.jet.random.NormalTest
Running org.apache.mahout.math.VectorTest
Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec - in 
org.apache.mahout.math.VectorTest
Running org.apache.mahout.math.ssvd.SequentialBigSvdTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.078 sec - in 
org.apache.mahout.math.ssvd.SequentialBigSvdTest
Running org.apache.mahout.math.solver.EigenDecompositionTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec - in 
org.apache.mahout.math.solver.EigenDecompositionTest
Running org.apache.mahout.math.solver.TestConjugateGradientSolver
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.118 sec - in 
org.apache.mahout.math.solver.TestConjugateGradientSolver
Running org.apache.mahout.math.solver.LSMRTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.205 sec - in 
org.apache.mahout.math.solver.LSMRTest
Running org.apache.mahout.math.DenseSymmetricTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.054 sec - in 
org.apache.mahout.math.DenseSymmetricTest
Running org.apache.mahout.math.FileBasedMatrixTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 2.247 sec - in 
org.apache.mahout.math.FileBasedMatrixTest
Running org.apache.mahout.math.CentroidTest
Tests run: 43, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.676 sec - in 
org.apache.mahout.math.CentroidTest
Running org.apache.mahout.math.VectorBinaryAssignCostTest
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.102 sec - in 
org.apache.mahout.math.VectorBinaryAssignCostTest
Running org.apache.mahout.math.DiagonalMatrixTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.049 sec - in 
org.apache.mahout.math.DiagonalMatrixTest
Running org.apache.mahout.math.OldQRDecompositionTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.101 sec - in 
org.apache.mahout.math.OldQRDecompositionTest
Running org.apache.mahout.math.TestSparseColumnMatrix
Tests run: 59, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.122 sec - in 
org.apache.mahout.math.TestSparseColumnMatrix
Running org.apache.mahout.math.CholeskyDecompositionTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.376 sec - in 
org.apache.mahout.math.CholeskyDecompositionTest
Running org.apache.mahout.math.list.ShortArrayListTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.012 sec - in 
org.apache.mahout.math.list.ShortArrayListTest
Running org.apache.mahout.math.list.ByteArrayListTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 sec - in 
org.apache.mahout.math.list.ByteArrayListTest
Running org.apache.mahout.math.list.LongArrayListTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 sec - in 
org.apache.mahout.math.list.LongArrayListTest
Running org.apache.mahout.math.list.CharArrayListTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.013 sec - in 
org.apache.mahout.math.list.CharArrayListTest
Running org.apache.mahout.math.list.FloatArrayListTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 sec - in 
org.apache.mahout.math.list.FloatArrayListTest
Running org.apache.mahout.math.list.DoubleArrayListTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 sec - in 
org.apache.mahout.math.list.DoubleArrayListTest
Running org.apache.mahout.math.list.ObjectArrayListTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 sec - in 
org.apache.mahout.math.list.ObjectArrayListTest
Running org.apache.mahout.math.list.IntArrayListTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 sec - in 
org.apache.mahout.math.list.IntArrayListTest
Running org.apache.mahout.math.TestDenseVector
Tests run: 43, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.949 sec - in 
org.apache.mahout.math.TestDenseVector
Running org.apache.mahout.math.FileBasedSparseBinaryMatrixTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.098 sec - in 
org.apache.mahout.math.FileBasedSparseBinaryMatrixTest
Running org.apache.mahout.math.stats.OnlineSummarizerTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.047 sec - in 
org.apache.mahout.math.stats.OnlineSummarizerTest
Running org.apache.mahout.math.stats.OnlineExponentialAverageTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.02 sec - in 
org.apache.mahout.math.stats.OnlineExponentialAverageTest
Running org.apache.mahout.math.stats.LogLikelihoodTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.089 sec - in 
org.apache.mahout.math.

Jenkins build is back to normal : Mahout-Quality #2188

2013-08-07 Thread Apache Jenkins Server
See 



[jira] [Commented] (MAHOUT-1265) Add Multilayer Perceptron

2013-08-07 Thread Yexi Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732836#comment-13732836
 ] 

Yexi Jiang commented on MAHOUT-1265:


Is there anyone who can review the code?
Sample code showing how to use it can be found in the test cases.

[~tdunning] Could you please give your comments?

> Add Multilayer Perceptron 
> --
>
> Key: MAHOUT-1265
> URL: https://issues.apache.org/jira/browse/MAHOUT-1265
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Yexi Jiang
>  Labels: machine_learning, neural_network
> Attachments: mahout-1265.patch
>
>
> Design of multilayer perceptron
> 1. Motivation
> A multilayer perceptron (MLP) is a kind of feed-forward artificial neural 
> network, a mathematical model inspired by biological neural networks. The 
> multilayer perceptron can be used for various machine learning tasks such as 
> classification and regression. It would be helpful to have it included in 
> Mahout.
> 2. API
> The design goal of the API is to make the MLP easy to use and to keep the 
> implementation details transparent to the user.
> The following is example code showing how a user would use the MLP.
> -
> //  set the parameters
> double learningRate = 0.5;
> double momentum = 0.1;
> int[] layerSizeArray = new int[] {2, 5, 1};
> String costFuncName = "SquaredError";
> String squashingFuncName = "Sigmoid";
> //  the location to store the model; if there is already an existing model at 
> //  the specified location, the MLP will throw an exception
> URI modelLocation = ...
> MultilayerPerceptron mlp = new MultilayerPerceptron(layerSizeArray, 
> modelLocation);
> mlp.setLearningRate(learningRate).setMomentum(momentum).setRegularization(...).setCostFunction(...).setSquashingFunction(...);
> //  the user can also load an existing model from a given URI and update the 
> //  model with new training data; if there is no existing model at the 
> //  specified location, an exception will be thrown
> /*
> MultilayerPerceptron mlp = new MultilayerPerceptron(learningRate, 
> regularization, momentum, squashingFuncName, costFuncName, modelLocation);
> */
> URI trainingDataLocation = …
> //  the details of training are transparent to the user; it may run on a 
> //  single machine or in a distributed environment
> mlp.train(trainingDataLocation);
> //  the user can also train the model with one training instance at a time, 
> //  in stochastic gradient descent fashion
> Vector trainingInstance = ...
> mlp.train(trainingInstance);
> //  prepare the input feature
> Vector inputFeature = …
> //  the semantic meaning of the output result is defined by the user;
> //  in the general case, the dimension of the output vector is 1 for 
> //  regression and two-class classification, and n for n-class 
> //  classification (n > 2)
> Vector outputVector = mlp.output(inputFeature);
> -
> 3. Methodology
> The output calculation can easily be implemented with a feed-forward approach, 
> and single-machine training is straightforward. The following describes how to 
> train the MLP in a distributed way with batch gradient descent. The workflow 
> is illustrated in the figure below.
> https://docs.google.com/drawings/d/1s8hiYKpdrP3epe1BzkrddIfShkxPrqSuQBH0NAawEM4/pub?w=960&h=720
> For distributed training, each training iteration is divided into two steps: 
> the weight update calculation step and the weight update step. The distributed 
> MLP can only be trained in a batch-update approach.
> 3.1 The partial weight update calculation step:
> This step computes partial weight updates in a distributed way. Each task gets 
> a copy of the MLP model and calculates the weight update from a partition of 
> the data.
> Suppose the training error is E(w) = \frac{1}{2} \sum_{d \in D} cost(t_d, y_d), 
> where D denotes the training set, d denotes a training instance, t_d denotes 
> the class label and y_d denotes the output of the MLP. Also suppose that the 
> sigmoid function is used as the squashing function, 
> squared error is used as the cost function, 
> t_i denotes the target value for the ith dimension of the output layer, 
> o_i denotes the actual output for the ith dimension of the output layer, 
> l denotes the learning rate, and 
> w_{ij} denotes the weight between the jth neuron in the previous layer and the 
> ith neuron in the next layer. 
> The weight of each edge is updated as 
> \Delta w_{ij} = \frac{l}{m} \delta_j o_i, 
> where \delta_j = -\sum_{m} o_j^{(m)} (1 - o_j^{(m)}) (t_j^{(m)} - o_j^{(m)}) 
> for the output layer, and 
> \delta_j = -\sum_{m} o_j^{(m)} (1 - o_j^{(m)}) \sum_k \delta_k w_{jk} 
> for a hidden layer. 
> It is easy to see that \delta_j can be rewritten as 
> \delta_j = -\sum_{i=1}^{k} \sum_{m_i} o_j^{(m_i)} (1 - o_j^{(m_i)}) 
> (t_j^{(m_i)} - o_j^{(m_i)})
> The abo

[jira] [Created] (MAHOUT-1305) Rework the wiki

2013-08-07 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1305:
--

 Summary: Rework the wiki
 Key: MAHOUT-1305
 URL: https://issues.apache.org/jira/browse/MAHOUT-1305
 Project: Mahout
  Issue Type: Bug
  Components: Website
Reporter: Sebastian Schelter
Priority: Blocker
 Fix For: 0.9


We should think about completely redoing our wiki. At the moment, we're listing 
lots of algorithms that we either never implemented or already removed. I also 
have the impression that a lot of stuff is outdated.

It would be awesome if we had up-to-date documentation of the code, with 
instructions on how to get started with Mahout quickly.

We should also have examples for all our 3 C's.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAHOUT-1304) Website doesn't fit on 1280 px

2013-08-07 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated MAHOUT-1304:
---

Attachment: screen.png

> Website doesn't fit on 1280 px
> --
>
> Key: MAHOUT-1304
> URL: https://issues.apache.org/jira/browse/MAHOUT-1304
> Project: Mahout
>  Issue Type: Bug
>  Components: Website
>Reporter: Sebastian Schelter
> Fix For: 0.9
>
> Attachments: screen.png
>
>
> Hi,
> since the latest changes, it seems our website doesn't fit on 1280 px anymore.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAHOUT-1304) Website doesn't fit on 1280 px

2013-08-07 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated MAHOUT-1304:
---

Fix Version/s: 0.9

> Website doesn't fit on 1280 px
> --
>
> Key: MAHOUT-1304
> URL: https://issues.apache.org/jira/browse/MAHOUT-1304
> Project: Mahout
>  Issue Type: Bug
>  Components: Website
>Reporter: Sebastian Schelter
> Fix For: 0.9
>
> Attachments: screen.png
>
>
> Hi,
> since the latest changes, it seems our website doesn't fit on 1280 px anymore.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAHOUT-1304) Website doesn't fit on 1280 px

2013-08-07 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1304:
--

 Summary: Website doesn't fit on 1280 px
 Key: MAHOUT-1304
 URL: https://issues.apache.org/jira/browse/MAHOUT-1304
 Project: Mahout
  Issue Type: Bug
  Components: Website
Reporter: Sebastian Schelter


Hi,

since the latest changes, it seems our website doesn't fit on 1280 px anymore.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Hangout on Monday

2013-08-07 Thread Ted Dunning
On Wed, Aug 7, 2013 at 5:29 AM, Grant Ingersoll  wrote:

> > Can you say something about how to hijack some of those?
>
> Doodle just allows you to capture when people are most available.  I
> believe if you check the link sent earlier, you can see when they are and
> how people voted.  Otherwise, I can dig it up.
>

That I understand.

The question was how to capture a URL for the hangout.  Suneel helped with
that.


Re: Hangout on Monday

2013-08-07 Thread Grant Ingersoll

On Aug 5, 2013, at 9:30 PM, Ted Dunning  wrote:

> On Mon, Aug 5, 2013 at 5:21 PM, Suneel Marthi wrote:
> 
>> Grant had setup a biweekly/weekly Google Doodle for Mahout meetups.
>> 
> 
> Can you say something about how to hijack some of those?

Doodle just allows you to capture when people are most available.  I believe if 
you check the link sent earlier, you can see when they are and how people 
voted.  Otherwise, I can dig it up.

-Grant