[ https://issues.apache.org/jira/browse/MAHOUT-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864531#comment-13864531 ]
Yexi Jiang commented on MAHOUT-1388:
------------------------------------

[~smarthi] When I submitted the patch to the review board, I got the following error:

----------------------------------------------------------------------------
The file 'https://svn.apache.org/repos/asf/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/mlp/NeuralNetwork.java' (r1556303) could not be found in the repository
----------------------------------------------------------------------------

However, I checked this URL and the file exists. I'm not sure what causes this error.

> Add command line support and logging for MLP
> --------------------------------------------
>
>                 Key: MAHOUT-1388
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1388
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 1.0
>            Reporter: Yexi Jiang
>              Labels: mlp, sgd
>             Fix For: 1.0
>
>         Attachments: Mahout-1388.patch
>
>
> The user should be able to run the Perceptron from the command line.
> There are two programs to execute the MLP: training and labeling. The first
> takes the data as input and outputs the model; the second takes the model
> and unlabeled data as input and outputs the results.
> The parameters for training are as follows:
> ------------------------------------------------
> --input -i (input data)
> --skipHeader -sk // whether to skip the first row; this parameter is optional
> --labels -labels // the labels of the instances, separated by whitespace.
> Take the iris dataset for example: the labels are 'setosa versicolor
> virginica'.
> --model -mo // in training mode, this is the location to store the model (if
> the specified location has an existing model, it will update the model
> through incremental learning); in labeling mode, this is the location to
> store the result
> --update -u // whether to incrementally update the model; if this parameter is
> not given, train the model from scratch
> --output -o // this is only useful in labeling mode
> --layersize -ls (no. of units per hidden layer) // use whitespace-separated
> numbers to indicate the number of neurons in each layer (including input layer
> and output layer), e.g. '5 3 2'
> --squashingFunction -sf // currently only Sigmoid is supported
> --momentum -m
> --learningrate -l
> --regularizationweight -r
> --costfunction -cf // the type of cost function
> ------------------------------------------------
> For example, to train a 3-layer (input, hidden, and output) MLP with a
> 0.1 learning rate, 0.1 momentum rate, and 0.01 regularization weight, the
> command would be:
> mlp -i /tmp/training-data.csv -labels setosa versicolor virginica -mo
> /tmp/model.model -ls 5 3 1 -l 0.1 -m 0.1 -r 0.01
> This command reads the training data from /tmp/training-data.csv and
> writes the trained model to /tmp/model.model.
> The parameters for labeling are as follows:
> -------------------------------------------------------------
> --input -i // input file path
> --columnRange -cr // the range of columns used as features, starting from 0
> and separated by whitespace, e.g. 0 5
> --format -f // the format of the input file; currently only csv is supported
> --model -mo // the file path of the model
> --output -o // the output path for the results
> -------------------------------------------------------------
> To label data with an existing model, a user can run:
> mlp -i /tmp/unlabel-data.csv -mo /tmp/model.model -o /tmp/label-result
> Moreover, we should provide default values if the user does not specify
> any.
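As a side note on the --layersize convention described above, here is a minimal sketch of how a whitespace-separated layer specification such as '5 3 2' might be parsed and validated. The class and method names (LayerSizeParser, parseLayerSizes) are illustrative only and are not part of the attached patch:

```java
import java.util.Arrays;

// Illustrative helper (not part of Mahout-1388.patch): parses and
// validates a whitespace-separated --layersize value like "5 3 2".
public class LayerSizeParser {

    // Returns e.g. {5, 3, 2} for the input "5 3 2".
    public static int[] parseLayerSizes(String spec) {
        String[] tokens = spec.trim().split("\\s+");
        // An MLP needs at least an input layer and an output layer.
        if (tokens.length < 2) {
            throw new IllegalArgumentException(
                "At least 2 layers are required, got " + tokens.length);
        }
        int[] sizes = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            sizes[i] = Integer.parseInt(tokens[i]);
            if (sizes[i] <= 0) {
                throw new IllegalArgumentException(
                    "Layer size must be positive: " + tokens[i]);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Matches the '5 3 2' example from the parameter description.
        System.out.println(Arrays.toString(parseLayerSizes("5 3 2")));
        // prints [5, 2, 2]? No -- prints [5, 3, 2]
    }
}
```

Rejecting specifications with fewer than two layers up front gives the user an immediate, readable error instead of a failure deep inside model construction.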
-- This message was sent by Atlassian JIRA (v6.1.5#6160)