[ https://issues.apache.org/jira/browse/MAHOUT-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036832#comment-13036832 ]
XiaoboGu commented on MAHOUT-696:
---------------------------------
I added a scores option in 696-r2.patch, but I don't know whether it makes
sense; I just copied the code from TrainLogistic.
I have another question. Because concurrent threads are training on the
examples, will the scores option cause a concurrency performance problem? The
main thread reads the CSV records and converts them into Vectors; could that
become a bottleneck?
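One way to check whether parsing on the main thread is the bottleneck is to time the parse step and the train step separately. The sketch below is hypothetical and self-contained: `StageTimer`, `parseLine` and the placeholder trainer are illustrative stand-ins. In the real program the parse step would be `csv.processLine(line, input)` and the trainer would wrap `lr.train(targetValue, input)`. Because AdaptiveLogisticRegression does its own threaded training, the time charged to the train step also includes any time the main thread spends waiting on that work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class StageTimer {
  long parseNanos = 0;
  long trainNanos = 0;

  // Hypothetical stand-in for csv.processLine(...): parses comma-separated numbers.
  static double[] parseLine(String line) {
    String[] fields = line.split(",");
    double[] values = new double[fields.length];
    for (int i = 0; i < fields.length; i++) {
      values[i] = Double.parseDouble(fields[i].trim());
    }
    return values;
  }

  // Parses and trains each line in turn, charging the elapsed time to each stage.
  void run(List<String> lines, Consumer<double[]> trainer) {
    for (String line : lines) {
      long t0 = System.nanoTime();
      double[] x = parseLine(line);
      long t1 = System.nanoTime();
      trainer.accept(x);
      long t2 = System.nanoTime();
      parseNanos += t1 - t0;
      trainNanos += t2 - t1;
    }
  }

  public static void main(String[] args) {
    // Synthetic input; replace with lines read from the real CSV file.
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < 100000; i++) {
      lines.add(i + "," + (i % 7) + "," + (i * 0.5));
    }
    StageTimer timer = new StageTimer();
    double[] sum = new double[1];
    timer.run(lines, x -> sum[0] += x[0]);  // placeholder "training" step
    System.out.printf("parse: %.1f ms, train: %.1f ms%n",
        timer.parseNanos / 1e6, timer.trainNanos / 1e6);
  }
}
```

If parsing turns out to dominate, one common option is to move parsing onto a separate thread that feeds the training loop through a bounded queue. Whether that pays off depends on the measured split.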
I've copied the unfinished main function of my TrainAdaptiveLogistic class
here for your reference:
public static void main(String[] args) throws IOException {
  if (parseArgs(args)) {
    double logPEstimate = 0;
    int k = 0;        // number of input lines processed
    int samples = 0;  // number of scored samples with a finite logP
    CsvRecordFactory csv = lmp.getCsvRecordFactory();
    AdaptiveLogisticRegression lr = lmp.createAdaptiveLogisticRegression();
    for (int pass = 0; pass < passes; pass++) {
      BufferedReader in = open(inputFile);
      try {
        // read variable names
        csv.firstLine(in.readLine());
        String line = in.readLine();
        while (line != null) {
          // for each new line, get target and predictors
          Vector input = new RandomAccessSparseVector(lmp.getNumFeatures());
          int targetValue = csv.processLine(line, input);
          // update model
          lr.train(targetValue, input);
          k++;
          // score every (skipscorenum + 1)-th line
          if (scores && (k % (skipscorenum + 1) == 0)) {
            State<Wrapper, CrossFoldLearner> best = lr.getBest();
            CrossFoldLearner learner = null;
            if (best != null) {
              learner = best.getPayload().getLearner();
            }
            if (learner != null) {
              // check performance while this is still news
              double logP = learner.logLikelihood(targetValue, input);
              if (!Double.isInfinite(logP)) {
                // warm up on scored samples, not on k, so that skipping
                // lines does not bias the average
                if (samples < 20) {
                  logPEstimate = (samples * logPEstimate + logP) / (samples + 1);
                } else {
                  logPEstimate = 0.95 * logPEstimate + 0.05 * logP;
                }
                samples++;
              }
              double p = learner.classifyScalar(input);
              output.printf(Locale.ENGLISH, "%10d %2d %10.2f %2.4f %10.4f %10.4f\n",
                  k, targetValue, learner.percentCorrect(), p, logP, logPEstimate);
            } else {
              output.printf(Locale.ENGLISH, "%10d %2d %s\n", k, targetValue,
                  "AdaptiveLogisticRegression is not ready for scoring ... ");
            }
          }
          line = in.readLine();
        }
      } finally {
        in.close();
      }
    }
    OutputStream modelOutput = new FileOutputStream(outputFile);
    try {
      lmp.saveTo(modelOutput);
    } finally {
      modelOutput.close();
    }
    output.printf(Locale.ENGLISH, "%d\n", lmp.getNumFeatures());
    output.printf(Locale.ENGLISH, "%s ~ ", lmp.getTargetVariable());
    String sep = "";
    // ... (unfinished)
  }
}
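The scores option keeps a smoothed estimate of the log-likelihood. It uses a plain running mean over the first 20 scored samples, then an exponentially weighted average that gives weight 0.05 to the newest value. It scores every (skipscorenum + 1)-th line. The sketch below isolates that logic so it can be checked on its own. `LogPEstimate`, `update` and `shouldScore` are hypothetical names; the constants come from the listing. In this sketch the warm-up counter counts scored samples rather than input lines. Otherwise, with a large skipscorenum, the warm-up phase would be skipped entirely and the average would start from 0.

```java
public class LogPEstimate {
  private double estimate = 0;
  private int samples = 0;  // scored samples with a finite logP, not input lines

  // True if 1-based line number k should be scored, i.e. every (skip + 1)-th line.
  static boolean shouldScore(int k, int skip) {
    return k % (skip + 1) == 0;
  }

  void update(double logP) {
    if (Double.isInfinite(logP)) {
      return;  // infinite scores are ignored, as in the training loop
    }
    if (samples < 20) {
      // running mean during warm-up
      estimate = (samples * estimate + logP) / (samples + 1);
    } else {
      // exponentially weighted moving average afterwards
      estimate = 0.95 * estimate + 0.05 * logP;
    }
    samples++;
  }

  double value() {
    return estimate;
  }

  public static void main(String[] args) {
    LogPEstimate e = new LogPEstimate();
    e.update(-1.0);
    e.update(-3.0);
    System.out.println(e.value());  // prints -2.0
  }
}
```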
> Command line program for AdaptiveLogisticRegression
> ---------------------------------------------------
>
> Key: MAHOUT-696
> URL: https://issues.apache.org/jira/browse/MAHOUT-696
> Project: Mahout
> Issue Type: Improvement
> Components: Classification
> Affects Versions: 0.5
> Reporter: XiaoboGu
> Fix For: 0.6
>
> Attachments: mahout-696-r1.patch, mahout-696-r2.patch
>
>
> As suggested by Ted, I'll try to write a command-line program for
> AdaptiveLogisticRegression. Since I am not familiar with the algorithm, I'll
> write a prototype of the program from a Java developer's perspective, and I
> hope someone else will help with the details of the algorithm.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira