[jira] [Commented] (MAHOUT-696) Command line program for AdaptiveLogiscticRegression

XiaoboGu (JIRA) Fri, 20 May 2011 06:47:31 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036832#comment-13036832
 ]


XiaoboGu commented on MAHOUT-696:
---------------------------------

I added a scores option in 696-r2.patch, but I don't know whether it makes 
sense; I just copied the code from TrainLogistic.
I have another question. Because concurrent threads are training on the 
examples, will the scores option hurt concurrent performance? And because 
the main thread reads the CSV records and converts them into Vectors, will it 
become a bottleneck?
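One way to answer the bottleneck question empirically is to time the parsing and training stages separately on the main thread. The sketch below is only an illustration: `StageTimer` and its methods are hypothetical names, not part of Mahout, and it measures wall-clock time on the calling thread only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Mahout): accumulates elapsed time per named stage. */
final class StageTimer {
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    /** Adds an already measured duration to the named stage. */
    void add(String stage, long elapsedNanos) {
        nanos.merge(stage, elapsedNanos, Long::sum);
    }

    /** Adds the time elapsed since startNanos (a System.nanoTime() reading) to the stage. */
    void record(String stage, long startNanos) {
        add(stage, System.nanoTime() - startNanos);
    }

    /** Total nanoseconds recorded for the stage, or 0 if it was never recorded. */
    long totalNanos(String stage) {
        return nanos.getOrDefault(stage, 0L);
    }

    /** Share of all recorded time spent in the stage, in [0, 1]. */
    double fraction(String stage) {
        long total = 0;
        for (long v : nanos.values()) {
            total += v;
        }
        return total == 0 ? 0.0 : (double) totalNanos(stage) / total;
    }
}
```

In the training loop one would take `long t0 = System.nanoTime();` before `csv.processLine(line, input)`, call `timer.record("parse", t0)` after it, and do the same around `lr.train(targetValue, input)` and the scoring block. If `timer.fraction("parse")` dominates, the CSV conversion on the main thread is the likely bottleneck.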
I have copied the unfinished main function of my TrainAdaptiveLogistic class 
here for your reference:

public static void main(String[] args) throws IOException {
    if (parseArgs(args)) {
        double logPEstimate = 0;
        int k = 0;

        CsvRecordFactory csv = lmp.getCsvRecordFactory();
        AdaptiveLogisticRegression lr = lmp.createAdaptiveLogisticRegression();

        for (int pass = 0; pass < passes; pass++) {
            BufferedReader in = open(inputFile);

            // read variable names
            csv.firstLine(in.readLine());

            String line = in.readLine();

            while (line != null) {
                // for each new line, get target and predictors
                Vector input = new RandomAccessSparseVector(lmp.getNumFeatures());
                int targetValue = csv.processLine(line, input);
                // update model
                lr.train(targetValue, input);
                k++;

                if (scores && (k % (skipscorenum + 1) == 0)) {
                    State<Wrapper, CrossFoldLearner> best = lr.getBest();
                    CrossFoldLearner learner = null;
                    if (null != best) {
                        learner = best.getPayload().getLearner();
                    }
                    if (learner != null) {
                        // check performance while this is still news
                        double logP = learner.logLikelihood(targetValue, input);
                        if (!Double.isInfinite(logP)) {
                            if (k < 20) {
                                logPEstimate = (k * logPEstimate + logP) / (k + 1);
                            } else {
                                logPEstimate = 0.95 * logPEstimate + 0.05 * logP;
                            }
                        }
                        double p = learner.classifyScalar(input);
                        output.printf(Locale.ENGLISH,
                                "%10d %2d %10.2f %2.4f %10.4f %10.4f\n",
                                k, targetValue,
                                learner.percentCorrect(), p, logP,
                                logPEstimate);
                    } else {
                        output.printf(Locale.ENGLISH,
                                "%10d %2d %s\n", k, targetValue,
                                "AdaptiveLogisticRegression is not ready for scoring ... ");
                    }
                }

                line = in.readLine();
            }
            in.close();
        }

        OutputStream modelOutput = new FileOutputStream(outputFile);
        try {
            lmp.saveTo(modelOutput);
        } finally {
            modelOutput.close();
        }

        output.printf(Locale.ENGLISH, "%d\n", lmp.getNumFeatures());
        output.printf(Locale.ENGLISH, "%s ~ ", lmp.getTargetVariable());

        String sep = "";
    }
}
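The running log-likelihood estimate in the loop above (a plain average for the first few samples, then an exponential moving average with weights 0.95 and 0.05) can be isolated for testing. This is a minimal sketch, not the code in the patch: the class name `RunningLogLikelihood` is hypothetical, and it counts only the samples that were actually scored, whereas the posted loop uses the overall example counter `k`.

```java
/**
 * Hypothetical helper (not part of Mahout): running estimate of the average
 * log-likelihood. Uses an exact mean during warm-up, then an exponential
 * moving average, mirroring the constants in the posted loop.
 */
final class RunningLogLikelihood {
    private static final int WARMUP = 20;      // threshold taken from "k < 20"
    private static final double DECAY = 0.95;  // weight kept from the old estimate

    private double estimate = 0;
    private int count = 0;  // assumption: counts scored samples, not all examples

    /** Folds one log-likelihood value into the estimate and returns the new estimate. */
    double update(double logP) {
        if (Double.isInfinite(logP)) {
            return estimate;  // infinite values are skipped, as in the posted loop
        }
        if (count < WARMUP) {
            estimate = (count * estimate + logP) / (count + 1);
        } else {
            estimate = DECAY * estimate + (1 - DECAY) * logP;
        }
        count++;
        return estimate;
    }

    double get() {
        return estimate;
    }
}
```

Keeping a separate counter means the warm-up average stays a true mean even when `skipscorenum` is greater than zero and most examples are never scored.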




> Command line program for AdaptiveLogiscticRegression
> ----------------------------------------------------
>
>                 Key: MAHOUT-696
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-696
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.5
>            Reporter: XiaoboGu
>             Fix For: 0.6
>
>         Attachments: mahout-696-r1.patch, mahout-696-r2.patch
>
>
> At Ted's suggestion, I'll try to write a command line program for 
> AdaptiveLogisticRegression. As I am not familiar with the algorithm, I'll 
> write a prototype from a Java developer's perspective, and I hope someone 
> else will help with the details of the algorithm.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
