Unfortunately not. 100 iterations take ~30 minutes; with 300 iterations it has been running for more than 2 days and is still going... I will stop it.
I still do not understand what number I should set as *folds*. OK, I will set a number > 1, but do I need to pay more attention to this parameter? Does it make any difference whether I set 8 or 10?

2017-03-06 12:19 GMT+01:00 Joern Kottmann <kottm...@gmail.com>:

> test.evaluate(samples, 1), here the second parameter is the number of
> folds, usually you use 10 or a number larger than 1.
>
> The amount of times you need for training with perceptron is linear to the
> iterations, if you use 300 instead of 100 it should take three times as
> long.
>
> Jörn
>
> On Mon, Mar 6, 2017 at 11:12 AM, Damiano Porta <damianopo...@gmail.com> wrote:
>
> > Jorn,
> > I am training and testing the model via api. If it is not a training
> > problem. How is that possible that the evaluation is taking 2 days (and
> > still running) to evaluate the model? As i told you with 100 iterations i
> > can get the model and the test in ~30 minutes.
> >
> > I only have a doubt about evaluation, this is the code:
> >
> > try (ObjectStream<NameSample> samples =
> >         ObjectStreamUtils.createObjectStream(evaluation)) {
> >
> >     TrainingParameters mlParams = new TrainingParameters();
> >     mlParams.put(TrainingParameters.ALGORITHM_PARAM,
> >             PerceptronTrainer.PERCEPTRON_VALUE);
> >     mlParams.put(TrainingParameters.ITERATIONS_PARAM,
> >             Integer.toString(100));
> >     mlParams.put(TrainingParameters.CUTOFF_PARAM,
> >             Integer.toString(0));
> >
> >     TokenNameFinderCrossValidator test = new TokenNameFinderCrossValidator("it",
> >             null, mlParams, null,
> >             (TokenNameFinderEvaluationMonitor)null);
> >
> >     test.evaluate(samples, 1); // <---- SECOND PARAMETER HERE
> >
> >     FMeasure result = test.getFMeasure();
> >
> >     System.out.println(result.toString());
> > }
> >
> > What should i put on the second parameter of test.evaluate() ? Each sample
> > (in samples variable) represents a document. There are no relations with
> > other samples.
> >
> > 2017-03-06 10:56 GMT+01:00 Joern Kottmann <kottm...@gmail.com>:
> >
> > > Hello,
> > >
> > > the model is only available after the training finished, hard to guess what
> > > you are doing.
> > >
> > > Do you use the command line? Which command?
> > >
> > > Jörn
> > >
> > > On Mon, Mar 6, 2017 at 10:29 AM, Damiano Porta <damianopo...@gmail.com> wrote:
> > >
> > > > Hello Jorn,
> > > > I tried with 300 iterations and it takes forever, reducing that number to
> > > > 100 i can finally get the model in half an hour.
> > > >
> > > > The problem with 300 iterations is that i can see the model (.bin) in half
> > > > an hour too but the computations are still running. So i do not really
> > > > understand what it is doing.
> > > >
> > > > Damiano
> > > >
> > > > 2017-03-06 10:19 GMT+01:00 Joern Kottmann <kottm...@gmail.com>:
> > > >
> > > > > Hello,
> > > > >
> > > > > this looks like output from the cross validator.
> > > > >
> > > > > Jörn
> > > > >
> > > > > On Sun, Mar 5, 2017 at 11:34 AM, Damiano Porta <damianopo...@gmail.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I am training a NER model with perceptron classifier (using OpenNLP 1.7.0)
> > > > > >
> > > > > > the output of the training is:
> > > > > >
> > > > > > Indexing events using cutoff of 0
> > > > > >
> > > > > > Computing event counts... done. 11861603 events
> > > > > > Indexing... done.
> > > > > > Collecting events... Done indexing.
> > > > > > Incorporating indexed data for training...
> > > > > > done.
> > > > > > Number of Event Tokens: 11861603
> > > > > > Number of Outcomes: 23
> > > > > > Number of Predicates: 6623489
> > > > > > Computing model parameters...
> > > > > > Performing 300 iterations.
> > > > > > 1: . (11795234/11861603) 0.9944047191597966
> > > > > > 2: . (11820243/11861603) 0.9965131188423689
> > > > > > 3: . (11829329/11861603) 0.9972791198626357
> > > > > > 4: . (11834935/11861603) 0.9977517372651908
> > > > > > 5: . (11838996/11861603) 0.9980941024581584
> > > > > > 6: . (11841501/11861603) 0.9983052880795286
> > > > > > 7: . (11843704/11861603) 0.998491013398442
> > > > > > 8: . (11845304/11861603) 0.9986259024180796
> > > > > > 9: . (11846421/11861603) 0.9987200718149141
> > > > > > 10: . (11847181/11861603) 0.9987841440992419
> > > > > > 20: . (11852226/11861603) 0.9992094660392866
> > > > > > 30: . (11853947/11861603) 0.9993545560410343
> > > > > > 40: . (11854831/11861603) 0.999429082224384
> > > > > > 50: . (11855471/11861603) 0.999483037832239
> > > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > > Stats: (11846242/11861603) 0.998704981105842
> > > > > > ...done.
> > > > > > Compressed 6623489 parameters to 554312
> > > > > > 6892 outcome patterns
> > > > > > Indexing events using cutoff of 0
> > > > > >
> > > > > > Computing event counts... done. 6370206 events
> > > > > > Indexing... done.
> > > > > > Collecting events... Done indexing.
> > > > > > Incorporating indexed data for training...
> > > > > > done.
> > > > > > Number of Event Tokens: 6370206
> > > > > > Number of Outcomes: 23
> > > > > > Number of Predicates: 3737425
> > > > > > Computing model parameters...
> > > > > > Performing 300 iterations.
> > > > > > 1: . (6330365/6370206) 0.9937457281601254
> > > > > > 2: . (6345859/6370206) 0.9961779885925196
> > > > > > 3: . (6351552/6370206) 0.9970716802564941
> > > > > > 4: . (6354847/6370206) 0.9975889319748843
> > > > > > 5: . (6356872/6370206) 0.997906818084062
> > > > > > 6: . (6358350/6370206) 0.998138835698563
> > > > > > 7: . (6359611/6370206) 0.9983367884806237
> > > > > > 8: . (6360473/6370206) 0.9984721059256169
> > > > > > 9: . (6361138/6370206) 0.9985764981540628
> > > > > > 10: . (6361532/6370206) 0.9986383485871572
> > > > > > 20: . (6364161/6370206) 0.9990510510963068
> > > > > > 30: . (6365106/6370206) 0.9991993979472563
> > > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > > Stats: (6360617/6370206) 0.9984947111600473
> > > > > > ...done.
> > > > > > Indexing events using cutoff of 0
> > > > > >
> > > > > > Computing event counts... done. 6370114 events
> > > > > > Indexing... done.
> > > > > > Collecting events... Done indexing.
> > > > > > Incorporating indexed data for training...
> > > > > > done.
> > > > > > Number of Event Tokens: 6370114
> > > > > > Number of Outcomes: 23
> > > > > > Number of Predicates: 3737390
> > > > > > Computing model parameters...
> > > > > > Performing 300 iterations.
> > > > > > 1: . (6330266/6370114) 0.9937445389517362
> > > > > > 2: . (6345810/6370114) 0.9961846836650019
> > > > > > 3: . (6351374/6370114) 0.9970581374210885
> > > > > > 4: . (6354747/6370114) 0.9975876412886803
> > > > > > 5: . (6356872/6370114) 0.9979212302950936
> > > > > > 6: . (6358429/6370114) 0.998165652922381
> > > > > > 7: . (6359417/6370114) 0.9983207521874805
> > > > > > 8: . (6360292/6370114) 0.9984581123665919
> > > > > > 9: . (6361076/6370114) 0.9985811870870757
> > > > > > 10: . (6361693/6370114) 0.998678045636232
> > > > > > 20: . (6364109/6370114) 0.9990573167136413
> > > > > > 30: . (6365008/6370114) 0.9991984444862368
> > > > > > 40: . (6365478/6370114) 0.9992722265253023
> > > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > > Stats: (6359985/6370114) 0.9984099185666065
> > > > > > ...done.
> > > > > > Indexing events using cutoff of 0
> > > > > >
> > > > > > Computing event counts... done. 6370480 events
> > > > > > Indexing... done.
> > > > > > Collecting events... Done indexing.
> > > > > > Incorporating indexed data for training...
> > > > > > done.
> > > > > > Number of Event Tokens: 6370480
> > > > > > Number of Outcomes: 23
> > > > > > Number of Predicates: 3737798
> > > > > > Computing model parameters...
> > > > > > Performing 300 iterations.
> > > > > > 1: . (6330685/6370480) 0.9937532179678769
> > > > > > 2: . (6346153/6370480) 0.9961812924614786
> > > > > > 3: . (6351726/6370480) 0.9970561088018485
> > > > > > 4: . (6355089/6370480) 0.9975840125076917
> > > > > > 5: . (6357173/6370480) 0.9979111464128292
> > > > > > 6: . (6358780/6370480) 0.9981634036995642
> > > > > > 7: . (6359845/6370480) 0.9983305810551167
> > > > > > 8: . (6360827/6370480) 0.9984847295651191
> > > > > > 9: . (6361316/6370480) 0.9985614898720347
> > > > > > 10: . (6362076/6370480) 0.9986807901445417
> > > > > > 20: . (6364506/6370480) 0.9990622370684784
> > > > > > 30: . (6365415/6370480) 0.9992049264733583
> > > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > > Stats: (6362594/6370480) 0.9987621026986977
> > > > > > ...done.
> > > > > > Indexing events using cutoff of 0
> > > > > >
> > > > > > Computing event counts... done. 6370008 events
> > > > > > Indexing... done.
> > > > > > Collecting events... Done indexing.
> > > > > > Incorporating indexed data for training...
> > > > > > done.
> > > > > > Number of Event Tokens: 6370008
> > > > > > Number of Outcomes: 23
> > > > > > Number of Predicates: 3737824
> > > > > > Computing model parameters...
> > > > > > Performing 300 iterations.
> > > > > > 1: . (6330200/6370008) 0.9937507142848172
> > > > > > 2: . (6345643/6370008) 0.9961750440501802
> > > > > > 3: . (6351415/6370008) 0.9970811653611737
> > > > > > 4: . (6354522/6370008) 0.9975689198506501
> > > > > > 5: . (6356723/6370008) 0.9979144453193779
> > > > > > 6: . (6358164/6370008) 0.9981406616757781
> > > > > > 7: . (6359399/6370008) 0.9983345389833106
> > > > > > 8: . (6360274/6370008) 0.9984719014481614
> > > > > > 9: . (6360694/6370008) 0.9985378354312899
> > > > > > 10: . (6361531/6370008) 0.9986692324405244
> > > > > > ....
> > > > > > ....
> > > > > > ....
> > > > > >
> > > > > > etc etc is that normal ? The parameters are; *0 cutoff* and *300
> > > > > > iterators*.
> > > > > >
> > > > > > The corpus is relative small, it has 20k sentences.
> > > > > >
> > > > > > I do not remember an output like that using MAXENT classifier.
> > > > > >
> > > > > > Damiano
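
On the *folds* question: in k-fold cross-validation the samples are split into k partitions, and each partition is held out once for testing while a model is trained on the rest, so k models are trained in total. The sketch below is plain Java and does not use OpenNLP. The class `KFoldSketch`, its method names, and the modulo-based split are assumptions made for illustration only, not OpenNLP's actual partitioning code. It shows why 1 is a degenerate value (every sample is held out and nothing is left to train on) and what a larger value such as 5 produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of k-fold partitioning; not OpenNLP code.
public class KFoldSketch {

    // Indices of the samples held out for testing in the given fold.
    static List<Integer> testIndices(int sampleCount, int folds, int fold) {
        List<Integer> test = new ArrayList<>();
        for (int i = 0; i < sampleCount; i++) {
            if (i % folds == fold) {
                test.add(i);
            }
        }
        return test;
    }

    // Indices of the samples used for training in the given fold.
    static List<Integer> trainIndices(int sampleCount, int folds, int fold) {
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < sampleCount; i++) {
            if (i % folds != fold) {
                train.add(i);
            }
        }
        return train;
    }

    public static void main(String[] args) {
        int sampleCount = 10; // assumed toy corpus size
        for (int folds : new int[] {1, 5}) {
            for (int fold = 0; fold < folds; fold++) {
                System.out.printf("folds=%d fold=%d train=%s test=%s%n",
                        folds, fold,
                        trainIndices(sampleCount, folds, fold),
                        testIndices(sampleCount, folds, fold));
            }
        }
    }
}
```

Because each fold trains its own model, a run with 10 folds should cost roughly ten trainings. Choosing 8 versus 10 mainly changes how much data each model is trained on and how long the whole evaluation takes.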