Re: Training perceptron model

Joern Kottmann Mon, 06 Mar 2017 04:32:19 -0800

You should understand how it works. Have a look at this Wikipedia article;
the picture on the right side explains it quite nicely.
https://en.wikipedia.org/wiki/Cross-validation_(statistics)


The idea is to split the data into n partitions and then use n-1 of them
for training and 1 for testing. This is repeated n times, so that each
partition is used exactly once for testing.
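To make the n-partition scheme concrete, here is a minimal Java sketch of generic k-fold splitting. It is not OpenNLP's own partitioner. The class and method names (KFoldSketch, kFold, Split) are hypothetical, and putting sample i into partition i % n is just one simple choice; contiguous blocks or a shuffled order would work as well. It also shows why the number of folds has to be larger than 1: with n = 1 every sample lands in the test partition and nothing is left for training.

```java
import java.util.ArrayList;
import java.util.List;

public class KFoldSketch {

    /** One train/test split of k-fold cross-validation (hypothetical helper type). */
    public static final class Split<T> {
        public final List<T> train;
        public final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /**
     * Puts sample i into partition i % n and returns n splits: in split k,
     * partition k is the test set and the other n - 1 partitions are training data.
     */
    public static <T> List<Split<T>> kFold(List<T> samples, int n) {
        if (n < 2) {
            // With n = 1 every sample would be a test sample and the training set would be empty.
            throw new IllegalArgumentException("need at least 2 folds, got " + n);
        }
        List<Split<T>> splits = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            List<T> train = new ArrayList<>();
            List<T> test = new ArrayList<>();
            for (int i = 0; i < samples.size(); i++) {
                if (i % n == k) {
                    test.add(samples.get(i));   // partition k is held out for testing
                } else {
                    train.add(samples.get(i));  // the other n - 1 partitions are used for training
                }
            }
            splits.add(new Split<>(train, test));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add("doc" + i);
        }
        // Five folds: each model trains on 8 documents and is tested on the other 2.
        for (Split<String> split : kFold(docs, 5)) {
            System.out.println("train=" + split.train + " test=" + split.test);
        }
    }
}
```

Running main prints the five train/test splits for ten dummy documents; every document appears in exactly one test set, which is the "each partition is used exactly once for testing" property.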

In your case, 300 iterations really should take only about three times as
long as 100, so maybe something else is wrong?
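A back-of-the-envelope sketch of this scaling argument, again in Java with hypothetical names. It assumes that every configured iteration actually runs and that the cost of one iteration grows linearly with the number of training events. The first assumption only gives an upper bound: the log quoted below shows runs stopping well before iteration 300 with "change in training set accuracy less than 1.0E-5". Under these assumptions, n-fold cross-validation trains n models on (n-1)/n of the data each, which costs roughly n-1 full trainings.

```java
public class TrainingCostSketch {

    /**
     * Scales a measured training time linearly with the number of iterations.
     * Assumes every configured iteration runs (no early stopping).
     */
    public static double scaleByIterations(double measuredMinutes, int measuredIterations,
                                           int newIterations) {
        return measuredMinutes * newIterations / measuredIterations;
    }

    /**
     * Estimates n-fold cross-validation time from the time of one training on the
     * full data set. Each of the n folds trains on (n - 1) / n of the events, so,
     * assuming per-iteration cost is linear in the number of events, the total is
     * n * (n - 1) / n = n - 1 full trainings (evaluation time is ignored).
     */
    public static double crossValidationMinutes(double fullTrainingMinutes, int folds) {
        if (folds < 2) {
            throw new IllegalArgumentException("cross-validation needs at least 2 folds");
        }
        return fullTrainingMinutes * (folds - 1);
    }

    public static void main(String[] args) {
        // Illustrative input only: roughly 30 minutes for 100 iterations, as reported in the thread.
        double minutesAt300 = scaleByIterations(30.0, 100, 300);
        System.out.println("300 iterations: ~" + minutesAt300 + " min");
        for (int folds : new int[] {8, 10}) {
            double share = 100.0 * (folds - 1) / folds;
            System.out.println(folds + " folds: ~" + crossValidationMinutes(minutesAt300, folds)
                    + " min, each model trained on " + share + "% of the data");
        }
    }
}
```

With the thread's figure of about 30 minutes for 100 iterations used purely as an illustrative input, this gives about 90 minutes for 300 iterations. 8 folds then cost about 7 full trainings versus 9 for 10 folds, while each fold's model is trained on 87.5% versus 90% of the data.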

Jörn

On Mon, Mar 6, 2017 at 12:36 PM, Damiano Porta <[email protected]>
wrote:

> Unfortunately not: 100 iterations take ~30 minutes, while 300 iterations
> take > 2 days and it is still running... I will stop it.
>
> I still do not understand what number I should set as *folds*. OK, I will
> set a number > 1, but do I have to pay more attention to this
> parameter? Does it matter whether I set 8 or 10?
>
>
>
> 2017-03-06 12:19 GMT+01:00 Joern Kottmann <[email protected]>:
>
> > In test.evaluate(samples, 1), the second parameter is the number of
> > folds. Usually you use 10, or at least a number larger than 1.
> >
> > The time you need for training with the perceptron is linear in the
> > number of iterations: if you use 300 instead of 100, it should take
> > three times as long.
> >
> > Jörn
> >
> > On Mon, Mar 6, 2017 at 11:12 AM, Damiano Porta <[email protected]>
> > wrote:
> >
> > > Jorn,
> > > I am training and testing the model via the API. If it is not a
> > > training problem, how is it possible that the evaluation is taking
> > > 2 days (and is still running)? As I told you, with 100 iterations I can
> > > get the model and the test in ~30 minutes.
> > >
> > > I only have a doubt about the evaluation. This is the code:
> > >
> > >         try (ObjectStream<NameSample> samples =
> > >                 ObjectStreamUtils.createObjectStream(evaluation)) {
> > >
> > >             TrainingParameters mlParams = new TrainingParameters();
> > >             mlParams.put(TrainingParameters.ALGORITHM_PARAM,
> > >                     PerceptronTrainer.PERCEPTRON_VALUE);
> > >             mlParams.put(TrainingParameters.ITERATIONS_PARAM,
> > >                     Integer.toString(100));
> > >             mlParams.put(TrainingParameters.CUTOFF_PARAM,
> > >                     Integer.toString(0));
> > >
> > >             TokenNameFinderCrossValidator test =
> > >                     new TokenNameFinderCrossValidator("it", null, mlParams,
> > >                             null, (TokenNameFinderEvaluationMonitor) null);
> > >
> > >             test.evaluate(samples, 1); // <---- SECOND PARAMETER HERE
> > >
> > >             FMeasure result = test.getFMeasure();
> > >
> > >             System.out.println(result.toString());
> > >         }
> > >
> > > What should I pass as the second parameter of test.evaluate()? Each
> > > sample (in the samples variable) represents a document. There are no
> > > relations with other samples.
> > >
> > > 2017-03-06 10:56 GMT+01:00 Joern Kottmann <[email protected]>:
> > >
> > > > Hello,
> > > >
> > > > the model is only available after the training has finished, so it is
> > > > hard to guess what you are doing.
> > > >
> > > > Do you use the command line? Which command?
> > > >
> > > > Jörn
> > > >
> > > > On Mon, Mar 6, 2017 at 10:29 AM, Damiano Porta <[email protected]>
> > > > wrote:
> > > >
> > > > > Hello Jorn,
> > > > > I tried with 300 iterations and it takes forever; reducing that number
> > > > > to 100, I can finally get the model in half an hour.
> > > > >
> > > > > The problem with 300 iterations is that I can see the model (.bin)
> > > > > after half an hour too, but the computations are still running. So I
> > > > > do not really understand what it is doing.
> > > > >
> > > > > Damiano
> > > > >
> > > > > 2017-03-06 10:19 GMT+01:00 Joern Kottmann <[email protected]>:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > this looks like output from the cross validator.
> > > > > >
> > > > > > Jörn
> > > > > >
> > > > > > On Sun, Mar 5, 2017 at 11:34 AM, Damiano Porta <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I am training a NER model with the perceptron classifier (using
> > > > > > > OpenNLP 1.7.0).
> > > > > > >
> > > > > > > The output of the training is:
> > > > > > >
> > > > > > > Indexing events using cutoff of 0
> > > > > > >
> > > > > > > Computing event counts...  done. 11861603 events
> > > > > > > Indexing...  done.
> > > > > > > Collecting events... Done indexing.
> > > > > > > Incorporating indexed data for training...
> > > > > > > done.
> > > > > > > Number of Event Tokens: 11861603
> > > > > > >    Number of Outcomes: 23
> > > > > > >  Number of Predicates: 6623489
> > > > > > > Computing model parameters...
> > > > > > > Performing 300 iterations.
> > > > > > >   1:  . (11795234/11861603) 0.9944047191597966
> > > > > > >   2:  . (11820243/11861603) 0.9965131188423689
> > > > > > >   3:  . (11829329/11861603) 0.9972791198626357
> > > > > > >   4:  . (11834935/11861603) 0.9977517372651908
> > > > > > >   5:  . (11838996/11861603) 0.9980941024581584
> > > > > > >   6:  . (11841501/11861603) 0.9983052880795286
> > > > > > >   7:  . (11843704/11861603) 0.998491013398442
> > > > > > >   8:  . (11845304/11861603) 0.9986259024180796
> > > > > > >   9:  . (11846421/11861603) 0.9987200718149141
> > > > > > >  10:  . (11847181/11861603) 0.9987841440992419
> > > > > > >  20:  . (11852226/11861603) 0.9992094660392866
> > > > > > >  30:  . (11853947/11861603) 0.9993545560410343
> > > > > > >  40:  . (11854831/11861603) 0.999429082224384
> > > > > > >  50:  . (11855471/11861603) 0.999483037832239
> > > > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > > > Stats: (11846242/11861603) 0.998704981105842
> > > > > > > ...done.
> > > > > > > Compressed 6623489 parameters to 554312
> > > > > > > 6892 outcome patterns
> > > > > > > Indexing events using cutoff of 0
> > > > > > >
> > > > > > > Computing event counts...  done. 6370206 events
> > > > > > > Indexing...  done.
> > > > > > > Collecting events... Done indexing.
> > > > > > > Incorporating indexed data for training...
> > > > > > > done.
> > > > > > > Number of Event Tokens: 6370206
> > > > > > >    Number of Outcomes: 23
> > > > > > >  Number of Predicates: 3737425
> > > > > > > Computing model parameters...
> > > > > > > Performing 300 iterations.
> > > > > > >   1:  . (6330365/6370206) 0.9937457281601254
> > > > > > >   2:  . (6345859/6370206) 0.9961779885925196
> > > > > > >   3:  . (6351552/6370206) 0.9970716802564941
> > > > > > >   4:  . (6354847/6370206) 0.9975889319748843
> > > > > > >   5:  . (6356872/6370206) 0.997906818084062
> > > > > > >   6:  . (6358350/6370206) 0.998138835698563
> > > > > > >   7:  . (6359611/6370206) 0.9983367884806237
> > > > > > >   8:  . (6360473/6370206) 0.9984721059256169
> > > > > > >   9:  . (6361138/6370206) 0.9985764981540628
> > > > > > >  10:  . (6361532/6370206) 0.9986383485871572
> > > > > > >  20:  . (6364161/6370206) 0.9990510510963068
> > > > > > >  30:  . (6365106/6370206) 0.9991993979472563
> > > > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > > > Stats: (6360617/6370206) 0.9984947111600473
> > > > > > > ...done.
> > > > > > > Indexing events using cutoff of 0
> > > > > > >
> > > > > > > Computing event counts...  done. 6370114 events
> > > > > > > Indexing...  done.
> > > > > > > Collecting events... Done indexing.
> > > > > > > Incorporating indexed data for training...
> > > > > > > done.
> > > > > > > Number of Event Tokens: 6370114
> > > > > > >    Number of Outcomes: 23
> > > > > > >  Number of Predicates: 3737390
> > > > > > > Computing model parameters...
> > > > > > > Performing 300 iterations.
> > > > > > >   1:  . (6330266/6370114) 0.9937445389517362
> > > > > > >   2:  . (6345810/6370114) 0.9961846836650019
> > > > > > >   3:  . (6351374/6370114) 0.9970581374210885
> > > > > > >   4:  . (6354747/6370114) 0.9975876412886803
> > > > > > >   5:  . (6356872/6370114) 0.9979212302950936
> > > > > > >   6:  . (6358429/6370114) 0.998165652922381
> > > > > > >   7:  . (6359417/6370114) 0.9983207521874805
> > > > > > >   8:  . (6360292/6370114) 0.9984581123665919
> > > > > > >   9:  . (6361076/6370114) 0.9985811870870757
> > > > > > >  10:  . (6361693/6370114) 0.998678045636232
> > > > > > >  20:  . (6364109/6370114) 0.9990573167136413
> > > > > > >  30:  . (6365008/6370114) 0.9991984444862368
> > > > > > >  40:  . (6365478/6370114) 0.9992722265253023
> > > > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > > > Stats: (6359985/6370114) 0.9984099185666065
> > > > > > > ...done.
> > > > > > > Indexing events using cutoff of 0
> > > > > > >
> > > > > > > Computing event counts...  done. 6370480 events
> > > > > > > Indexing...  done.
> > > > > > > Collecting events... Done indexing.
> > > > > > > Incorporating indexed data for training...
> > > > > > > done.
> > > > > > > Number of Event Tokens: 6370480
> > > > > > >    Number of Outcomes: 23
> > > > > > >  Number of Predicates: 3737798
> > > > > > > Computing model parameters...
> > > > > > > Performing 300 iterations.
> > > > > > >   1:  . (6330685/6370480) 0.9937532179678769
> > > > > > >   2:  . (6346153/6370480) 0.9961812924614786
> > > > > > >   3:  . (6351726/6370480) 0.9970561088018485
> > > > > > >   4:  . (6355089/6370480) 0.9975840125076917
> > > > > > >   5:  . (6357173/6370480) 0.9979111464128292
> > > > > > >   6:  . (6358780/6370480) 0.9981634036995642
> > > > > > >   7:  . (6359845/6370480) 0.9983305810551167
> > > > > > >   8:  . (6360827/6370480) 0.9984847295651191
> > > > > > >   9:  . (6361316/6370480) 0.9985614898720347
> > > > > > >  10:  . (6362076/6370480) 0.9986807901445417
> > > > > > >  20:  . (6364506/6370480) 0.9990622370684784
> > > > > > >  30:  . (6365415/6370480) 0.9992049264733583
> > > > > > > Stopping: change in training set accuracy less than 1.0E-5
> > > > > > > Stats: (6362594/6370480) 0.9987621026986977
> > > > > > > ...done.
> > > > > > > Indexing events using cutoff of 0
> > > > > > >
> > > > > > > Computing event counts...  done. 6370008 events
> > > > > > > Indexing...  done.
> > > > > > > Collecting events... Done indexing.
> > > > > > > Incorporating indexed data for training...
> > > > > > > done.
> > > > > > > Number of Event Tokens: 6370008
> > > > > > >    Number of Outcomes: 23
> > > > > > >  Number of Predicates: 3737824
> > > > > > > Computing model parameters...
> > > > > > > Performing 300 iterations.
> > > > > > >   1:  . (6330200/6370008) 0.9937507142848172
> > > > > > >   2:  . (6345643/6370008) 0.9961750440501802
> > > > > > >   3:  . (6351415/6370008) 0.9970811653611737
> > > > > > >   4:  . (6354522/6370008) 0.9975689198506501
> > > > > > >   5:  . (6356723/6370008) 0.9979144453193779
> > > > > > >   6:  . (6358164/6370008) 0.9981406616757781
> > > > > > >   7:  . (6359399/6370008) 0.9983345389833106
> > > > > > >   8:  . (6360274/6370008) 0.9984719014481614
> > > > > > >   9:  . (6360694/6370008) 0.9985378354312899
> > > > > > >  10:  . (6361531/6370008) 0.9986692324405244
> > > > > > > ....
> > > > > > > ....
> > > > > > > ....
> > > > > > >
> > > > > > > etc. etc. Is that normal? The parameters are *0 cutoff* and *300
> > > > > > > iterations*.
> > > > > > >
> > > > > > > The corpus is relatively small: it has 20k sentences.
> > > > > > >
> > > > > > > I do not remember output like that when using the MAXENT classifier.
> > > > > > >
> > > > > > > Damiano
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
