On 04/29/2013 02:32 PM, Svetoslav Marinov wrote:
OK, I hope I am doing this correctly. I take the sample count from
sampleStream:
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
I call sampleStream.read() and then get 468 fewer samples than the number of
sentences (which is 2 611 247). Shouldn't the number of samples match the
number of sentences? I have samples without entities, but I suspect there are
more than 468 of them. I will check, though.
Otherwise, I am not sure where to measure how many samples are processed per
second. Do you mean during the creation of the NE model, or somewhere else?
How does one do that?
You could implement a proxy ObjectStream object that is inserted into the
stream. Each call to its read method can then be used to do the counting,
and maybe to print out the progress every n calls.
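
A minimal sketch of such a counting proxy could look like the following. The
class name CountingObjectStream, the reportEvery parameter and the getCount
method are made up for illustration. The small ObjectStream interface declared
here is only a stand-in with the same three methods, so that the sketch
compiles on its own. In a real project you would drop it and implement
opennlp.tools.util.ObjectStream instead.

```java
import java.io.IOException;

// Stand-in with the same shape as opennlp.tools.util.ObjectStream (assumption:
// read/reset/close), declared here only so the sketch compiles without OpenNLP.
interface ObjectStream<T> {
    T read() throws IOException;
    void reset() throws IOException;
    void close() throws IOException;
}

// Hypothetical proxy: forwards every call to the wrapped stream and counts
// the non-null objects returned by read().
class CountingObjectStream<T> implements ObjectStream<T> {
    private final ObjectStream<T> delegate;
    private final int reportEvery;
    private long startNanos = System.nanoTime();
    private long count;

    CountingObjectStream(ObjectStream<T> delegate, int reportEvery) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        this.delegate = delegate;
        this.reportEvery = reportEvery;
    }

    @Override
    public T read() throws IOException {
        T sample = delegate.read();
        if (sample != null) {            // null marks the end of the stream
            count++;
            if (count % reportEvery == 0) {
                double seconds = (System.nanoTime() - startNanos) / 1e9;
                System.err.printf("%d samples read (%.1f per second)%n",
                        count, count / seconds);
            }
        }
        return sample;
    }

    @Override
    public void reset() throws IOException {
        delegate.reset();
        count = 0;                       // start counting again for the next pass
        startNanos = System.nanoTime();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    long getCount() {
        return count;
    }
}
```

It would be used by wrapping the existing stream, for example
new CountingObjectStream<NameSample>(sampleStream, 10000), and passing the
wrapper on wherever sampleStream was used before. Note that the count starts
over whenever reset() is called on the wrapper.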
The difference could come from empty lines in your training data: only
non-empty lines become sample objects.
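
One way to check this is to count the blank lines in the training file and
compare that number with the 468 missing samples. The sketch below is only an
illustration: the class name BlankLineCounter is made up, and it assumes that
lines containing nothing but whitespace are skipped in the same way as
completely empty ones.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper, not part of OpenNLP.
class BlankLineCounter {
    // Returns a two-element array: { total lines, blank lines }.
    static long[] count(BufferedReader reader) throws IOException {
        long total = 0;
        long blank = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            total++;
            // Assumption: whitespace-only lines are treated as empty.
            if (line.trim().isEmpty()) {
                blank++;
            }
        }
        return new long[] { total, blank };
    }
}
```

If the second number comes out as 468, the empty lines account for the whole
difference.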
Jörn