Hello all,
I am putting this question in its own thread so it does not get lost.
The question is about the proper usage of DefaultModelBuilderUtil.
I have not figured out the proper format of the input files. Here's what I
think from what I have been reading. Tell me if I am right.
From class DefaultModelBuilderUtil, method generateModel:
* @param sentences a file that contains one sentence per line.
*        There should be at least 15K sentences consisting of a
*        representative sample from user data
This seems to be a text file where each sentence is on one line.
I wonder if it has to be annotated, for instance:
<START:person> Archimedes <END> used the method of exhaustion to approximate the value of π.
Archimedes ( 287–212 BC ) was the first to estimate π rigorously .
Or just:
Archimedes used the method of exhaustion to approximate the value of π.
Archimedes ( 287–212 BC ) was the first to estimate π rigorously .
* @param knownEntities a file consisting of a simple list of
*        unambiguous entities, one entry per line.
*        For instance, if one was trying to build a person NER model
*        then this file would be a list of person names that are
*        unambiguous and are known to exist in the sentences
Would this be a plain text list, something like one name per line?
Archimedes
Socrates
....
* @param knownEntitiesBlacklist This file contains a list of known bad hits
*        that the NER phase of this processing might catch early one
*        before the model iterates to maturity
Is this the same format as knownEntities, but listing what should NOT be marked as an entity?
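To make my guess concrete, here is a small sketch of how I would produce the three input files if they really are plain text with one entry per line. Everything in it is my own assumption: the class ModelBuilderInputSketch, the file names, and the blacklist entry are made up for illustration and are not part of OpenNLP. It does not call DefaultModelBuilderUtil at all; it only writes the files and checks whether a line carries <START:...> markup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP. It only illustrates the
// plain-text layout guessed at above; whether that guess is right is
// exactly what I am asking.
final class ModelBuilderInputSketch {

    // True if the line carries name-finder style markup such as
    // <START:person> ... <END>.
    static boolean looksAnnotated(String line) {
        return line.contains("<START:") || line.contains("<END>");
    }

    // Writes one trimmed, non-blank entry per line as UTF-8 and returns the path.
    static Path writeOnePerLine(Path file, List<String> entries) throws IOException {
        List<String> cleaned = new ArrayList<>();
        for (String entry : entries) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                cleaned.add(trimmed);
            }
        }
        return Files.write(file, cleaned, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("modelbuilder-input");

        // Unannotated sentences, one per line (\u03c0 is pi, \u2013 is an en dash).
        Path sentences = writeOnePerLine(dir.resolve("sentences.txt"), Arrays.asList(
                "Archimedes used the method of exhaustion to approximate the value of \u03c0.",
                "Archimedes ( 287\u2013212 BC ) was the first to estimate \u03c0 rigorously ."));

        // Unambiguous names that occur in the sentences, one per line.
        Path knownEntities = writeOnePerLine(dir.resolve("knownEntities.txt"),
                Arrays.asList("Archimedes", "Socrates"));

        // Made-up example of a bad hit that should never be marked as a person.
        Path blacklist = writeOnePerLine(dir.resolve("knownEntitiesBlacklist.txt"),
                Arrays.asList("BC"));

        for (Path p : Arrays.asList(sentences, knownEntities, blacklist)) {
            int lines = Files.readAllLines(p, StandardCharsets.UTF_8).size();
            System.out.println(p + ": " + lines + " lines");
        }
    }
}
```

If the sentences file is actually supposed to be annotated, then looksAnnotated would have to return true for every line that contains a known entity, and the files above would be wrong.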
The rest seemed quite straightforward.
Thanks,