[ https://issues.apache.org/jira/browse/OPENNLP-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030394#comment-14030394 ]
Joern Kottmann commented on OPENNLP-701:
----------------------------------------
Sounds good. To train an OpenNLP component, you have to provide an
ObjectStream that outputs the corresponding XyzSample objects (e.g. NameSample
for the Name Finder).
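To make the stream idea concrete, here is a minimal sketch in plain Java. It
does not use the OpenNLP classes: SampleStream and TaggedSentence are
simplified, hypothetical stand-ins for opennlp.tools.util.ObjectStream and an
XyzSample, and the "token tag" line format with a blank line between sentences
is only an assumed example format. The one convention taken from ObjectStream
is that read() returns null once the data is exhausted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TokenPerLineSampleStreamSketch {

    /** Hypothetical stand-in for ObjectStream: read() returns null at end of data. */
    interface SampleStream<T> extends AutoCloseable {
        T read() throws IOException;

        @Override
        void close() throws IOException;
    }

    /** Hypothetical stand-in for an XyzSample: one tag per token. */
    static final class TaggedSentence {
        final List<String> tokens;
        final List<String> tags;

        TaggedSentence(List<String> tokens, List<String> tags) {
            this.tokens = tokens;
            this.tags = tags;
        }
    }

    /** Parses "token tag" lines; a blank line ends the current sentence. */
    static final class TokenPerLineStream implements SampleStream<TaggedSentence> {
        private final BufferedReader in;

        TokenPerLineStream(BufferedReader in) {
            this.in = in;
        }

        @Override
        public TaggedSentence read() throws IOException {
            List<String> tokens = new ArrayList<>();
            List<String> tags = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    if (tokens.isEmpty()) {
                        continue; // skip blank lines before a sentence starts
                    }
                    break; // blank line after tokens: sentence is complete
                }
                String[] parts = line.split("\\s+");
                if (parts.length != 2) {
                    throw new IOException("Expected 'token tag' but got: " + line);
                }
                tokens.add(parts[0]);
                tags.add(parts[1]);
            }
            // No tokens collected means the input is exhausted.
            return tokens.isEmpty() ? null : new TaggedSentence(tokens, tags);
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    /** Convenience helper: drains a stream over the given text into a list. */
    static List<TaggedSentence> readAll(String data) {
        List<TaggedSentence> out = new ArrayList<>();
        try (TokenPerLineStream stream =
                 new TokenPerLineStream(new BufferedReader(new StringReader(data)))) {
            TaggedSentence sentence;
            while ((sentence = stream.read()) != null) {
                out.add(sentence);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "Jan B-PER\nKowalski I-PER\nmieszka O\n\nw O\nKrakowie B-LOC\n";
        for (TaggedSentence s : readAll(data)) {
            System.out.println(s.tokens + " " + s.tags);
        }
    }
}
```

A real implementation would implement ObjectStream directly and build the
actual sample objects instead of TaggedSentence.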
The streams that can parse a particular format are in the
opennlp.tools.formats package, where you will find the existing
implementations. I suggest that you have a look at e.g. the
Conll02NameSampleStream class. The other implementations are usually very
similar, so it doesn't really matter which one you look at.
To integrate the format into the command line interface, you have to implement
a stream factory; Conll02NameSampleStreamFactory is an example.
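The general idea behind such a factory can be sketched as follows. This is
plain Java, not the OpenNLP API: SampleStreamFactory, the registry, the
"columns" format name and the "-sep" option are all hypothetical names chosen
for illustration. To keep the sketch runnable without files, the input lines
are passed in directly and parsed eagerly; a real factory would typically open
the input named on the command line and return a stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FormatFactorySketch {

    /** Hypothetical stand-in for a stream factory: builds parsed samples from CLI-style arguments. */
    interface SampleStreamFactory {
        List<String[]> create(String[] args, List<String> lines);
    }

    /** Format name -> factory, so a command line tool can pick the parser by name. */
    private static final Map<String, SampleStreamFactory> FACTORIES = new HashMap<>();

    static void register(String format, SampleStreamFactory factory) {
        FACTORIES.put(format, factory);
    }

    static SampleStreamFactory lookup(String format) {
        SampleStreamFactory factory = FACTORIES.get(format);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown format: " + format);
        }
        return factory;
    }

    /** Returns the value following the given flag, or the default if the flag is absent. */
    static String option(String[] args, String flag, String defaultValue) {
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals(flag)) {
                return args[i + 1];
            }
        }
        return defaultValue;
    }

    static {
        // Register one illustrative format: columns split by a configurable regex.
        register("columns", (args, lines) -> {
            String separator = option(args, "-sep", "\\s+");
            List<String[]> rows = new ArrayList<>();
            for (String line : lines) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    rows.add(trimmed.split(separator));
                }
            }
            return rows;
        });
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Jan,B-PER", "", "Krakowie,B-LOC");
        List<String[]> rows = lookup("columns").create(new String[] {"-sep", ","}, lines);
        for (String[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The point of the indirection is that the command line code only needs the
format name and the raw arguments; everything format-specific stays inside the
factory.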
Hope that helps!
> Polish language support - Maxent binaries
> -----------------------------------------
>
> Key: OPENNLP-701
> URL: https://issues.apache.org/jira/browse/OPENNLP-701
> Project: OpenNLP
> Issue Type: New Feature
> Reporter: Chris Krol / IBM
> Priority: Minor
>
> Hi,
> Currently I'm working at IBM Poland and my manager approved the idea of
> contributing various Maxent binaries for Polish language (sentence split,
> sentence detection, POS tagging and morphological analysis, NER).
> You could possibly put them on your download page.
> We trained them using the Golden Standard human-annotated Polish National
> Corpus (GPL 3.0).
> Would it also be possible to give some credit (if any) for the fact that the
> job was done at IBM?
> I've already sent a mail to the devs, but haven't seen any response for two
> weeks now.