Hi Jim,

Thanks for replying. Could you be more specific, please?

These are the things that I am aware of:

1. The training data can be of the form
   <START:person> Pierre Vinken <END> is a good example .
2. Currently I use a file in the format below and create a 'Dictionary' from it:

   <entry><token>vinayak</token></entry>
   <entry><token>rakesh</token></entry>
   <entry><token>sandeep</token></entry>
   <entry><token>manoj</token></entry>

   I then use this dictionary in the DictionaryNameFinder.

I would like to know the advantages of using this format. Are there any other formats available? Could you please explain more?

Thanks,
Manoj

On Fri, Jul 21, 2017 at 3:56 PM, Jim O'Regan <jaore...@tcd.ie> wrote:

> 2017-07-19 10:48 GMT+01:00 Manoj B. Narayanan <manojb.narayanan2...@gmail.com>:
>
> > Hi all,
> >
> > I wanted to find out if there is any specific reason behind using XML
> > format for dictionaries for Name Finder.
>
> It's not XML. There is a very superficial similarity in the use of <>, but,
> at a minimum
> <START:person> Pierre Vinken <END>
> would need to be something like
> <name type="person"> Pierre Vinken </name>
> and the whole document would need to be enclosed by a pair of tags.
>
> > Also, is there any source from where we can get the documentation regarding
> > the dictionary formats for various tools (tokenizer, pos, name finder).
>
> The manual: https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html
> More specifically,
> tokeniser:
> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.tokenizer.training
> pos:
> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.postagger.training
> name finder:
> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.namefind.training