I like the second approach:

Span[] find(String text, Span sentences[], Span tokens[])

It looks like it would be easier to use. Maybe we could add a new tokenize
method to Tokenizer that takes the sentence offset and outputs spans with
this offset already included.
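
Roughly what I have in mind, as an untested sketch. The existing Tokenizer
has tokenizePos(String); the two-argument tokenizePos(String, int) below is a
hypothetical overload, the whitespace splitting is only a placeholder for a
real tokenizer, and Span is a minimal stand-in for opennlp.tools.util.Span so
that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetTokenizerSketch {

    /** Minimal stand-in for opennlp.tools.util.Span: [start, end) character offsets. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        CharSequence getCoveredText(CharSequence text) {
            return text.subSequence(start, end);
        }
    }

    /**
     * Hypothetical overload: tokenizes one sentence on whitespace and returns
     * spans already shifted by sentenceOffset, so they index into the document.
     */
    static Span[] tokenizePos(String sentence, int sentenceOffset) {
        List<Span> spans = new ArrayList<>();
        int tokenStart = -1; // -1 means "not currently inside a token"
        for (int i = 0; i <= sentence.length(); i++) {
            boolean boundary = i == sentence.length()
                    || Character.isWhitespace(sentence.charAt(i));
            if (!boundary && tokenStart < 0) {
                tokenStart = i;
            } else if (boundary && tokenStart >= 0) {
                spans.add(new Span(tokenStart + sentenceOffset, i + sentenceOffset));
                tokenStart = -1;
            }
        }
        return spans.toArray(new Span[0]);
    }

    /** Tokenizes every sentence and collects document-level token spans. */
    static Span[] tokenizeDocument(String text, Span[] sentences) {
        List<Span> tokens = new ArrayList<>();
        for (Span sentence : sentences) {
            String sentenceText = text.substring(sentence.start, sentence.end);
            for (Span token : tokenizePos(sentenceText, sentence.start)) {
                tokens.add(token);
            }
        }
        return tokens.toArray(new Span[0]);
    }

    public static void main(String[] args) {
        String text = "Paris is nice. Berlin too.";
        Span[] sentences = { new Span(0, 14), new Span(15, 26) };
        for (Span token : tokenizeDocument(text, sentences)) {
            System.out.println(token.start + ".." + token.end + " "
                    + token.getCoveredText(text));
        }
    }
}
```

With spans like these, the caller could pass text, sentences and tokens
straight into find(String text, Span sentences[], Span tokens[]) without
having to re-base any offsets first.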

I could not understand what you mean by using token offsets for the
sentences.


On Thu, May 30, 2013 at 12:46 PM, Jörn Kottmann <[email protected]> wrote:

> We are now one iteration further. In this new version it is
> possible to pass in a whole document at once, which leads
> to the question of how we should handle this in OpenNLP generally.
>
> To pass in a document the following information needs to be handed over:
> - Sentences
> - Tokens
> - Names
>
> And maybe the text, depending on whether the tokens are Spans or Strings.
>
> If the component is stateless, all of this needs to be handed over in one
> method call; otherwise it could be handed over on a per-sentence basis
> (that's how coref is doing it).
>
> In the DocumentNameFinder (never implemented, but the interface is
> defined) it is done like this:
> Span[][] find(String tokens[][])
>
> In my opinion that's not a nice solution. First, it requires that the
> input text gets split into Strings, and second, it's hard to use the
> returned Spans: they are only meaningful within the context given by
> the returned array. Names which cross sentences are not possible.
>
> Another approach could be this:
> Span[] find(String text, Span sentences[], Span tokens[])
>
> Where the sentence and token offsets in the spans are character offsets,
> and the returned spans are token offsets.
>
> It would probably be nicer to use token offsets for the sentences as well,
> but that's currently incompatible with the sentence detector interface.
>
> Any opinions on how we should solve this?
>
> Jörn
>
>
> On 05/23/2013 03:04 PM, Jörn Kottmann wrote:
>
>> Hi all,
>>
>> please have a look at
>> https://issues.apache.org/jira/browse/OPENNLP-579
>>
>> It's about a contribution to link location entities to a geo name
>> database. The component could later be extended to link other entity
>> types to a database or dictionary as well.
>>
>> Thanks,
>> Jörn
>>
>
>
