Thank you James. I now count only the tokens that match the pattern ".*[A-Za-z0-9]+.*", and it works on the cases I checked. A token that does not match that pattern is presumably punctuation. Is that pattern enough to cover every keyword? Also, can Lucene and OpenNLP be integrated so that the keyword positions and the named entity positions are compatible?
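To make the idea concrete, here is a minimal sketch of the mapping I have in mind, in plain Java without the OpenNLP or Lucene classes. The class name `KeywordPositions`, the method `toKeywordPositions`, and the sample sentence are only illustrative. It assumes the name finder reports a span as token indices with an exclusive end, and that the Lucene analyzer splits words the same way and keeps position gaps for stop words; I have not verified either assumption against a real index.

```java
import java.util.regex.Pattern;

public class KeywordPositions {
    // Pattern from this message: a token is a keyword if it contains
    // at least one ASCII letter or digit; anything else is treated as punctuation.
    private static final Pattern KEYWORD = Pattern.compile(".*[A-Za-z0-9]+.*");

    /**
     * Maps each token index to its keyword position.
     * Punctuation tokens get -1 and do not advance the position counter.
     */
    static int[] toKeywordPositions(String[] tokens) {
        int[] positions = new int[tokens.length];
        int next = 0;
        for (int i = 0; i < tokens.length; i++) {
            positions[i] = KEYWORD.matcher(tokens[i]).matches() ? next++ : -1;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Illustrative token array, as a tokenizer might produce it.
        String[] tokens = {"Barack", "Obama", ",", "president", "of", "the",
                           "US", ",", "met", "Bill", "Gates", "."};
        int[] positions = toKeywordPositions(tokens);

        // Hypothetical name span over "Bill Gates": token indices [9, 11).
        int start = 9;
        int end = 11;
        System.out.println(positions[start] + " .. " + positions[end - 1]); // prints "7 .. 8"
    }
}
```

One limitation I can see: the pattern only knows ASCII letters and digits, so a token made up entirely of non-ASCII letters would be counted as punctuation. A Unicode-aware class such as `[\p{L}\p{N}]` might be a safer choice, but I have not tried it.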
On Sun, Nov 6, 2011 at 12:22 AM, James Kosin <[email protected]> wrote:

> Tri,
>
> You could just subtract the number of punctuation tokens from the
> offsets you get.
>
> On 11/5/2011 1:08 PM, Tri Nguyen wrote:
> > On Sat, Nov 5, 2011 at 11:30 PM, Jörn Kottmann <[email protected]> wrote:
> >
> >> On 11/5/11 4:53 PM, Tri Nguyen wrote:
> >>
> >>> Obama is correct, but Bill Gates. Since the NameFinderME return the token
> >>> index (position in the token array) not the keyword position (the keyword
> >>> position in the text). I want to cooperate with keyword position in
> >>> Lucene.
> >>>
> >> What is a keyword position?
> >>
> > It is the order of the word in the text.
> > Ex:
> > Barack: 0
> > Obama: 1
> > president: 3
> > US: 5
> > he: 6
> > 1961: 11
> > Bill: 12
> >
> >> Jörn
> >>
> >
