Storing Part of Speech information in Lucene Indices

Amit Kumar Tue, 11 Jul 2006 22:36:48 -0700

Hi,

A new project that I am investigating lucene for needs the Parts ofspeech information for the tokens. I can get thatinformation using NLP techniques (GATE etc.), by pre processing thedocuments but I would like to store that

information in the Indices. Something along the lines of


TermVectorOffsetInfo[?].getPartofSpeech();

I am writing to ask for your advice, you can tell me I am b o n k e rs or let me know where I should start digging :).Is that a good idea? Or would it be just less trouble for me to storethe offset information along with parts of speech

outside Lucene.

Has anyone else done that?

Best,
Amit

ps: Thank you for putting the LuceneInAction source online, it was agreat help to see the CategorizerTest.java.

I am ordering my copy of the book tomorrow :)

---------------------------------------------------------
Amit Kumar
Research Programmer
The Graduate School of Library and Information Science
University of Illinois, Urbana Champaign IL, 61820
phone: 217-333-4118 fax: 217-244-3302
---------------------------------------------------------

Storing Part of Speech information in Lucene Indices

Reply via email to