As you have seen the example code for PartOfSpeechTaggingFilter at http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/analysis/package-summary.html
You can use a custom analyzer to inject "metadata" tokens into the index at the same position as the source tokens. For example, given the text: The cat jumped over the dog your analyzer could emit tokens: [the] [cat,_posNoun] [jumped,_posVerb] [over] [the] [dog,_posNoun] where the "_pos...." tokens have a zero position increment to effectively associate them with the term to which they relate (this is how the example SynonymTokenizer in the highlighter package works). The "_pos" prefix is used as a uniquefier for metadata tokens to avoid any name-clashes with any real content tokens. Theoretically you could then construct queries where the queries mixed both data and your part-of-speech metadata eg you could use the position information based queries to find out what things normally have a particular verb applied to them: "jumped _posNoun"~3 or what verbs are commonly associated with a dog (caution advised here): "_posVerb the dog"~3 or to use an ambiguous word in a particular context/sense "_posVerb track"~1 ----- Thanx: Grijesh www.gettinhahead.co.in -- View this message in context: http://lucene.472066.n3.nabble.com/Inquiring-part-of-speech-POS-tagging-indexing-and-searching-tp2886079p2892981.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org