Hi Niki,

> On Oct 18, 2016, at 7:27 AM, Niki Pavlopoulou <n...@exonar.com> wrote:
>
> Hi all,
>
> I am using Lucene and OpenNLP for POS tagging. I would like to support
> biGrams with POS tags as well. For example, I would like something like
> that:
>
> Input: (I[PRP], am[VBP], using[VBG], Lucene[NNP])
> Output: (I[PRP] am[VBP], am[VBP] using[VBG], using[VBG] Lucene[NNP])
>
> The problem above is that I do not have "pure" tokens, like "I", "am" etc.,
> so the analysis could be wrong if I add the POS tags as an input in Lucene.
> Is there a way to solve this, apart from creating my custom Lucene
> analyser?

To create your bigrams, check out ShingleFilter:
<http://lucene.apache.org/core/6_2_1/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html>

I’m not sure what you mean by “the analysis could be wrong if I add the POS tags as an input in Lucene” - can you give an example?

You may be interested in the work-in-progress addition of OpenNLP integration with Lucene here:
<https://issues.apache.org/jira/browse/LUCENE-2899>

--
Steve
www.lucidworks.com
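
For illustration only, here is a minimal plain-Python sketch of the bigram output described in the question. It does not use Lucene or OpenNLP; it only shows the pairing that a shingle of size 2 without unigrams would amount to. The function name `tagged_bigrams` and the `(word, tag)` tuple representation are assumptions made for this example. Keeping the word and its tag as separate fields until the final step is one way to leave the "pure" token available for any other analysis.

```python
from typing import List, Tuple

# Hypothetical representation: (surface form, POS tag), kept separate so the
# bare word stays available until the bigram string is rendered.
Token = Tuple[str, str]


def tagged_bigrams(tokens: List[Token]) -> List[str]:
    """Join each adjacent pair of tagged tokens into one bigram string."""
    rendered = [f"{word}[{tag}]" for word, tag in tokens]
    # zip pairs every token with its successor; fewer than two tokens
    # yields no bigrams.
    return [f"{left} {right}" for left, right in zip(rendered, rendered[1:])]


tokens = [("I", "PRP"), ("am", "VBP"), ("using", "VBG"), ("Lucene", "NNP")]
bigrams = tagged_bigrams(tokens)
print(bigrams)
```

In Lucene itself, ShingleFilter by default emits the single tokens as well as the shingles, so reproducing exactly this output would likely involve turning unigram output off; treat that as a pointer to check against the Javadoc linked above rather than a tested configuration.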