Hi Ralf,

Does the following code fragment work for you?

// Imports needed by the fragment below.
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Modified from:
 * http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/analysis/package-summary.html
 */
public List<String> getAnalyzedTokens(String text) throws IOException {
  final List<String> list = new ArrayList<>();
  try (TokenStream ts = analyzer().tokenStream("field", new StringReader(text))) {
    final CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    ts.reset(); // Resets this stream to the beginning. (Required)
    while (ts.incrementToken()) {
      list.add(termAtt.toString());
    }
    ts.end(); // Perform end-of-stream operations, e.g. set the final offset.
  }
  return list;
}

On Wednesday, February 4, 2015 2:45 PM, Ralf Bierig <ralf.bie...@gmail.com> wrote:

Hi all,

an Analyzer has access to content on a per-field level by overriding this method:

protected TokenStreamComponents createComponents(String fieldName, Reader reader);

Is it possible to get to the document? I want to collect the text content from the entire document within my analyzer to be processed by an external component.

Best,
Ralf

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org