Re: Analyzer: Access to document?

Ahmet Arslan Wed, 04 Feb 2015 07:17:40 -0800

Hi Ralf,

Does the following code fragment work for you?


/**
 * Modified from:
 * http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/analysis/package-summary.html
 */
public List<String> getAnalyzedTokens(String text) throws IOException {

    final List<String> list = new ArrayList<>();
    try (TokenStream ts = analyzer().tokenStream("field", new StringReader(text))) {

        final CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        ts.reset(); // Resets this stream to the beginning. (Required)
        while (ts.incrementToken()) {
            list.add(termAtt.toString());
        }

        ts.end();   // Perform end-of-stream operations, e.g. set the final offset.
    }
    return list;
}
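
If that works, one way to get at the whole document might be to call the method once per field, outside the Analyzer, and merge the results before passing them to your external component. Here is a rough sketch, assuming your field values are available as a Map of field name to text. DocumentTokenCollector and FieldAnalyzer are names I made up for illustration; they are not Lucene classes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): gathers the tokens of all fields of one document. */
final class DocumentTokenCollector {

    /** Stand-in for a method such as getAnalyzedTokens(String); the name is an assumption. */
    interface FieldAnalyzer {
        List<String> analyze(String text) throws IOException;
    }

    /**
     * Analyzes every field value in the map's iteration order and returns
     * all resulting tokens as a single list.
     */
    static List<String> collectTokens(Map<String, String> fields, FieldAnalyzer analyzer) {
        final List<String> all = new ArrayList<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            try {
                all.addAll(analyzer.analyze(field.getValue()));
            } catch (IOException e) {
                // Wrap the checked exception so callers need no throws clause.
                throw new UncheckedIOException("Analysis failed for field " + field.getKey(), e);
            }
        }
        return all;
    }
}
```

From the class that holds the method above, the call would look like DocumentTokenCollector.collectTokens(fields, this::getAnalyzedTokens). Use a LinkedHashMap if the order of the fields matters to your external component.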





On Wednesday, February 4, 2015 2:45 PM, Ralf Bierig <[email protected]> wrote:
Hi all,

an Analyzer has access to content on a per-field level by overriding 
this method:

protected TokenStreamComponents createComponents(String fieldName, 
Reader reader);

Is it possible to get to the document? I want to collect the text 
content from the entire document within my analyzer to be processed by 
an external component.

Best,
Ralf

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
