This works, and I can now reuse token streams. But why did TokenStream.reset()
not work in my earlier case? Is it a placeholder method in TokenStream with no
implementation, one that CachingTokenFilter then implements?
- BR
Mark Miller <[EMAIL PROTECTED]> wrote:
reset() is optional. StandardAnalyzer does not implement it. Check out
CachingTokenFilter and wrap StandardAnalyzer's TokenStream in it.
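
To illustrate why an optional reset() can leave a consumed stream empty while a caching wrapper can replay it, here is a minimal plain-Java sketch. SimpleTokenStream, OnePassStream, CachingStream and ReplayDemo are hypothetical stand-ins invented for this example. They are not Lucene classes and only mimic the idea behind wrapping a stream in CachingTokenFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a one-pass token stream (NOT Lucene's TokenStream).
abstract class SimpleTokenStream {
    /** Returns the next token, or null once the stream is exhausted. */
    abstract String next();

    /** Optional operation: the default does nothing, so a consumed stream stays consumed. */
    void reset() { }
}

// One-pass source backed by an iterator; it inherits the do-nothing reset().
class OnePassStream extends SimpleTokenStream {
    private final Iterator<String> source;

    OnePassStream(List<String> tokens) { this.source = tokens.iterator(); }

    @Override
    String next() { return source.hasNext() ? source.next() : null; }
}

// Caching wrapper: buffers every token on first use and replays the buffer after reset().
class CachingStream extends SimpleTokenStream {
    private final SimpleTokenStream input;
    private List<String> cache; // filled lazily on the first call to next()
    private int pos;

    CachingStream(SimpleTokenStream input) { this.input = input; }

    @Override
    String next() {
        if (cache == null) {
            cache = new ArrayList<>();
            for (String t = input.next(); t != null; t = input.next()) {
                cache.add(t);
            }
        }
        return pos < cache.size() ? cache.get(pos++) : null;
    }

    @Override
    void reset() { pos = 0; } // rewind to the start of the cached tokens
}

class ReplayDemo {
    /** Consumes the stream to the end and returns the tokens it produced. */
    static List<String> drain(SimpleTokenStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("quick", "brown", "fox");

        SimpleTokenStream plain = new OnePassStream(words);
        drain(plain);   // first pass, e.g. counting tokens
        plain.reset();  // no-op: nothing is left for a second consumer
        System.out.println("plain, second pass:  " + drain(plain));

        SimpleTokenStream cached = new CachingStream(new OnePassStream(words));
        drain(cached);  // first pass fills the cache
        cached.reset(); // rewinds the cache
        System.out.println("cached, second pass: " + drain(cached));
    }
}
```

The trade-off is memory: the wrapper holds every token of the field until it is discarded.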
Cool Coder wrote:
> Currently I have extended StandardAnalyzer and am counting tokens in the
> following way. But the index is not getting created, even though I call
> tokenStream.reset(). I am not sure whether reset() on a token stream works
> or not. I am debugging now.
>
> public TokenStream tokenStream(String fieldName, Reader reader) {
>     TokenStream result = super.tokenStream(fieldName, new HTMLStripReader(reader));
>     // To count tokens and put in a Map
>     analyzeTokens(result);
>     try {
>         result.reset();
>     } catch (IOException e) {
>         // TODO Auto-generated catch block
>         e.printStackTrace();
>     }
>     return result;
> }
>
> public void analyzeTokens(TokenStream result)
> {
>     try {
>         Token token = result.next();
>         while (token != null)
>         {
>             String tokenStr = token.termText();
>             if (TokenHolder.tokenMap.get(tokenStr) == null)
>             {
>                 TokenHolder.tokenMap.put(tokenStr, 1);
>             }
>             else
>             {
>                 TokenHolder.tokenMap.put(tokenStr, Integer.parseInt(TokenHolder.tokenMap.get(tokenStr).toString()) + 1);
>             }
>             token = result.next();
>         }
>         // extra reset
>         result.reset();
>     } catch (IOException e) {
>         e.printStackTrace();
>     }
> }
>
>
> Karl Wettin wrote:
>
> On 1 Nov 2007 at 18:09, Cool Coder wrote:
>
>
>> prior to adding into index
>>
>
> The easiest way out would be to add the document to a temporary index and
> extract the term frequency vector. I would recommend using MemoryIndex.
>
> You could also tokenize the document and pass the data to a
> TermVectorMapper. You could consider replacing the fields of the
> document with CachedTokenStreams if you have the RAM to spare and
> don't want to waste CPU analyzing the document twice. I would welcome a
> TermVectorMappingCachedTokenStreamFactory. Even cooler would be to
> pass code down to IndexWriter.addDocument using a command pattern or
> something similar, allowing one to extend the document at the time of
> analysis.
>
>
>
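
As a rough illustration of collecting term frequencies before a document is added to the index, the sketch below counts terms in plain Java. It does not use MemoryIndex or TermVectorMapper. TermFrequencySketch and its tokenize() method are hypothetical stand-ins for a real analyzer, assumed here to lower-case the text and split on anything that is not a letter or digit.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class TermFrequencySketch {
    // Hypothetical stand-in for an analyzer: lower-case, then split on non-letter/digit runs.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    /** Counts how often each term occurs, keeping first-seen order. */
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String term : tokenize(text)) {
            if (term.isEmpty()) {
                continue; // split() yields a leading empty string when text starts with a delimiter
            }
            freq.merge(term, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return freq;
    }

    public static void main(String[] args) {
        System.out.println(countTerms("The quick fox; the lazy dog."));
    }
}
```

Map.merge replaces the get, null check and put sequence, so the counts stay Integer values and no string round trip is needed.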