[jira] Commented: (LUCENE-1058) New Analyzer for buffering tokens

Grant Ingersoll (JIRA) Tue, 27 Nov 2007 14:58:13 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546040 ]


Grant Ingersoll commented on LUCENE-1058:
-----------------------------------------

OK, I am trying not to be fixated on the Analyzer. I guess I haven't fully 
synthesized the new TokenStream use in DocsWriter.

I agree, I don't like the no-value Field, and am open to suggestions.

So, I guess I am going to push back and ask: how would you solve the case 
where you have three fields and the analysis is given by:
source field:
StandardTokenizer
Proper Noun TF
LowerCaseTF
StopTF

buffered1 Field:
Proper Noun Cache TF  (cache of all terms found to be proper nouns by the 
Proper Noun TF)

buffered2 Field:
All terms lower cased

And the requirement is that you only run the analysis phase once (i.e. for the 
source field) and the other two fields are populated from memory.
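
To make the requirement concrete, here is a minimal sketch in plain Java. It 
deliberately does not use Lucene's TokenStream or Field APIs; the class name, 
the stop word list, and the "starts with an upper-case letter" proper-noun test 
are all assumptions made up for illustration. It only shows the shape of the 
problem: one pass over the source text, with the two buffered fields filled as 
a side effect of that pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: analyze once, buffer tokens for two other fields. */
final class BufferedAnalysisSketch {

    // Assumed stop word list, standing in for whatever the StopTF would use.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "in", "of"));

    /** Token lists produced by a single analysis pass, one per field. */
    static final class Result {
        final List<String> source = new ArrayList<>();      // source field
        final List<String> properNouns = new ArrayList<>(); // buffered1 field
        final List<String> lowerCased = new ArrayList<>();  // buffered2 field
    }

    // Crude stand-in for the Proper Noun TF: first letter is upper case.
    static boolean looksLikeProperNoun(String token) {
        return !token.isEmpty() && Character.isUpperCase(token.charAt(0));
    }

    /** Runs the whole chain once and fills all three token lists. */
    static Result analyzeOnce(String text) {
        Result result = new Result();
        // Splitting on non-word characters stands in for StandardTokenizer.
        for (String token : text.split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Proper Noun TF: siphon matching tokens into the buffered1 cache.
            if (looksLikeProperNoun(token)) {
                result.properNouns.add(token);
            }
            // LowerCaseTF: every lower-cased term is kept for buffered2.
            String lower = token.toLowerCase(Locale.ROOT);
            result.lowerCased.add(lower);
            // StopTF: only non-stop words reach the source field.
            if (!STOP_WORDS.contains(lower)) {
                result.source.add(lower);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = analyzeOnce("the cat met Grant in Boston");
        System.out.println("source:    " + r.source);
        System.out.println("buffered1: " + r.properNouns);
        System.out.println("buffered2: " + r.lowerCased);
    }
}
```

In this sketch the buffering is trivial because everything lives in one loop. 
The open question in this issue is how to get the same effect when the three 
fields are consumed as separate token streams by the indexer.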

I am just not seeing it yet, so I would appreciate an explanation, as it will 
better cement my understanding of the new TokenStream stuff and DocsWriter.



> New Analyzer for buffering tokens
> ---------------------------------
>
>                 Key: LUCENE-1058
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1058
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch, 
> LUCENE-1058.patch, LUCENE-1058.patch
>
>
> In some cases, it would be handy to have Analyzer/Tokenizer/TokenFilters that 
> could siphon off certain tokens and store them in a buffer to be used later 
> in the processing pipeline.
> For example, if you want to have two fields, one lowercased and one not, but 
> all the other analysis is the same, then you could save off the tokens to be 
> output for a different field.
> Patch to follow, but I am still not sure about a couple of things, mostly how 
> it plays with the new reuse API.
> See 
> http://www.gossamer-threads.com/lists/lucene/java-dev/54397?search_string=BufferingAnalyzer;#54397

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

