[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546052 ]

Yonik Seeley commented on LUCENE-1058:
--------------------------------------

Very similar to what I came up with, I think... (all untested, etc.)

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;

class ListTokenizer extends Tokenizer {
  protected List<Token> lst = new ArrayList<Token>();
  protected Iterator<Token> iter;

  public ListTokenizer(List<Token> input) {
    this.lst = input;
    if (this.lst==null) this.lst = new ArrayList<Token>();
  }

  /** only valid if tokens have not been consumed,
   * i.e. if this tokenizer is not part of another tokenstream
   */
  public List<Token> getTokens() {
    return lst;
  }

  public Token next(Token result) throws IOException {
    if (iter==null) iter = lst.iterator();
    // return null rather than throwing when the buffered tokens are exhausted
    return iter.hasNext() ? iter.next() : null;
  }

  /** Override this method to cache only certain tokens, or to cache new
   * tokens derived from the old ones.
   */
  public void add(Token t) {
    if (t==null) return;
    lst.add((Token)t.clone());
  }

  public void reset() throws IOException {
    iter = lst.iterator();
  }
}

class TeeTokenFilter extends TokenFilter {
  ListTokenizer sink;

  public TeeTokenFilter(TokenStream input, ListTokenizer sink) {
    super(input);
    this.sink = sink;
  }

  public Token next(Token result) throws IOException {
    // pass each token through unchanged; the sink keeps its own copy
    Token t = input.next(result);
    sink.add(t);
    return t;
  }
}
{code}
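
For reference, here's a rough usage sketch of the two-field case from the issue description (equally untested; the field names and the exact filter chain are just illustrative): the shared analysis is teed into a sink, the lowercased stream is indexed for one field, and the buffered tokens are replayed un-lowercased for a second field.

{code}
import java.io.Reader;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

class TwoFieldExample {
  /** Builds a document whose "contents" field is lowercased and whose
   * "contentsExact" field replays the same tokens without lowercasing.
   */
  Document buildDoc(Reader reader) {
    ListTokenizer sink = new ListTokenizer(null);
    TokenStream common = new StandardFilter(new StandardTokenizer(reader));
    TokenStream lowercased = new LowerCaseFilter(new TeeTokenFilter(common, sink));

    Document doc = new Document();
    // the teed stream must be consumed before the sink; fields are processed
    // in the order they are added, so the teed field goes in first
    doc.add(new Field("contents", lowercased));
    doc.add(new Field("contentsExact", sink));
    return doc;
  }
}
{code}

And since add() is the extension point, a subclass of ListTokenizer could siphon off only certain tokens, e.g. (again just a sketch; the type string is whatever the upstream tokenizer emits):

{code}
class TypeFilteringSink extends ListTokenizer {
  private final String type;

  TypeFilteringSink(String type) {
    super(null);
    this.type = type;
  }

  /** cache only tokens whose type matches, e.g. "<NUM>" from StandardTokenizer */
  public void add(Token t) {
    if (t != null && type.equals(t.type())) super.add(t);
  }
}
{code}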

> New Analyzer for buffering tokens
> ---------------------------------
>
>                 Key: LUCENE-1058
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1058
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch, 
> LUCENE-1058.patch, LUCENE-1058.patch
>
>
> In some cases, it would be handy to have Analyzer/Tokenizer/TokenFilters that 
> could siphon off certain tokens and store them in a buffer to be used later 
> in the processing pipeline.
> For example, if you want to have two fields, one lowercased and one not, but 
> all the other analysis is the same, then you could save off the tokens to be 
> output for a different field.
> Patch to follow, but I am still not sure about a couple of things, mostly how 
> it plays with the new reuse API.
> See 
> http://www.gossamer-threads.com/lists/lucene/java-dev/54397?search_string=BufferingAnalyzer;#54397
