Michael McCandless wrote:

Maybe we should un-deprecate the termText() method but add javadocs explaining that for better performance you should use the char[] reuse methods instead?
I think so, too. Should we leave it deprecated until 3.0, with the performance note and the encouragement to go for re-use, but also with a note that it is the current implementation that is deprecated, not the interface?

That's not quite what deprecated means. My thought on this is that it will give everyone a heads up that the current implementation is going away and that the replacement is sub-optimal.

(I use Eclipse and have it set to flag all deprecated uses. This helps me look for places to change.)

I think that this will make migration to 3.0 much easier.

With this, changing Term to add Term(String, Token) won't be necessary.

-- DM

Mike

DM Smith wrote:

Michael McCandless wrote:

DM Smith wrote:

Shouldn't Term have constructors that take a Token?

I think that makes sense, though normally Token appears during analysis and Term during searching (I think?) -- how often would you need to make a Term from a Token?

The problem I'm addressing is that tokens are used in contexts that need String and not char[].
The call to the deprecated
  String termText = token.termText();
needs to be replaced with:
  String termText = new String(token.termBuffer(), 0, token.termLength());

There are over 170 calls to token.termText(), and each of these places has to be modified. In some, perhaps many, of these cases it may be possible to use char[] directly to get a performance gain.
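As a rough illustration of what "use char[] directly" could look like, here is a minimal sketch that compares a token's buffer against a known word without allocating a String. The class and method names (BufferCompareSketch, bufferEquals) are made up for this example and are not part of Lucene; the buffer and length stand in for what termBuffer() and termLength() would return.

```java
/** Hypothetical helper for illustration only; not a Lucene class. */
public class BufferCompareSketch {

    /**
     * Returns true when the first {@code length} chars of {@code buffer}
     * spell exactly {@code word}. No String is created for the buffer.
     */
    public static boolean bufferEquals(char[] buffer, int length, String word) {
        if (length != word.length()) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (buffer[i] != word.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The buffer is larger than the term, as a reused token buffer may be.
        char[] buffer = new char[16];
        "the".getChars(0, 3, buffer, 0);
        System.out.println(bufferEquals(buffer, 3, "the"));
        System.out.println(bufferEquals(buffer, 3, "then"));
    }
}
```

Whether this is actually a win depends on the call site. Where a String is needed anyway (for example to build a Term), the conversion cannot be avoided.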

In the case of Term, changing it to work with (char[] buffer, int start, int length) does not seem quite right. I think the ripple would keep getting bigger. But logically, the Term's text is the text of a Token.

To me it makes sense to have a method that returns the token as a String, but that method is deprecated and the suggested replacement is to directly use the buffer. So this leads to the above construct. Perhaps it would be good to add a new method and document that as one of two replacements.
  public String term() {
    // Inside Token itself, so the accessors are called without a receiver.
    return termText != null ? termText : new String(termBuffer(), 0, termLength());
  }
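To make the idea concrete, here is a self-contained sketch of a token-like class with such a term() method. SketchToken is a hypothetical, stripped-down stand-in and not the real org.apache.lucene.analysis.Token; it only assumes a token holds either a String or a char[] buffer plus a length.

```java
/** Hypothetical, simplified token; it only mimics the accessors discussed in this thread. */
public class SketchToken {
    private final String termText;   // non-null only when built from a String
    private final char[] termBuffer; // may be longer than the term itself
    private final int termLength;    // number of valid chars in termBuffer

    public SketchToken(String text) {
        this.termText = text;
        this.termBuffer = text.toCharArray();
        this.termLength = text.length();
    }

    public SketchToken(char[] buffer, int length) {
        this.termText = null;
        this.termBuffer = buffer;
        this.termLength = length;
    }

    public char[] termBuffer() {
        return termBuffer;
    }

    public int termLength() {
        return termLength;
    }

    /** Returns the term as a String, copying from the buffer only when no String is held. */
    public String term() {
        return termText != null ? termText : new String(termBuffer, 0, termLength);
    }

    public static void main(String[] args) {
        char[] reused = {'a', 'b', 'c', 'x', 'x'};
        System.out.println(new SketchToken(reused, 3).term());
        System.out.println(new SketchToken("hello").term());
    }
}
```

With a method like this, call sites that need a String could write new Term(field, t.term()) instead of spelling out the buffer copy each time.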

Here is an example from QueryParser that has 5 instances, each calling the deprecated t.termText() method. In this example, there is the construction of a query from a token stream.
Each of the problem lines follows the pattern:
 TermQuery currentQuery = new TermQuery(new Term(field, t.termText()));

To remove the deprecated call to t.termText(), the Token's buffer needs to be marshalled with something like:
 String termText = new String(t.termBuffer(), 0, t.termLength());
 TermQuery currentQuery = new TermQuery(new Term(field, termText));

/**
 * @exception ParseException throw in overridden method to disallow
 */
protected Query getFieldQuery(String field, String queryText) throws ParseException {
  // Use the analyzer to get all the tokens, and then build a TermQuery,
  // PhraseQuery, or nothing based on the term count

  TokenStream source = analyzer.tokenStream(field, new StringReader(queryText));
  Vector v = new Vector();
  org.apache.lucene.analysis.Token t;
  int positionCount = 0;
  boolean severalTokensAtSamePosition = false;

  while (true) {
    try {
      t = source.next();
    }
    catch (IOException e) {
      t = null;
    }
    if (t == null)
      break;
    v.addElement(t);
    if (t.getPositionIncrement() != 0)
      positionCount += t.getPositionIncrement();
    else
      severalTokensAtSamePosition = true;
  }
  try {
    source.close();
  }
  catch (IOException e) {
    // ignore
  }

  if (v.size() == 0)
    return null;
  else if (v.size() == 1) {
    t = (org.apache.lucene.analysis.Token) v.elementAt(0);
    return new TermQuery(new Term(field, t.termText()));
  } else {
    if (severalTokensAtSamePosition) {
      if (positionCount == 1) {
        // no phrase query:
        BooleanQuery q = new BooleanQuery(true);
        for (int i = 0; i < v.size(); i++) {
          t = (org.apache.lucene.analysis.Token) v.elementAt(i);
          TermQuery currentQuery = new TermQuery(
              new Term(field, t.termText()));
          q.add(currentQuery, BooleanClause.Occur.SHOULD);
        }
        return q;
      }
      else {
        // phrase query:
        MultiPhraseQuery mpq = new MultiPhraseQuery();
        mpq.setSlop(phraseSlop);
        List multiTerms = new ArrayList();
        int position = -1;
        for (int i = 0; i < v.size(); i++) {
          t = (org.apache.lucene.analysis.Token) v.elementAt(i);
          if (t.getPositionIncrement() > 0 && multiTerms.size() > 0) {
            if (enablePositionIncrements) {
              mpq.add((Term[])multiTerms.toArray(new Term[0]),position);
            } else {
              mpq.add((Term[])multiTerms.toArray(new Term[0]));
            }
            multiTerms.clear();
          }
          position += t.getPositionIncrement();
          multiTerms.add(new Term(field, t.termText()));
        }
        if (enablePositionIncrements) {
          mpq.add((Term[])multiTerms.toArray(new Term[0]),position);
        } else {
          mpq.add((Term[])multiTerms.toArray(new Term[0]));
        }
        return mpq;
      }
    }
    else {
      PhraseQuery pq = new PhraseQuery();
      pq.setSlop(phraseSlop);
      int position = -1;
      for (int i = 0; i < v.size(); i++) {
        t = (org.apache.lucene.analysis.Token) v.elementAt(i);
        if (enablePositionIncrements) {
          position += t.getPositionIncrement();
          pq.add(new Term(field, t.termText()),position);
        } else {
          pq.add(new Term(field, t.termText()));
        }
      }
      return pq;
    }
  }
}


Here is an example that works around the deprecated code:
public void testShingleAnalyzerWrapperPhraseQuery() throws Exception {
  Analyzer analyzer = new ShingleAnalyzerWrapper(new WhitespaceAnalyzer(), 2);
  searcher = setUpSearcher(analyzer);

  PhraseQuery q = new PhraseQuery();

  TokenStream ts = analyzer.tokenStream("content",
                                        new StringReader("this sentence"));
  Token token;
  int j = -1;
  while ((token = ts.next()) != null) {
    j += token.getPositionIncrement();
    String termText = new String(token.termBuffer(), 0, token.termLength());
    q.add(new Term("content", termText), j);
  }

  Hits hits = searcher.search(q);
  int[] ranks = new int[] { 0 };
  compareRanks(hits, ranks);
}

-- DM

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


