Michael McCandless wrote:

DM Smith wrote:

 Shouldn't Term have constructors that take a Token?

I think that makes sense, though normally Token appears during analysis and Term during searching (I think?) -- how often would you need to make a Term from a Token?

The problem I'm addressing is that tokens are used in contexts that need String and not char[].
The call to the deprecated
  String termText = token.termText();
needs to be replaced with:
  String termText = new String(token.termBuffer(), 0, token.termLength());
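
The offset and length arguments matter because the array returned by termBuffer() can be longer than the term itself. Here is a minimal sketch of that point, with a plain char[] standing in for a token's buffer; the class and method names are illustrative and not part of Lucene:

```java
// Illustrative only: a plain char[] stands in for the array returned by termBuffer().
final class BufferToStringSketch {
    // Copies only the first `length` chars, mirroring
    // new String(token.termBuffer(), 0, token.termLength()).
    static String termString(char[] buffer, int length) {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        // A reused buffer: the current term is "fox", the tail is left over from "foxes".
        char[] buffer = {'f', 'o', 'x', 'e', 's'};
        int length = 3;
        System.out.println(termString(buffer, length)); // fox
        System.out.println(new String(buffer));         // foxes (stale tail included)
    }
}
```

So any replacement for termText() has to honor termLength() and cannot simply wrap the whole buffer.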

There are over 170 calls to token.termText(), and each of them has to be modified. In some, perhaps many, of these cases it may be possible to use the char[] directly and get a performance gain.

As for Term, changing it to work with char[] buffer, int start, int length does not seem quite right; I think the ripple would keep getting bigger. But logically, the Term's text is the text of a Token.
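
To make the constructor idea concrete, here is a rough sketch of a Term that accepts a Token while staying String-based internally. TokenStandIn and TermStandIn are hypothetical stand-ins reduced to what the sketch needs, and the Token-taking constructor is the proposal under discussion, not an existing Lucene API:

```java
// Stand-ins for Lucene's Token and Term, reduced to what this sketch needs.
final class TokenStandIn {
    private final char[] buffer;
    private final int length;

    TokenStandIn(char[] buffer, int length) {
        this.buffer = buffer;
        this.length = length;
    }

    char[] termBuffer() { return buffer; }
    int termLength() { return length; }
}

final class TermStandIn {
    private final String field;
    private final String text;

    TermStandIn(String field, String text) {
        this.field = field;
        this.text = text;
    }

    // Hypothetical convenience constructor: the String conversion happens
    // once, here, so the rest of the class keeps working with String.
    TermStandIn(String field, TokenStandIn token) {
        this(field, new String(token.termBuffer(), 0, token.termLength()));
    }

    String field() { return field; }
    String text() { return text; }
}
```

With something like this, a call site would read new Term(field, t) and the char[] handling would not ripple any further into Term.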

To me it makes sense to have a method that returns the token's text as a String, but that method is deprecated and the suggested replacement is to use the buffer directly, which leads to the above construct. Perhaps it would be good to add a new method to Token and document it as one of two replacements:
  public String term() {
    return termText != null ? termText : new String(termBuffer(), 0, termLength());
  }
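
A self-contained sketch of how such a method could behave, using a hypothetical TokenWithTerm class that keeps both the legacy String field and the char[] buffer (the field layout is an assumption for illustration, not Lucene's actual Token implementation):

```java
// Hypothetical stand-in for Token with both storage forms.
final class TokenWithTerm {
    private String termText;              // legacy String form; null when the buffer is in use
    private char[] termBuffer = new char[0];
    private int termLength;

    void setTermText(String text) {
        termText = text;
        termLength = 0;
    }

    void setTermBuffer(char[] buffer, int offset, int length) {
        termText = null;
        termBuffer = new char[length];
        System.arraycopy(buffer, offset, termBuffer, 0, length);
        termLength = length;
    }

    // Returns the String form if present, otherwise builds one from the buffer.
    String term() {
        return termText != null ? termText : new String(termBuffer, 0, termLength);
    }
}
```

Callers that need a String would call term(), while callers that can work with char[] would still use the buffer accessors.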

Here is an example from QueryParser that has 5 instances, each calling the deprecated t.termText() method. In this example, a query is constructed from a token stream.
Each of the problem lines follows the pattern:
  TermQuery currentQuery = new TermQuery(new Term(field, t.termText()));

To remove the deprecated call to t.termText(), the Token's buffer needs to be converted to a String with something like:
  String termText = new String(token.termBuffer(), 0, token.termLength());
  TermQuery currentQuery = new TermQuery(new Term(field, termText));

 /**
  * @exception ParseException throw in overridden method to disallow
  */
 protected Query getFieldQuery(String field, String queryText) throws ParseException {
   // Use the analyzer to get all the tokens, and then build a TermQuery,
   // PhraseQuery, or nothing based on the term count

   TokenStream source = analyzer.tokenStream(field, new StringReader(queryText));
   Vector v = new Vector();
   org.apache.lucene.analysis.Token t;
   int positionCount = 0;
   boolean severalTokensAtSamePosition = false;

   while (true) {
     try {
       t = source.next();
     }
     catch (IOException e) {
       t = null;
     }
     if (t == null)
       break;
     v.addElement(t);
     if (t.getPositionIncrement() != 0)
       positionCount += t.getPositionIncrement();
     else
       severalTokensAtSamePosition = true;
   }
   try {
     source.close();
   }
   catch (IOException e) {
     // ignore
   }

   if (v.size() == 0)
     return null;
   else if (v.size() == 1) {
     t = (org.apache.lucene.analysis.Token) v.elementAt(0);
     return new TermQuery(new Term(field, t.termText()));
   } else {
     if (severalTokensAtSamePosition) {
       if (positionCount == 1) {
         // no phrase query:
         BooleanQuery q = new BooleanQuery(true);
         for (int i = 0; i < v.size(); i++) {
           t = (org.apache.lucene.analysis.Token) v.elementAt(i);
           TermQuery currentQuery = new TermQuery(
               new Term(field, t.termText()));
           q.add(currentQuery, BooleanClause.Occur.SHOULD);
         }
         return q;
       }
       else {
         // phrase query:
         MultiPhraseQuery mpq = new MultiPhraseQuery();
         mpq.setSlop(phraseSlop);
         List multiTerms = new ArrayList();
         int position = -1;
         for (int i = 0; i < v.size(); i++) {
           t = (org.apache.lucene.analysis.Token) v.elementAt(i);
           if (t.getPositionIncrement() > 0 && multiTerms.size() > 0) {
             if (enablePositionIncrements) {
               mpq.add((Term[])multiTerms.toArray(new Term[0]),position);
             } else {
               mpq.add((Term[])multiTerms.toArray(new Term[0]));
             }
             multiTerms.clear();
           }
           position += t.getPositionIncrement();
           multiTerms.add(new Term(field, t.termText()));
         }
         if (enablePositionIncrements) {
           mpq.add((Term[])multiTerms.toArray(new Term[0]),position);
         } else {
           mpq.add((Term[])multiTerms.toArray(new Term[0]));
         }
         return mpq;
       }
     }
     else {
       PhraseQuery pq = new PhraseQuery();
       pq.setSlop(phraseSlop);
       int position = -1;
       for (int i = 0; i < v.size(); i++) {
         t = (org.apache.lucene.analysis.Token) v.elementAt(i);
         if (enablePositionIncrements) {
           position += t.getPositionIncrement();
           pq.add(new Term(field, t.termText()),position);
         } else {
           pq.add(new Term(field, t.termText()));
         }
       }
       return pq;
     }
   }
 }


Here is an example that works around the deprecated code:
 public void testShingleAnalyzerWrapperPhraseQuery() throws Exception {
   Analyzer analyzer = new ShingleAnalyzerWrapper(new WhitespaceAnalyzer(), 2);
   searcher = setUpSearcher(analyzer);

   PhraseQuery q = new PhraseQuery();

   TokenStream ts = analyzer.tokenStream("content",
       new StringReader("this sentence"));
   Token token;
   int j = -1;
   while ((token = ts.next()) != null) {
     j += token.getPositionIncrement();
     String termText = new String(token.termBuffer(), 0, token.termLength());
     q.add(new Term("content", termText), j);
   }

   Hits hits = searcher.search(q);
   int[] ranks = new int[] { 0 };
   compareRanks(hits, ranks);
 }

-- DM
