Another suggestion from me: how about making the Token object a singleton?
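One thing to check before sharing a single Token instance: code such as the quoted getFieldQuery keeps every Token it reads in a Vector. Below is a minimal sketch of what could happen if all of those entries were the same object. SimpleToken and SharedTokenDemo are hypothetical stand-ins written for this illustration, not Lucene classes; only the termBuffer()/termLength() accessor names are taken from the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedTokenDemo {

    // Hypothetical stand-in for a reusable token; not Lucene's Token class.
    static final class SimpleToken {
        private char[] buffer = new char[16];
        private int length;

        // Overwrites the current term in place, growing the buffer if needed.
        void setTermBuffer(String text) {
            if (text.length() > buffer.length) {
                buffer = new char[text.length()];
            }
            text.getChars(0, text.length(), buffer, 0);
            length = text.length();
        }

        char[] termBuffer() { return buffer; }
        int termLength() { return length; }
    }

    // Stores a reference to the one shared token on every iteration,
    // the way a consumer that collects tokens in a list would.
    static List<String> collectByReference(String[] words) {
        SimpleToken shared = new SimpleToken();
        List<SimpleToken> kept = new ArrayList<>();
        for (String w : words) {
            shared.setTermBuffer(w);
            kept.add(shared); // every entry aliases the same object
        }
        List<String> out = new ArrayList<>();
        for (SimpleToken t : kept) {
            out.add(new String(t.termBuffer(), 0, t.termLength()));
        }
        return out;
    }

    // Copies the text out of the shared token before it is overwritten.
    static List<String> collectByCopy(String[] words) {
        SimpleToken shared = new SimpleToken();
        List<String> out = new ArrayList<>();
        for (String w : words) {
            shared.setTermBuffer(w);
            out.add(new String(shared.termBuffer(), 0, shared.termLength()));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] words = {"this", "sentence"};
        System.out.println(collectByReference(words)); // [sentence, sentence]
        System.out.println(collectByCopy(words));      // [this, sentence]
    }
}
```

So a shared instance would only be safe for consumers that copy the text (or the whole token) out before asking for the next one.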
> Maybe we should un-deprecate the termText() method but add javadocs
> explaining that for better performance you should use the char[] reuse
> methods instead?
>
> Mike
>
> DM Smith wrote:
>
> > Michael McCandless wrote:
> >>
> >> DM Smith wrote:
> >>
> >>> Shouldn't Term have constructors that take a Token?
> >>
> >> I think that makes sense, though normally Token appears during
> >> analysis and Term during searching (I think?) -- how often would
> >> you need to make a Term from a Token?
> >>
> > The problem I'm addressing is that tokens are used in contexts that
> > need String and not char[].
> > The call to the deprecated
> >     String termText = token.termText();
> > needs to be replaced with:
> >     String termText = new String(token.termBuffer(), 0,
> >         token.termLength());
> >
> > There are over 170 calls to token.termText(), and each of these
> > places has to be modified. In some, perhaps many, of these cases it
> > may be possible to use char[] directly to get a performance gain.
> >
> > In the case of Term, changing it to work with char[] buffer, int
> > start, int length does not seem quite right. I think the ripple
> > would keep getting bigger. But logically, the Term's text is the
> > text of a Token.
> >
> > To me it makes sense to have a method that returns the token as a
> > String, but that method is deprecated and the suggested replacement
> > is to directly use the buffer. So this leads to the above construct.
> > Perhaps it would be good to add a new method to Token and document
> > that as one of two replacements.
> >     public String term() {
> >       return termText != null ? termText
> >           : new String(termBuffer(), 0, termLength());
> >     }
> >
> > Here is an example from QueryParser that has 5 instances, each
> > calling the deprecated t.termText() method. In this example, a query
> > is constructed from a token stream.
> > Each of the problem lines follows the pattern:
> >     TermQuery currentQuery = new TermQuery(new Term(field,
> >         t.termText()));
> >
> > To remove the deprecated call to t.termText(), the Token's buffer
> > needs to be marshalled with something like:
> >     String termText = new String(token.termBuffer(), 0,
> >         token.termLength());
> >     TermQuery currentQuery = new TermQuery(new Term(field, termText));
> >
> > /**
> >  * @exception ParseException throw in overridden method to disallow
> >  */
> > protected Query getFieldQuery(String field, String queryText)
> >     throws ParseException {
> >   // Use the analyzer to get all the tokens, and then build a TermQuery,
> >   // PhraseQuery, or nothing based on the term count
> >
> >   TokenStream source = analyzer.tokenStream(field,
> >       new StringReader(queryText));
> >   Vector v = new Vector();
> >   org.apache.lucene.analysis.Token t;
> >   int positionCount = 0;
> >   boolean severalTokensAtSamePosition = false;
> >
> >   while (true) {
> >     try {
> >       t = source.next();
> >     }
> >     catch (IOException e) {
> >       t = null;
> >     }
> >     if (t == null)
> >       break;
> >     v.addElement(t);
> >     if (t.getPositionIncrement() != 0)
> >       positionCount += t.getPositionIncrement();
> >     else
> >       severalTokensAtSamePosition = true;
> >   }
> >   try {
> >     source.close();
> >   }
> >   catch (IOException e) {
> >     // ignore
> >   }
> >
> >   if (v.size() == 0)
> >     return null;
> >   else if (v.size() == 1) {
> >     t = (org.apache.lucene.analysis.Token) v.elementAt(0);
> >     return new TermQuery(new Term(field, t.termText()));
> >   } else {
> >     if (severalTokensAtSamePosition) {
> >       if (positionCount == 1) {
> >         // no phrase query:
> >         BooleanQuery q = new BooleanQuery(true);
> >         for (int i = 0; i < v.size(); i++) {
> >           t = (org.apache.lucene.analysis.Token) v.elementAt(i);
> >           TermQuery currentQuery = new TermQuery(
> >               new Term(field, t.termText()));
> >           q.add(currentQuery, BooleanClause.Occur.SHOULD);
> >         }
> >         return q;
> >       }
> >       else {
> >         // phrase query:
> >         MultiPhraseQuery mpq = new MultiPhraseQuery();
> >         mpq.setSlop(phraseSlop);
> >         List multiTerms = new ArrayList();
> >         int position = -1;
> >         for (int i = 0; i < v.size(); i++) {
> >           t = (org.apache.lucene.analysis.Token) v.elementAt(i);
> >           if (t.getPositionIncrement() > 0 && multiTerms.size() > 0) {
> >             if (enablePositionIncrements) {
> >               mpq.add((Term[])multiTerms.toArray(new Term[0]),position);
> >             } else {
> >               mpq.add((Term[])multiTerms.toArray(new Term[0]));
> >             }
> >             multiTerms.clear();
> >           }
> >           position += t.getPositionIncrement();
> >           multiTerms.add(new Term(field, t.termText()));
> >         }
> >         if (enablePositionIncrements) {
> >           mpq.add((Term[])multiTerms.toArray(new Term[0]),position);
> >         } else {
> >           mpq.add((Term[])multiTerms.toArray(new Term[0]));
> >         }
> >         return mpq;
> >       }
> >     }
> >     else {
> >       PhraseQuery pq = new PhraseQuery();
> >       pq.setSlop(phraseSlop);
> >       int position = -1;
> >       for (int i = 0; i < v.size(); i++) {
> >         t = (org.apache.lucene.analysis.Token) v.elementAt(i);
> >         if (enablePositionIncrements) {
> >           position += t.getPositionIncrement();
> >           pq.add(new Term(field, t.termText()),position);
> >         } else {
> >           pq.add(new Term(field, t.termText()));
> >         }
> >       }
> >       return pq;
> >     }
> >   }
> > }
> >
> >
> > Here is an example that works around the deprecated code:
> > public void testShingleAnalyzerWrapperPhraseQuery() throws Exception {
> >   Analyzer analyzer = new ShingleAnalyzerWrapper(
> >       new WhitespaceAnalyzer(), 2);
> >   searcher = setUpSearcher(analyzer);
> >
> >   PhraseQuery q = new PhraseQuery();
> >
> >   TokenStream ts = analyzer.tokenStream("content",
> >       new StringReader("this sentence"));
> >   Token token;
> >   int j = -1;
> >   while ((token = ts.next()) != null) {
> >     j += token.getPositionIncrement();
> >     String termText = new String(token.termBuffer(), 0,
> >         token.termLength());
> >     q.add(new Term("content", termText), j);
> >   }
> >
> >   Hits hits = searcher.search(q);
> >   int[] ranks = new int[] { 0 };
> >   compareRanks(hits, ranks);
> > }
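For what it's worth, the term() accessor proposed in the quoted message might be sketched as follows. This is only an illustration under assumptions: TextToken, fromString and fromBuffer are hypothetical names invented here, and the class is a stand-in rather than Lucene's Token. It assumes a token holds either a String (older API) or a char[] slice (reuse API).

```java
public class TermAccessorSketch {

    // Hypothetical stand-in for a token that stores its text either as a
    // String or as a slice of a char[] buffer; not Lucene's Token class.
    static final class TextToken {
        private String termText;   // null when the char[] form is in use
        private char[] termBuffer;
        private int termLength;

        static TextToken fromString(String text) {
            TextToken t = new TextToken();
            t.termText = text;
            return t;
        }

        static TextToken fromBuffer(char[] buffer, int length) {
            TextToken t = new TextToken();
            t.termBuffer = buffer;
            t.termLength = length;
            return t;
        }

        // Returns the text as a String, allocating only when the token
        // currently holds its text in the char[] buffer.
        String term() {
            return termText != null
                    ? termText
                    : new String(termBuffer, 0, termLength);
        }
    }

    public static void main(String[] args) {
        // The buffer is longer than the term; only the first 6 chars count.
        char[] buf = {'l', 'u', 'c', 'e', 'n', 'e', 'x', 'x'};
        System.out.println(TextToken.fromBuffer(buf, 6).term());  // lucene
        System.out.println(TextToken.fromString("query").term()); // query
    }
}
```

With an accessor along these lines, call sites of the form new Term(field, t.termText()) would need a one-word change rather than a buffer-to-String conversion at each site.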
> >
> > -- DM
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]