Michael McCandless wrote:
> DM Smith wrote:
>> Shouldn't Term have constructors that take a Token?
> I think that makes sense, though normally Token appears during
> analysis and Term during searching (I think?) -- how often would you
> need to make a Term from a Token?
The problem I'm addressing is that tokens are used in contexts that need
String and not char[].
The call to the deprecated method

    String termText = token.termText();

needs to be replaced with:

    String termText = new String(token.termBuffer(), 0, token.termLength());

There are over 170 calls to token.termText(), and each of these places
has to be modified. In some, perhaps many, of these cases it may be
possible to use char[] directly to get a performance gain.
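
As a rough illustration of what "using char[] directly" could look like,
here is a minimal sketch that compares the valid part of a term buffer
with a String without allocating a new String. The class
TermBufferSketch and the method termEquals are hypothetical names made
up for this example; they are not part of Lucene.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class TermBufferSketch {

    // Returns true if the first `length` chars of `buffer` equal `text`.
    // No String is allocated, unlike new String(buffer, 0, length).equals(text).
    static boolean termEquals(char[] buffer, int length, String text) {
        if (text.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (buffer[i] != text.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The buffer is longer than the term it holds, as a reused buffer may be.
        char[] buffer = {'t', 'h', 'i', 's', 'x', 'x'};
        System.out.println(termEquals(buffer, 4, "this"));   // true
        System.out.println(termEquals(buffer, 4, "thisxx")); // false
    }
}
```

Whether this is worth it at a given call site depends on whether the
caller really needs a String afterwards; where it does (as with Term),
the conversion cannot be avoided this way.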
Changing Term to work with char[] buffer, int start, int length does
not seem quite right; I think the ripple would keep getting bigger. But
logically, a Term's text is the text of a Token.
To me it makes sense to have a method that returns the token's text as
a String, but that method is deprecated and the suggested replacement is
to use the buffer directly, which leads to the construct above.
Perhaps it would be good to add a new method to Token and document it as
one of two replacements:

    public String term() {
      return termText != null
          ? termText
          : new String(termBuffer(), 0, termLength());
    }

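
To make the intended behaviour concrete, here is a self-contained sketch
using a stand-in class. SketchToken and its fields are assumptions for
illustration only and do not mirror Lucene's actual Token internals. The
point it shows is that the accessor returns the String when one was
supplied, and otherwise copies only the valid prefix of the buffer.

```java
// Stand-in for illustration only; this is NOT Lucene's Token class.
final class SketchToken {
    private String termText;   // set only when built from a String
    private char[] termBuffer; // may be longer than the term it holds
    private int termLength;    // number of valid chars in termBuffer

    SketchToken(String text) {
        this.termText = text;
    }

    SketchToken(char[] buffer, int length) {
        this.termBuffer = buffer;
        this.termLength = length;
    }

    // Sketch of the proposed accessor: prefer the String if present,
    // otherwise build one from the valid part of the buffer.
    String term() {
        return termText != null
            ? termText
            : new String(termBuffer, 0, termLength);
    }

    public static void main(String[] args) {
        char[] reused = {'f', 'o', 'o', 'b', 'a', 'r'};
        // Only the first three chars belong to the current term.
        System.out.println(new SketchToken(reused, 3).term()); // foo
        System.out.println(new SketchToken("baz").term());     // baz
    }
}
```

Note that new String(buffer) without the offset and length would pick up
stale characters from a reused buffer, which is why the length has to be
passed along.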
Here is an example from QueryParser that has 5 instances, each calling
the deprecated t.termText() method. The example constructs a query from
a token stream.
Each of the problem lines follows the pattern:

    TermQuery currentQuery = new TermQuery(new Term(field, t.termText()));

To remove the deprecated call to t.termText(), the Token's buffer needs
to be marshalled with something like:

    String termText = new String(t.termBuffer(), 0, t.termLength());
    TermQuery currentQuery = new TermQuery(new Term(field, termText));

  /**
   * @exception ParseException throw in overridden method to disallow
   */
  protected Query getFieldQuery(String field, String queryText)
      throws ParseException {
    // Use the analyzer to get all the tokens, and then build a TermQuery,
    // PhraseQuery, or nothing based on the term count

    TokenStream source = analyzer.tokenStream(field,
        new StringReader(queryText));
    Vector v = new Vector();
    org.apache.lucene.analysis.Token t;
    int positionCount = 0;
    boolean severalTokensAtSamePosition = false;

    while (true) {
      try {
        t = source.next();
      }
      catch (IOException e) {
        t = null;
      }
      if (t == null)
        break;
      v.addElement(t);
      if (t.getPositionIncrement() != 0)
        positionCount += t.getPositionIncrement();
      else
        severalTokensAtSamePosition = true;
    }
    try {
      source.close();
    }
    catch (IOException e) {
      // ignore
    }

    if (v.size() == 0)
      return null;
    else if (v.size() == 1) {
      t = (org.apache.lucene.analysis.Token) v.elementAt(0);
      return new TermQuery(new Term(field, t.termText()));
    } else {
      if (severalTokensAtSamePosition) {
        if (positionCount == 1) {
          // no phrase query:
          BooleanQuery q = new BooleanQuery(true);
          for (int i = 0; i < v.size(); i++) {
            t = (org.apache.lucene.analysis.Token) v.elementAt(i);
            TermQuery currentQuery = new TermQuery(
                new Term(field, t.termText()));
            q.add(currentQuery, BooleanClause.Occur.SHOULD);
          }
          return q;
        }
        else {
          // phrase query:
          MultiPhraseQuery mpq = new MultiPhraseQuery();
          mpq.setSlop(phraseSlop);
          List multiTerms = new ArrayList();
          int position = -1;
          for (int i = 0; i < v.size(); i++) {
            t = (org.apache.lucene.analysis.Token) v.elementAt(i);
            if (t.getPositionIncrement() > 0 && multiTerms.size() > 0) {
              if (enablePositionIncrements) {
                mpq.add((Term[])multiTerms.toArray(new Term[0]),position);
              } else {
                mpq.add((Term[])multiTerms.toArray(new Term[0]));
              }
              multiTerms.clear();
            }
            position += t.getPositionIncrement();
            multiTerms.add(new Term(field, t.termText()));
          }
          if (enablePositionIncrements) {
            mpq.add((Term[])multiTerms.toArray(new Term[0]),position);
          } else {
            mpq.add((Term[])multiTerms.toArray(new Term[0]));
          }
          return mpq;
        }
      }
      else {
        PhraseQuery pq = new PhraseQuery();
        pq.setSlop(phraseSlop);
        int position = -1;
        for (int i = 0; i < v.size(); i++) {
          t = (org.apache.lucene.analysis.Token) v.elementAt(i);
          if (enablePositionIncrements) {
            position += t.getPositionIncrement();
            pq.add(new Term(field, t.termText()),position);
          } else {
            pq.add(new Term(field, t.termText()));
          }
        }
        return pq;
      }
    }
  }

Here is an example that works around the deprecated code:

  public void testShingleAnalyzerWrapperPhraseQuery() throws Exception {
    Analyzer analyzer = new ShingleAnalyzerWrapper(
        new WhitespaceAnalyzer(), 2);
    searcher = setUpSearcher(analyzer);

    PhraseQuery q = new PhraseQuery();

    TokenStream ts = analyzer.tokenStream("content",
        new StringReader("this sentence"));
    Token token;
    int j = -1;
    while ((token = ts.next()) != null) {
      j += token.getPositionIncrement();
      String termText = new String(token.termBuffer(), 0,
          token.termLength());
      q.add(new Term("content", termText), j);
    }

    Hits hits = searcher.search(q);
    int[] ranks = new int[] { 0 };
    compareRanks(hits, ranks);
  }

-- DM