Trejkaz created LUCENE-7260:
-------------------------------
Summary: StandardQueryParser is over 100 times slower in v5 compared to v3
Key: LUCENE-7260
URL: https://issues.apache.org/jira/browse/LUCENE-7260
Project: Lucene - Core
Issue Type: Improvement
Components: modules/queryparser
Affects Versions: 5.4.1
Environment: Java 8u51
Reporter: Trejkaz
The following test code measures how long it takes to parse a large query: a single field with 50,000 terms joined by OR.
{code}
// Lucene 3.6.2 imports; swap in the commented-out lines to run against Lucene 5.x.
import org.apache.lucene.analysis.KeywordAnalyzer;
//import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.queryParser.standard.StandardQueryParser;
//import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.BooleanQuery;

public class LargeQueryTest {
    public static void main(String[] args) throws Exception {
        // Allow a single BooleanQuery to hold all 50,000 clauses.
        BooleanQuery.setMaxClauseCount(50_000);

        // Build "id:( 0 OR 1 OR ... OR 49999 )".
        StringBuilder builder = new StringBuilder(50_000 * 10);
        builder.append("id:( ");
        boolean first = true;
        for (int i = 0; i < 50_000; i++) {
            if (first) {
                first = false;
            } else {
                builder.append(" OR ");
            }
            builder.append(String.valueOf(i));
        }
        builder.append(" )");
        String queryString = builder.toString();

        StandardQueryParser parser2 = new StandardQueryParser(new KeywordAnalyzer());

        // Parse the same query 10 times and print each duration in milliseconds.
        for (int i = 0; i < 10; i++) {
            long t0 = System.currentTimeMillis();
            parser2.parse(queryString, "nope");
            long t1 = System.currentTimeMillis();
            System.out.println(t1 - t0);
        }
    }
}
{code}
For Lucene 3.6.2, the timings settle down to 200~300 ms, with the fastest run at 207 ms.
For Lucene 5.4.1, the timings settle down to 20000~30000 ms, with the fastest run at 22444 ms.
So at some point, a change made the query parser more than 100 times slower (207 ms vs. 22444 ms for the fastest runs, a factor of about 108). I suspect it has something to do with how the list of child query nodes is now handled. Every time someone gets the children, the list is copied. Every time someone sets the children, the code walks through the children to detach their parent references and then reattaches them all again.
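To illustrate how the behaviour described above could add up, here is a minimal toy model in plain Java. `CopyOnReadDemo`, `ToyNode`, `getChildren`, `setChildren` and `visitByIndex` are hypothetical names, not Lucene's actual query node classes. The assumption is that some processing step reads the children once per child, so every read copies the whole list and the total work grows with the square of the number of children.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not Lucene code) of a node whose children list is
 * copied on every read and re-parented on every write.
 */
public class CopyOnReadDemo {
    static long elementsCopied = 0; // total list elements copied by getChildren()

    static class ToyNode {
        private ToyNode parent;
        private List<ToyNode> children = new ArrayList<>();

        // Defensive copy on every read: O(number of children) per call.
        List<ToyNode> getChildren() {
            elementsCopied += children.size();
            return new ArrayList<>(children);
        }

        // On every write, detach all old children, then attach the new ones.
        void setChildren(List<ToyNode> newChildren) {
            for (ToyNode old : children) {
                old.parent = null;
            }
            children = new ArrayList<>(newChildren);
            for (ToyNode child : children) {
                child.parent = this;
            }
        }
    }

    /** Visits each child by index, calling getChildren() once per step. */
    static long visitByIndex(int n) {
        ToyNode root = new ToyNode();
        List<ToyNode> kids = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            kids.add(new ToyNode());
        }
        root.setChildren(kids);
        elementsCopied = 0;
        for (int i = 0; i < n; i++) {
            ToyNode child = root.getChildren().get(i); // copies all n children each time
            if (child.parent != root) {
                throw new IllegalStateException("child not attached");
            }
        }
        return elementsCopied;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 2_000, 4_000}) {
            System.out.println(n + " children -> " + visitByIndex(n) + " elements copied");
        }
    }
}
{code}

In this model a single pass copies n² elements, so doubling the number of children quadruples the work; at 50,000 children that would be 2.5 billion element copies for one pass.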
If it were me, I would make these collections immutable so that they would not need to be defensively copied.
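A minimal sketch of the immutable alternative, assuming Java 8 (so no `List.copyOf`). `ImmutableNode`, `getLabel` and `withChildren` are hypothetical names for illustration, not part of Lucene's query node API. The list is copied once at construction, wrapped with `Collections.unmodifiableList`, and then returned as-is from every `getChildren()` call.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene's API): a query node whose children list is
 * immutable, so getChildren() can hand out the same list without copying.
 */
public final class ImmutableNode {
    private final String label;
    private final List<ImmutableNode> children;

    public ImmutableNode(String label, List<ImmutableNode> children) {
        this.label = label;
        // One copy at construction time; afterwards the list can never change.
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    public String getLabel() {
        return label;
    }

    // O(1): no defensive copy is needed because callers cannot modify the list.
    public List<ImmutableNode> getChildren() {
        return children;
    }

    // Instead of mutating children in place (and re-parenting them),
    // return a new node; unchanged child nodes are shared with the old tree.
    public ImmutableNode withChildren(List<ImmutableNode> newChildren) {
        return new ImmutableNode(label, newChildren);
    }

    public static void main(String[] args) {
        List<ImmutableNode> terms = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            terms.add(new ImmutableNode(String.valueOf(i), Collections.emptyList()));
        }
        ImmutableNode or = new ImmutableNode("OR", terms);
        System.out.println(or.getChildren() == or.getChildren()); // true: same list, no copy
        try {
            or.getChildren().add(new ImmutableNode("3", Collections.emptyList()));
        } catch (UnsupportedOperationException e) {
            System.out.println("children are read-only");
        }
    }
}
{code}

The trade-off in this sketch is that nodes carry no parent reference, since an immutable child cannot be re-pointed at a new parent; a rewrite builds new nodes with `withChildren` instead of mutating the tree in place.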
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)