G'Day,
When doing some profiling of Jena on another matter, I noticed that the
IRI code seemed to consume a lot of time when parsing RDF/XML. On
further investigation, a minor modification to
com.hp.hpl.jena.iri.impl.AbsLexer results in a 15% speedup in IRI
creation and a 10% speedup in parsing RDF/XML, in my tests (YMMV).
How:
In com.hp.hpl.jena.iri.impl.AbsLexer
Replace
[[
final protected void rule(int rule) {
    parser.matchedRule(range,rule,yytext());
}
]]
with
[[
private static final boolean DEBUG = false;
final protected void rule(int rule) {
    if (DEBUG) {
        parser.matchedRule(range,rule,yytext());
    }
}
]]
Explanation:
This is debug code. As far as I can see yytext() returns a copy of part
of the input buffer as a String. Parser.matchedRule prints some debug
information if DEBUG is true, which it is not. The copy is still made on
every call, though, because the argument is evaluated before
matchedRule is entered.
The method rule(int) is called rather a lot, hence compiling out this
code in AbsLexer results in a noticeable performance improvement.
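A minimal standalone sketch of the effect, assuming yytext() behaves as described above. The class EagerArgumentDemo and its methods are hypothetical stand-ins, not the Jena classes; the counter only makes the hidden copying visible.

```java
// Hypothetical stand-in for AbsLexer plus Parser; not the actual Jena code.
public class EagerArgumentDemo {
    static final boolean DEBUG = false;

    private final char[] buffer =
        "http://www.example.com/foo/bar/bas#12345".toCharArray();

    // Number of times yytext() has built a new String.
    int copies = 0;

    // Assumed behaviour: copies part of the buffer into a fresh String.
    String yytext() {
        copies++;
        return new String(buffer, 0, buffer.length);
    }

    // Stand-in for Parser.matchedRule: does nothing unless DEBUG is set.
    void matchedRule(int rule, String text) {
        if (DEBUG) {
            System.out.println("rule " + rule + " matched " + text);
        }
    }

    // Original shape: yytext() runs before the call, so the copy always happens.
    void ruleUnguarded(int rule) {
        matchedRule(rule, yytext());
    }

    // Patched shape: the guard skips both the call and the copy.
    void ruleGuarded(int rule) {
        if (DEBUG) {
            matchedRule(rule, yytext());
        }
    }

    public static void main(String[] args) {
        EagerArgumentDemo unguarded = new EagerArgumentDemo();
        EagerArgumentDemo guarded = new EagerArgumentDemo();
        for (int i = 0; i < 1000; i++) {
            unguarded.ruleUnguarded(i);
            guarded.ruleGuarded(i);
        }
        System.out.println("unguarded copies: " + unguarded.copies); // 1000
        System.out.println("guarded copies:   " + guarded.copies);   // 0
    }
}
```

Because DEBUG is a compile-time constant, the guarded body can be dropped by the compiler, which is what "compiling out" refers to here.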
I have run a test creating 10 million IRIs of the form
"http://www.example.com/foo/bar/bas#nnnnn". On my machine I see an
approx 15% performance improvement.
I have taken the test graphs from the test in [1] and read them into a
memory model. This runs about 10% faster with the mod I am suggesting
than without the mod.
com.hp.hpl.jena.iri.test.TestPackage green-lines with the mod applied.
I have attached a patch file rooted in the IRI project.
Brian
[1]
https://jena.svn.sourceforge.net/svnroot/jena/TDB/trunk/src-dev/reports/ReportOutOfMemoryManyGraphsTDB.java
Index: src/com/hp/hpl/jena/iri/impl/AbsLexer.java
===================================================================
RCS file: /cvsroot/jena/iri/src/com/hp/hpl/jena/iri/impl/AbsLexer.java,v
retrieving revision 1.4
diff -u -r1.4 AbsLexer.java
--- src/com/hp/hpl/jena/iri/impl/AbsLexer.java 28 Feb 2009 17:44:46 -0000 1.4
+++ src/com/hp/hpl/jena/iri/impl/AbsLexer.java 7 Apr 2011 16:48:19 -0000
@@ -72,8 +72,11 @@
parser.recordError(range,e);
}
+ private static final boolean DEBUG = false;
final protected void rule(int rule) {
- parser.matchedRule(range,rule,yytext());
+ if (DEBUG) {
+ parser.matchedRule(range,rule,yytext());
+ }
}
abstract String yytext();
protected void surrogatePair() {