arne-bdt opened a new pull request, #2744:
URL: https://github.com/apache/jena/pull/2744
Faster RDF/XML parsing by avoiding duplicate resolution of IRIs and
adding an IRIx cache to the parsers
(Parsers: RRX.RDFXML_SAX, RRX.RDFXML_StAX_ev, RRX.RDFXML_StAX_sr)
GitHub issue resolved: #2740
Pull request description:
- added "public Node createURI(IRIx iriX, ...);" to the ParserProfile, which
simply uses the given IRI instead of resolving it again.
- added general IRIx caching (org.apache.jena.atlas.lib.cache.CacheSimple)
to the parsers where the already-cached
org.apache.jena.riot.system.ParserProfileStd#resolver is not applicable
- removed httpClient from org.apache.jena.riot.RDFParserBuilder and
org.apache.jena.riot.RDFParser; setting it up took quite some time during
initialization.
- removed unused code and variables from ParserRRX_StAX_SR and
ParserRRX_StAX_EV
- org.apache.jena.http.HttpEnv#getDftHttpClient is now called from
org.apache.jena.riot.RDFParser#openTypedInputStream only when it is needed.
HttpEnv also holds a static reference, so that should be fine, I hope.
- added jena-benchmarks-shadedJena510 to be able to run benchmarks
against Jena 5.1.0
- added org.apache.jena.riot.lang.rdfxml.TestXMLParser in jena-benchmarks-jmh
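
The caching idea in the list above can be illustrated with a minimal, JDK-only sketch. It is not the code in this pull request: `java.net.URI` stands in for IRIx, a small bounded `LinkedHashMap` stands in for `CacheSimple` (whose eviction policy may well differ), and the class name `IriCacheSketch` and its methods are hypothetical. It only shows why repeated IRI strings, which are common in RDF/XML, need to be resolved once instead of on every occurrence.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; java.net.URI stands in for Jena's IRIx.
public class IriCacheSketch {
    private final URI base;
    private final Map<String, URI> cache;
    private int resolveCount = 0; // counts real resolutions (cache misses)

    public IriCacheSketch(String baseIri, final int maxEntries) {
        this.base = URI.create(baseIri);
        // Access-ordered map that drops the least recently used entry when full.
        // Assumption: this LRU policy is for illustration only.
        this.cache = new LinkedHashMap<String, URI>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, URI> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Resolve a possibly relative IRI string against the base, reusing
    // the earlier result when the same string has been seen before.
    public URI resolve(String iriString) {
        URI cached = cache.get(iriString);
        if (cached != null) {
            return cached;
        }
        resolveCount++;
        URI resolved = base.resolve(iriString);
        cache.put(iriString, resolved);
        return resolved;
    }

    public int resolveCount() {
        return resolveCount;
    }

    public static void main(String[] args) {
        IriCacheSketch resolver = new IriCacheSketch("http://example.org/data/", 1000);
        URI first = resolver.resolve("item1");
        URI second = resolver.resolve("item1"); // served from the cache
        System.out.println(first);
        System.out.println("same instance: " + (first == second));
        System.out.println("resolutions performed: " + resolver.resolveCount());
    }
}
```

The new `createURI(IRIx iriX, ...)` overload follows the same reasoning from the other side: once a caller already holds a resolved IRI object, passing that object on avoids a second resolution of the same string.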
Benchmark results:
```
Benchmark                      (param0_GraphUri)                                   (param1_ParserLang)  Mode  Cnt   Score    Error  Units
TestXMLParser.parseXML         ../testing/citations.rdf                            RRX.RDFXML_SAX       avgt    5  47,232 ± 0,778  s/op
TestXMLParser.parseXML         ../testing/citations.rdf                            RRX.RDFXML_StAX_ev   avgt    5  76,502 ± 4,390  s/op
TestXMLParser.parseXML         ../testing/citations.rdf                            RRX.RDFXML_StAX_sr   avgt    5  48,689 ± 2,224  s/op
TestXMLParser.parseXML         ../testing/citations.rdf                            RRX.RDFXML_ARP1      avgt    5  86,298 ± 2,440  s/op
TestXMLParser.parseXML         ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_SAX       avgt    5   9,576 ± 0,402  s/op
TestXMLParser.parseXML         ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_StAX_ev   avgt    5  11,562 ± 0,535  s/op
TestXMLParser.parseXML         ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_StAX_sr   avgt    5   9,406 ± 0,465  s/op
TestXMLParser.parseXML         ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_ARP1      avgt    5  19,738 ± 1,526  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_SAX       avgt    5   0,998 ± 0,223  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_StAX_ev   avgt    5   1,325 ± 0,093  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_StAX_sr   avgt    5   0,985 ± 0,018  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_ARP1      avgt    5   2,357 ± 0,163  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_SAX       avgt    5   0,146 ± 0,029  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_StAX_ev   avgt    5   0,192 ± 0,007  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_StAX_sr   avgt    5   0,140 ± 0,016  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_ARP1      avgt    5   0,309 ± 0,098  s/op
TestXMLParser.parseXMLJena510  ../testing/citations.rdf                            RRX.RDFXML_SAX       avgt    5  57,690 ± 0,932  s/op
TestXMLParser.parseXMLJena510  ../testing/citations.rdf                            RRX.RDFXML_StAX_ev   avgt    5  84,579 ± 4,109  s/op
TestXMLParser.parseXMLJena510  ../testing/citations.rdf                            RRX.RDFXML_StAX_sr   avgt    5  56,949 ± 0,815  s/op
TestXMLParser.parseXMLJena510  ../testing/citations.rdf                            RRX.RDFXML_ARP1      avgt    5  82,940 ± 0,815  s/op
TestXMLParser.parseXMLJena510  ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_SAX       avgt    5  13,280 ± 0,458  s/op
TestXMLParser.parseXMLJena510  ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_StAX_ev   avgt    5  14,994 ± 0,803  s/op
TestXMLParser.parseXMLJena510  ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_StAX_sr   avgt    5  13,132 ± 0,166  s/op
TestXMLParser.parseXMLJena510  ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_ARP1      avgt    5  19,125 ± 1,044  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_SAX       avgt    5   1,311 ± 0,018  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_StAX_ev   avgt    5   1,693 ± 0,021  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_StAX_sr   avgt    5   1,332 ± 0,179  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_ARP1      avgt    5   2,305 ± 0,280  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_SAX       avgt    5   0,194 ± 0,028  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_StAX_ev   avgt    5   0,227 ± 0,016  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_StAX_sr   avgt    5   0,194 ± 0,025  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_ARP1      avgt    5   0,291 ± 0,039  s/op
```
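
As a reading aid for the table, the relative change between the `parseXMLJena510` rows (Jena 5.1.0) and the `parseXML` rows (this branch) can be worked out as `(before - after) / before`. The sketch below does this for the bsbm-5m.xml rows only, with the scores copied from the table and the decimal comma written as a point. The class name `SpeedupSketch` and the helper `percentLessTime` are hypothetical, and the calculation ignores the reported error margins.

```java
import java.util.Locale;

// Hypothetical helper for reading the JMH table; not part of the pull request.
public class SpeedupSketch {

    // Percentage of time saved relative to the "before" score.
    // A negative result means the "after" run was slower.
    static double percentLessTime(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Scores in s/op for ../testing/BSBM/bsbm-5m.xml, taken from the table.
        String[] parsers = {"RRX.RDFXML_SAX", "RRX.RDFXML_StAX_ev",
                            "RRX.RDFXML_StAX_sr", "RRX.RDFXML_ARP1"};
        double[] jena510 = {13.280, 14.994, 13.132, 19.125};
        double[] thisBranch = {9.576, 11.562, 9.406, 19.738};

        for (int i = 0; i < parsers.length; i++) {
            double change = percentLessTime(jena510[i], thisBranch[i]);
            System.out.printf(Locale.ROOT, "%-20s %6.1f%% less time per op%n",
                    parsers[i], change);
        }
    }
}
```

For RRX.RDFXML_ARP1 the two scores differ by less than their reported error margins, so that row is best read as unchanged.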
----
- [x] Tests are included.
- No documentation changes or updates are needed.
- [x] Commits have been squashed to remove intermediate development commit
messages.
- [x] Key commit messages start with the issue number (GH-xxxx)
By submitting this pull request, I acknowledge that I am making a
contribution to the Apache Software Foundation under the terms and conditions
of the [Contributor's
Agreement](https://www.apache.org/licenses/contributor-agreements.html).
----
See the [Apache Jena "Contributing"
guide](https://github.com/apache/jena/blob/main/CONTRIBUTING.md).