The following issue has been SUBMITTED.
======================================================================
http://bugs.librdf.org/mantis/view.php?id=290
======================================================================
Reported By:                anonymous
Assigned To:
======================================================================
Project:                    Raptor RDF Parsing and Serializing Library
Issue ID:                   290
Category:                   api
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     new
Parsing/Serializing Syntax:
======================================================================
Date Submitted:             2008-11-24 18:41
Last Modified:              2008-11-24 18:41
======================================================================
Summary:                    Parsing turtle files with lots of namespaces is very slow
Description:
Turtle documents with lots of @prefix headers are very slow to parse. For example, the first 9M triples of the 25M-triple BSBM dataset take 7m58s to parse on a 2GHz, 16GB Linux machine.

This is largely down to the way namespaces are represented. A quick hack that uses a simple hashtable instead of a list cuts the parse time down to 1m47s.

A patch that implements the quick hack is attached. It passes as many tests as before (as far as I can see), but it may leak memory and is a little more memory hungry on small files.
======================================================================

Issue History
Date Modified    Username       Field                    Change
======================================================================
2008-11-24 18:41 anonymous      New Issue
2008-11-24 18:41 anonymous      File Added: raptor-ns-hash.patch
======================================================================

_______________________________________________
redland-dev mailing list
[email protected]
http://lists.librdf.org/mailman/listinfo/redland-dev
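
For readers without the attachment, the idea in the report (looking prefixes up in a hashtable rather than walking a list) can be sketched roughly as follows. This is not the contents of raptor-ns-hash.patch and does not use Raptor's namespace API: the names ns_table, ns_entry, ns_table_add, ns_table_find and the fixed NS_BUCKETS size are all hypothetical, chosen only for illustration. The point is that a lookup visits only the entries sharing one bucket instead of every declared prefix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NS_BUCKETS 1024 /* hypothetical fixed size; must be a power of two */

/* One declared prefix, e.g. "foaf" -> "http://xmlns.com/foaf/0.1/". */
typedef struct ns_entry {
  char *prefix;
  char *uri;
  struct ns_entry *next; /* next entry in the same bucket */
} ns_entry;

typedef struct {
  ns_entry *buckets[NS_BUCKETS];
} ns_table;

/* djb2-style string hash, reduced to a bucket index. */
unsigned long ns_hash(const char *s) {
  unsigned long h = 5381;
  while (*s)
    h = h * 33 + (unsigned char)*s++;
  return h & (NS_BUCKETS - 1);
}

/* Portable stand-in for strdup(); returns NULL on allocation failure. */
char *ns_strdup(const char *s) {
  size_t n = strlen(s) + 1;
  char *copy = malloc(n);
  if (copy)
    memcpy(copy, s, n);
  return copy;
}

/* Allocate an empty table; returns NULL on allocation failure. */
ns_table *ns_table_new(void) {
  return calloc(1, sizeof(ns_table));
}

/* Declare a prefix. New entries go to the head of their bucket, so a
 * later declaration of the same prefix shadows an earlier one.
 * Returns 0 on success, -1 on allocation failure. */
int ns_table_add(ns_table *t, const char *prefix, const char *uri) {
  assert(t && prefix && uri);
  ns_entry *e = malloc(sizeof(*e));
  if (!e)
    return -1;
  e->prefix = ns_strdup(prefix);
  e->uri = ns_strdup(uri);
  if (!e->prefix || !e->uri) {
    free(e->prefix);
    free(e->uri);
    free(e);
    return -1;
  }
  unsigned long i = ns_hash(prefix);
  e->next = t->buckets[i];
  t->buckets[i] = e;
  return 0;
}

/* Look a prefix up; only the entries in one bucket are compared.
 * Returns the URI, or NULL if the prefix was never declared. */
const char *ns_table_find(const ns_table *t, const char *prefix) {
  const ns_entry *e;
  for (e = t->buckets[ns_hash(prefix)]; e; e = e->next)
    if (strcmp(e->prefix, prefix) == 0)
      return e->uri;
  return NULL;
}

/* Release every entry and the table itself. */
void ns_table_free(ns_table *t) {
  if (!t)
    return;
  for (size_t i = 0; i < NS_BUCKETS; i++) {
    ns_entry *e = t->buckets[i];
    while (e) {
      ns_entry *next = e->next;
      free(e->prefix);
      free(e->uri);
      free(e);
      e = next;
    }
  }
  free(t);
}
```

The fixed bucket array is one plausible reason a table like this would cost more memory than a list on small files, as the report notes: in this sketch the array is allocated in full even when only a handful of prefixes are declared.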
