Parsing of RDF Data loads everything into memory
------------------------------------------------
Key: CLEREZZA-366
URL: https://issues.apache.org/jira/browse/CLEREZZA-366
Project: Clerezza
Issue Type: Improvement
Reporter: Rupert Westenthaler
The API of org.apache.clerezza.rdf.core.serializedform.ParsingProvider does
not allow callers to pass in the target MGraph into which the RDF data read
from the InputStream should be loaded.
Implementations therefore need to create their own MGraph instances.
For example, org.apache.clerezza.rdf.jena.parser.JenaParserProvider creates an
instance of SimpleMGraph to store the parsed data.
This design does not allow parsed RDF data to be "streamed" directly into the
final destination; it forces everything to be loaded into an intermediate
graph first.
This is a problem when importing big datasets, especially because the
intermediate graph is kept in memory.
Currently one would use
TCProvider provider; // e.g. a TdbTcProvider instance
// e.g. loading a dump of dbPedia.org
MGraph veryBigGraph = provider.createMGraph("http://dbPedia.org");
// loads everything into memory and then adds everything to the TDB store
veryBigGraph.addAll(parser.parse(is, format, null));
A possible solution would be to add a second ParsingProvider.parse(..) method
that accepts an existing MGraph instance as the target.
This would allow the above code fragment to be refactored as follows:
TCProvider provider; // e.g. a TdbTcProvider instance
// e.g. loading a dump of dbPedia.org
MGraph veryBigGraph = provider.createMGraph("http://dbPedia.org");
// loads everything directly into the passed MGraph
parser.parse(is, veryBigGraph, format, null);
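
To make the difference between the two styles concrete, here is a minimal,
self-contained sketch that does not depend on Clerezza. All names in it
(StreamingParseSketch, TripleSink, parseInto, parseToNewGraph) are
hypothetical stand-ins invented for illustration, and "one statement per
line" is only a placeholder for real RDF parsing. It shows that the existing
"return a new graph" style can be expressed on top of a "fill the given
target" method, so both could plausibly coexist.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class StreamingParseSketch {

    /** Hypothetical stand-in for the target graph (the role MGraph plays above). */
    interface TripleSink {
        void add(String statement);
    }

    /**
     * Proposed style: the caller supplies the destination, and each statement
     * is handed over as soon as it is read. Nothing is buffered here.
     */
    static void parseInto(InputStream in, TripleSink target) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String statement = line.trim();
                if (!statement.isEmpty()) {
                    target.add(statement); // goes straight to the destination
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Existing style: the parser allocates its own in-memory result.
     * Implemented on top of parseInto, so both styles share one code path.
     */
    static List<String> parseToNewGraph(InputStream in) {
        List<String> intermediate = new ArrayList<>();
        parseInto(in, intermediate::add);
        return intermediate;
    }

    public static void main(String[] args) {
        byte[] data = "s1 p1 o1 .\n\ns2 p2 o2 .\n".getBytes(StandardCharsets.UTF_8);

        // A sink that only counts, standing in for a store that writes to disk
        // instead of keeping every statement in memory.
        long[] count = {0};
        parseInto(new ByteArrayInputStream(data), statement -> count[0]++);
        System.out.println("streamed statements: " + count[0]);

        // The old style still works, but holds the whole result in memory.
        List<String> all = parseToNewGraph(new ByteArrayInputStream(data));
        System.out.println("buffered statements: " + all.size());
    }
}
```

With a real persistent store as the sink, memory use in the streaming variant
would be bounded by the parser and the store rather than by the size of the
input, which is the point of the proposal.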
best
Rupert Westenthaler