Thanks Reto,
Based on what you said, I decided to implement the lookahead method
with a limit on triples instead of bytes. It should still have a
decent memory footprint and take a reasonable amount of time.
It is now a Stanbol utility in commons.owl.
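To illustrate the idea of limiting by triples rather than bytes, here is a minimal sketch. It is not the actual commons.owl code: it assumes N-Triples input with one statement per line, uses a naive whitespace split instead of a real parser, and the class and method names (OntologyIdLookahead, lookahead) are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not the Stanbol utility itself.
public class OntologyIdLookahead {
    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";
    static final String OWL_ONTOLOGY = "<http://www.w3.org/2002/07/owl#Ontology>";
    static final String OWL_VERSION_IRI = "<http://www.w3.org/2002/07/owl#versionIRI>";

    /**
     * Scans at most maxTriples statements and returns {ontologyIRI, versionIRI}.
     * Either entry is null if it was not seen within the limit.
     * Assumption: the version IRI statement is not checked against the ontology subject.
     */
    static String[] lookahead(Reader content, int maxTriples) {
        String ontologyIri = null;
        String versionIri = null;
        int seen = 0;
        try (BufferedReader in = new BufferedReader(content)) {
            String line;
            while (seen < maxTriples && (line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // blank line or comment, not a triple
                }
                seen++;
                // Naive split into subject, predicate and "object ." (fine for IRI objects).
                String[] t = line.split("\\s+", 3);
                if (t.length < 3) {
                    continue;
                }
                String object = t[2].replaceFirst("\\s*\\.\\s*$", "");
                if (t[1].equals(RDF_TYPE) && object.equals(OWL_ONTOLOGY)) {
                    ontologyIri = t[0];
                } else if (t[1].equals(OWL_VERSION_IRI)) {
                    versionIri = object;
                }
                if (ontologyIri != null && versionIri != null) {
                    break; // pair found, no need to scan further
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String[] { ontologyIri, versionIri };
    }

    public static void main(String[] args) {
        String nt = "<http://example.org/onto> " + RDF_TYPE + " " + OWL_ONTOLOGY + " .\n"
                + "<http://example.org/onto> " + OWL_VERSION_IRI + " <http://example.org/onto/1.0> .\n";
        String[] id = lookahead(new StringReader(nt), 100);
        System.out.println(id[0] + " " + id[1]);
    }
}
```

Counting statements instead of bytes means the scan never stops in the middle of a triple, and memory use stays bounded because only the current line is held at any time.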
Alessandro
On 8/14/12 2:11 PM, Reto Bachmann-Gmür wrote:
Hi Alessandro,
Two things:
- the mark method doesn't truncate the stream after the indicated number of
bytes, but makes sure that within the indicated number of bytes one can
reset the stream back to that position. If one reads more than the
indicated number of bytes the mark becomes invalid (i.e. reset won't work)
but otherwise the stream behaves as normal.
- I'm not sure how the Jena parser works, or whether you get the triples read
so far if your RDF/XML is truncated. You might want to truncate N-Triples
after a dot.
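The first point can be checked with a small self-contained sketch using only java.io. The class and method names (MarkResetDemo, peekThenDrain) are made up for this example, and an in-memory byte array stands in for the real content stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class, only meant to show mark/reset semantics.
public class MarkResetDemo {
    /** Reads up to n bytes and returns how many were actually read. */
    static int readUpTo(InputStream in, int n) throws IOException {
        int count = 0;
        while (count < n && in.read() != -1) {
            count++;
        }
        return count;
    }

    /** Marks, peeks, resets, then drains; returns {bytes peeked, bytes read after reset}. */
    static int[] peekThenDrain(byte[] data, int readLimit, int peek) {
        try (BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            in.mark(readLimit);                 // remember this position, nothing is truncated
            int peeked = readUpTo(in, peek);    // stay within readLimit so reset() is valid
            in.reset();                         // back to the marked position
            int drained = readUpTo(in, Integer.MAX_VALUE); // the whole stream is still there
            return new int[] { peeked, drained };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = peekThenDrain(new byte[5000], 1024, 100);
        System.out.println(r[0] + " bytes peeked, " + r[1] + " bytes read after reset");
    }
}
```

With 5000 bytes of input and mark(1024), this prints "100 bytes peeked, 5000 bytes read after reset": the read limit only bounds how far one can read and still call reset(), it does not cut the stream off.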
Cheers,
Reto
On Tue, Aug 14, 2012 at 1:53 PM, Alessandro Adamou <[email protected]> wrote:
Hi,
I need to write a function that performs lookahead of the OWL ontology ID
for a Graph, therefore it has to scan the content up to a certain point to
see if it has found an ontology IRI / version IRI pair.
I thought that setting mark() on a BufferedInputStream did the trick,
something like:
    MGraph graph = new SimpleMGraph();
    BufferedInputStream bIn = new BufferedInputStream(content);
    bIn.mark(1240); // Read up to 1k
    parser.parse(graph, bIn, SupportedFormat.RDF_XML);
(parser has a Jena parser provider registered)
But apparently this is not working. Even for streams much longer than 1
KiB, with the interesting triples right at the very end, these triples are
always found.
Does the Clerezza parser override the marks on a buffered stream, or maybe
Jena is doing so? Or even better, am I doing this wrong?
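For reference, what the snippet above expects from mark() (a stream that ends after a fixed number of bytes) needs an explicit wrapper. Below is a minimal sketch using only java.io. The class name ByteLimitInputStream and the helper countBytes are made up for this example, and whether a parser returns the triples read so far when the input is cut off mid-document is not something this sketch answers.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical wrapper: reports end-of-stream after `limit` bytes.
public class ByteLimitInputStream extends FilterInputStream {
    private long remaining;

    public ByteLimitInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // limit reached, pretend the stream ended
        }
        int b = super.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    // Mark/reset is disabled so the byte budget cannot be bypassed by rewinding.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public synchronized void mark(int readLimit) {
        // no-op
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    /** Drains the stream and returns the number of bytes it delivered. */
    static long countBytes(InputStream in) {
        try {
            byte[] buf = new byte[256];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream limited = new ByteLimitInputStream(new ByteArrayInputStream(new byte[5000]), 1024);
        System.out.println(countBytes(limited) + " bytes delivered");
    }
}
```

In the snippet above, such a wrapper would be passed to parser.parse in place of bIn. That usage is an untested assumption here.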
Best,
-- Alessandro
--
M.Sc. Alessandro Adamou
Alma Mater Studiorum - Università di Bologna
Department of Computer Science
Mura Anteo Zamboni 7, 40127 Bologna - Italy
Semantic Technology Laboratory (STLab)
Institute for Cognitive Science and Technology (ISTC)
National Research Council (CNR)
Via Nomentana 56, 00161 Rome - Italy
"I will give you everything, just don't demand anything."
(Ettore Petrolini, 1917)
Not sent from my iSnobTechDevice