Dear all,
I have some difficulties with the load() operation of class
com.hp.hpl.jena.rdf.arp.DOM2Model. The operation seems to perform its task
only partially, and I don't know what is wrong with my code. Perhaps I am
missing some parser options.
I use JAXB to generate Java classes for my model (the XSD file that describes
the model is attached).
This model contains the classes Document, Resource, Annotation,
PieceOfKnowledge and Text.
A Resource is an abstract class that represents an object with a URI and
that can hold annotations.
An Annotation (or PieceOfKnowledge) encapsulates RDF triples as a literal
within an element "data" of type xsd:anyType.
A MediaUnit is a Resource.
A Text is a specialized MediaUnit with some content of type string.
To complete the model, a Document is a Resource and is composed of several
MediaUnits.
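The class structure described above can be pictured with a simplified,
hypothetical Java sketch. It is not the code JAXB actually generates from
doc.xsd (which is not reproduced here); all field and constructor names are
assumptions, and the getters are modeled on the ones used in RdfIo
(getMediaUnit(), getAnnotation(), getData()):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the JAXB classes generated from
// doc.xsd (the real ones live in package "model" and are not shown here).
final class ModelSketch {

    // Holds one RDF/XML payload. JAXB typically maps xsd:anyType to
    // java.lang.Object, so the "data" content is kept as a plain Object.
    static class Annotation {
        private final Object data;
        Annotation(Object data) { this.data = data; }
        Object getData() { return data; }
    }

    // Abstract base: an object with a URI that can carry annotations.
    abstract static class Resource {
        private final String uri;
        private final List<Annotation> annotation = new ArrayList<Annotation>();
        Resource(String uri) { this.uri = uri; }
        String getUri() { return uri; }
        List<Annotation> getAnnotation() { return annotation; }
    }

    // A MediaUnit is a Resource.
    static class MediaUnit extends Resource {
        MediaUnit(String uri) { super(uri); }
    }

    // A Text is a specialized MediaUnit with string content.
    static class Text extends MediaUnit {
        private final String content;
        Text(String uri, String content) { super(uri); this.content = content; }
        String getContent() { return content; }
    }

    // A Document is a Resource composed of several MediaUnits.
    static class Document extends Resource {
        private final List<MediaUnit> mediaUnit = new ArrayList<MediaUnit>();
        Document(String uri) { super(uri); }
        List<MediaUnit> getMediaUnit() { return mediaUnit; }
    }

    private ModelSketch() {}
}
```

Because Resource is abstract in this sketch, code can only obtain a Document,
Text or MediaUnit instance, never a bare Resource.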
So my application has to manage XML documents that contain text with embedded
annotations about that text, serialized as RDF/XML. I chose to decode these
annotations into a com.hp.hpl.jena.rdf.model.Model so that I can run queries
on them.
I have tried, without success, to load this model from the document with the
DOM2Model class.
Here is the code:
====================================
package sampleforjena;

// file IO
import java.io.File;
import java.io.IOException;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
// data model (classes generated by JAXB from doc.xsd)
import model.MediaUnit;
import model.Annotation;
import model.Resource;
import model.Document;
// JAXB
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.JAXBElement;
import javax.xml.transform.stream.StreamSource;
// DOM and SAX
import org.w3c.dom.Element;
import org.xml.sax.SAXParseException;
// Jena (the single-type import model.Resource above takes precedence over
// com.hp.hpl.jena.rdf.model.Resource from this wildcard import)
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.rdf.arp.DOM2Model;

public class RdfIo {
    private com.hp.hpl.jena.rdf.model.Model _model;
    ....
    // Read an XML serialized document file and extract its RDF model
    public void decodeSampleDocFile(File annotatedRegularDoc) {
        // Resource is abstract, so it cannot be instantiated here
        Resource resource = null;
        // Unmarshal the XML document
        try {
            JAXBContext jContext = JAXBContext.newInstance("model");
            Unmarshaller unmarshaller = jContext.createUnmarshaller();
            JAXBElement<Resource> jroot = unmarshaller.unmarshal(
                    new StreamSource(annotatedRegularDoc), Resource.class);
            resource = jroot.getValue();
        } catch (JAXBException e) {
            System.out.println("RdfIo.decodeSampleDocFile: Error!!! unable to read annotatedRegularDoc...");
            e.printStackTrace();
            return;
        }
        if (!(resource instanceof Document)) {
            System.out.println("RdfIo.decodeSampleDocFile: root element is not a Document");
            return;
        }
        Document d = (Document) resource;

        // Create a model
        _model = ModelFactory.createDefaultModel();

        // Get the XML serialized RDF triples from each annotation in the document
        for (MediaUnit mu : d.getMediaUnit()) {
            for (Annotation annot : mu.getAnnotation()) {
                // The xsd:anyType content is delivered by JAXB as a DOM element;
                // DOM2Model.load() accepts any org.w3c.dom.Node, so there is no
                // need to cast to a specific Xerces implementation class
                Element dataElement = (Element) annot.getData();
                try {
                    System.out.println("==rdfModelFromDomElement: createD2M ...");
                    // DOM2Model.createD2M("", _model).load(dataElement);
                    DOM2Model arp = DOM2Model.createD2M("", _model);
                    arp.allowRelativeURIs();
                    arp.load(dataElement);
                } catch (SAXParseException e) {
                    e.printStackTrace();
                }
                // To check whether the whole set of RDF statements has been
                // loaded, print the current size of the RDF store
                System.out.println("RdfIo.decodeSampleDocFile: model contains "
                        + _model.size() + " triples");
            }
        }
    }
    ....
}
====================================
At run time, I get errors like these:
13 sept. 2011 13:36:08 com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler warning
ATTENTION: unknown-source: {W104} Unqualified typed nodes are not allowed. Type treated as a relative URI.
13 sept. 2011 13:36:09 com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler error
GRAVE: unknown-source: {E205} rdf:RDF is not allowed as an element tag here.
13 sept. 2011 13:36:09 com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler error
GRAVE: unknown-source: {E201} Multiple children of property element
I cannot explain these errors, and the program is not interrupted.
It behaves as if the model were only partially loaded (only 3 triples are
read), although the full data seems to be contained in the dataElement
variable.
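The W104 and E205 messages could be explained by ARP receiving the wrapping
"data" element rather than the rdf:RDF element inside it: the unqualified
"data" element would then be read as a typed node (W104), and rdf:RDF would
appear below it where only property elements are allowed (E205). This is an
assumption, not a confirmed diagnosis. A minimal JDK-only sketch to test it
locates the rdf:RDF element under the annotation's data element, so that this
element, rather than the wrapper, can be passed to load(). The class and
method names are hypothetical, and the sketch assumes a namespace-aware DOM
(getLocalName() returns null otherwise):

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: locate the rdf:RDF element inside an annotation's
// "data" element so that it, rather than the wrapper, can be given to ARP.
final class RdfElementFinder {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Returns 'start' itself if it is rdf:RDF, otherwise the first rdf:RDF
    // element found by a depth-first search of its descendants, or null.
    // Assumes a namespace-aware DOM (getLocalName() is null otherwise).
    static Element findRdfRoot(Element start) {
        if (RDF_NS.equals(start.getNamespaceURI())
                && "RDF".equals(start.getLocalName())) {
            return start;
        }
        for (Node child = start.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element found = findRdfRoot((Element) child);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }

    private RdfElementFinder() {}
}
```

In decodeSampleDocFile() the call would then become something like
arp.load(RdfElementFinder.findRdfRoot(dataElement)), after checking that the
result is not null.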
I should add that my documents seem correct: when I isolate the RDF part as a
standalone RDF/XML file, I can read the whole set of statements (9 triples)
with the following decodeRdfFile() operation:
// test reading XML Serialized RDF File
public void decodeRdfFile(File serializedRdf) {
    FileInputStream is = null;
    try {
        is = new FileInputStream(serializedRdf);
        _model = com.hp.hpl.jena.rdf.model.ModelFactory.createDefaultModel();
        _model.read(is, null);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } finally {
        // always release the file handle
        if (is != null) {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
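Since reading a standalone RDF/XML file works, another route is to serialize
the rdf:RDF element back to text with the JDK's identity Transformer and read
that text with Jena's Model.read(Reader, String). This is a sketch that has not
been tested against Jena here; the class name DomToString is hypothetical, and
namespace declarations made on ancestor elements (outside rdf:RDF) should be
checked in the output:

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;

// Hypothetical helper: turn a DOM node (e.g. an rdf:RDF element) back into
// XML text so it can be parsed the same way as a standalone RDF/XML file.
final class DomToString {
    static String serialize(Node node) throws TransformerException {
        // An identity transformer copies the node unchanged to the output
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Skip the <?xml ...?> declaration, it is not needed for a Reader
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    private DomToString() {}
}
```

For example: _model.read(new java.io.StringReader(DomToString.serialize(rdfElement)),
"http://example.org/base"), where rdfElement and the base URI are placeholders.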
Another problem, which does not make debugging any easier, occurs with the
following operation when I try to "pretty print" the model:
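// test writing XML Serialized RDF model
public void writeModel() {
    java.io.FileOutputStream of = null;
    try {
        // print
        of = new java.io.FileOutputStream("fffModel.xml");
        _model.write(of);
    } catch (IOException ie) {
        ie.printStackTrace();
    } finally {
        // always release the file handle
        if (of != null) {
            try { of.close(); } catch (IOException e) { e.printStackTrace(); }
        }
    }
}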
It fails with the following exception:
Exception in thread "main" com.hp.hpl.jena.shared.BadURIException: Only well-formed absolute URIrefs can be included in RDF/XML output: <d> Code: 58/REQUIRED_COMPONENT_MISSING in SCHEME: A component that is required by the scheme is missing.
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.checkURI(BaseXMLWriter.java:768)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.xmlnsDecl(BaseXMLWriter.java:300)
    at com.hp.hpl.jena.xmloutput.impl.Basic.writeRDFHeader(Basic.java:56)
    at com.hp.hpl.jena.xmloutput.impl.Basic.writeBody(Basic.java:39)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.writeXMLBody(BaseXMLWriter.java:452)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:424)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:410)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.write(ModelCom.java:270)
    at sampleforjena.RdfIo.writeModel(RdfIo.java:112)
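The exception says the RDF/XML writer met the URI <d>, which has no scheme
(REQUIRED_COMPONENT_MISSING in SCHEME), while writing namespace declarations
(xmlnsDecl). Since the loading code calls allowRelativeURIs() and W104 reports
a type "treated as a relative URI", the model may well contain relative URIs.
A minimal JDK-only sketch for listing them, assuming it is given candidate
strings such as _model.getNsPrefixMap().values() or the predicate URIs of the
model's statements; the name UriCheck.nonAbsolute is hypothetical:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper: find strings that the RDF/XML writer would reject
// because they are not absolute URIs.
final class UriCheck {
    // Returns every string in 'uris' that has no scheme (such as "d"), or
    // that cannot be parsed as a URI at all, in the original order.
    static List<String> nonAbsolute(Collection<String> uris) {
        List<String> result = new ArrayList<String>();
        for (String u : uris) {
            try {
                if (!new URI(u).isAbsolute()) {
                    result.add(u);
                }
            } catch (URISyntaxException e) {
                result.add(u);
            }
        }
        return result;
    }

    private UriCheck() {}
}
```

Any entry it reports would have to be turned into an absolute URI (for example
by loading with a real base URI instead of "") before the RDF/XML writer can
declare it as a namespace.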
Sorry for the long message.
Olivier Mesnard
Attachments:
Doc_annotated.xml (XML document)
serializedRdf.xml (XML document)
doc.xsd (application/xsd)
