I've been trying to process DBpedia Live with a pipeline that uses
Jena, and I've found 8765 triples that Jena won't parse in

http://live.dbpedia.org/dumps/dbpedia_2012_05_31.nt.bz2

The rejected triples can be found here:

http://basekb.com/files/DBpediaRejected.nt.bz2

Several sorts of problems turn up. I'm sure that most of them are
problems on the DBpedia side, such as the use of URLs that contain \u
escapes, both in the subject and object fields and in the literal
type field.
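
To make the first category concrete, here is a minimal sketch (not part of my actual pipeline) of how one might flag lines whose URLs contain \u or \U escapes. It assumes one triple per line and that the escapes take the \uXXXX or \UXXXXXXXX form; the function name iris_with_escapes is made up for illustration. Quoted literals are blanked out first so that angle brackets inside a label are not mistaken for a URL.

```python
import re

# A quoted literal body, allowing backslash-escaped characters inside it.
LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
# Anything between angle brackets: subject, predicate, object or datatype.
IRI = re.compile(r'<([^>]*)>')
# The two escape forms assumed here: \uXXXX and \UXXXXXXXX.
ESCAPE = re.compile(r'\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}')


def iris_with_escapes(line):
    """Return the URLs on one N-Triples line that contain \\u or \\U escapes."""
    # Blank out literal bodies so '<' and '>' inside labels are ignored.
    without_literals = LITERAL.sub('""', line)
    return [iri for iri in IRI.findall(without_literals) if ESCAPE.search(iri)]


if __name__ == "__main__":
    sample = ('<http://dbpedia.org/resource/Caf\\u00E9> '
              '<http://www.w3.org/2000/01/rdf-schema#label> '
              '"Caf\\u00E9"@en .')
    print(iris_with_escapes(sample))
```

Running something like this over the rejected file would separate the lines with escaped URLs from the ones that fail for other reasons. The sample line is invented, not taken from the dump.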

I'm not so sure about the use of \U escapes in labels in N-Triples, 
where there seems to be some confusion about how to handle Unicode 
characters.

http://www.w3.org/2001/sw/RDFCore/ntriples/
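
For reference, this is roughly how I understand the escapes in a label are meant to be read: \u followed by four hex digits and \U followed by eight, each naming one Unicode code point. The sketch below is only an illustration under that assumption; decode_unicode_escapes is a hypothetical helper, not something from Jena or DBpedia, and it deliberately leaves other escapes such as \" and \n alone.

```python
import re

# Match an escaped backslash first, so that the text \\u00E9 (a literal
# backslash followed by "u00E9") is not mistaken for a Unicode escape.
ESCAPE = re.compile(r'\\(?:\\|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))')


def decode_unicode_escapes(text):
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they name."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is None:
            # An escaped backslash: leave it exactly as written.
            return match.group(0)
        # chr() raises ValueError for values above 0x10FFFF.
        return chr(int(hex_digits, 16))
    return ESCAPE.sub(replace, text)


if __name__ == "__main__":
    print(decode_unicode_escapes('Caf\\u00E9'))
```

Whether a given parser accepts both forms everywhere is exactly the part I'm unsure about, so treat this as a reading of the escape syntax rather than a statement of what Jena does.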


_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
