I've been trying to process DBpedia Live with a pipeline that uses Jena, and I've found 8765 triples that Jena won't parse in

http://live.dbpedia.org/dumps/dbpedia_2012_05_31.nt.bz2

The rejected triples can be found here:

http://basekb.com/files/DBpediaRejected.nt.bz2

Several sorts of problems turn up. I'm sure that most of them are problems on the DBpedia side, such as the use of URLs that contain \u escapes, both in the subject and object fields and in the literal type field.

I'm not so sure about the use of \U escapes in labels in N-Triples, where there seems to be some confusion about how to handle Unicode characters. See:

http://www.w3.org/2001/sw/RDFCore/ntriples/
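
To make the two cases concrete, here is a rough Python sketch (not part of the Jena pipeline) that decodes \uXXXX and \UXXXXXXXX escapes and flags URLs that contain them. The function names are hypothetical, the sample triple is made up for illustration, and the angle-bracket match is only a heuristic: it does not parse N-Triples properly and would also match "<...>" text inside a literal.

```python
import re

# Matches an escaped backslash, \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
# The escaped backslash is matched first so that "\\u0041" is not decoded by mistake.
_ESCAPE = re.compile(r"\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

# Detection-only pattern for the two Unicode escape forms.
_UNICODE_ESCAPE = re.compile(r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}")

# Heuristic: anything between angle brackets is treated as a URL.
# This also picks up datatype URLs after "^^".
_IRI = re.compile(r"<([^>]*)>")


def decode_unicode_escapes(text):
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they name.

    Raises ValueError if a \\U escape names a code point above U+10FFFF.
    """
    def _replace(match):
        if match.group(0) == "\\\\":
            return match.group(0)  # leave an escaped backslash untouched
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(_replace, text)


def iris_with_escapes(line):
    """Return the bracketed URLs in one line that contain a Unicode escape."""
    return [iri for iri in _IRI.findall(line) if _UNICODE_ESCAPE.search(iri)]


if __name__ == "__main__":
    # Made-up sample line, not taken from the rejected file.
    sample = (r'<http://dbpedia.org/resource/Caf\u00E9> '
              r'<http://www.w3.org/2000/01/rdf-schema#label> "Caf\u00E9"@en .')
    print(iris_with_escapes(sample))
    print(decode_unicode_escapes(sample))
```

Running iris_with_escapes over each line of the rejected file would separate the triples with escapes inside URLs from those where the escapes appear only in the literal.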