Hi,

I have been using Stanbol [1] to process the DBpedia data files and build a
DBpedia Solr index.
Stanbol uses Jena TDB to load the DBpedia files into a triple store.
Unfortunately, almost all of the DBpedia N-Triples files must be pre-processed
before Jena can import them [2].

The Stanbol script [2] runs the following sed command:

sed 's/\\\\/\\u005c\\u005c/g;s/\\\([^u"]\)/\\u005c\1/g'

In effect, every backslash that is not part of a \uXXXX escape or an
escaped quote (\") is replaced with its Unicode escape sequence, \u005c.
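
To make the effect concrete, here is a small sketch that wraps the same sed
expression in a shell function and runs it on one line. The function name
escape_backslashes and the sample triple are made up for illustration; they
are not taken from the Stanbol script or from a DBpedia dump.

```shell
# Hypothetical wrapper around the sed expression quoted above.
# Reads N-Triples lines on stdin, writes the rewritten lines to stdout.
escape_backslashes() {
  # 1st expression: each pair of backslashes becomes \u005c\u005c
  # 2nd expression: a backslash followed by anything except u or "
  #                 becomes \u005c followed by that character
  sed 's/\\\\/\\u005c\\u005c/g;s/\\\([^u"]\)/\\u005c\1/g'
}

# Made-up sample line (not real DBpedia data).
printf '%s\n' '<http://example.org/s> <http://example.org/p> "C:\\Temp\new" .' \
  | escape_backslashes
# prints: <http://example.org/s> <http://example.org/p> "C:\u005c\u005cTemp\u005cnew" .
```

Existing \uXXXX escapes and escaped quotes (\") pass through unchanged,
because the second expression skips a backslash followed by u or ".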

Do you think this could (or should) be fixed in
org.dbpedia.extraction.util.TurtleEscaper#escapeTurtle?

Cheers
Andrea

[1] http://stanbol.apache.org/
[2] http://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/dbpedia-3.8/fetch_data_en_int.sh