[Dbpedia-discussion] Backslash encoding for URIs

Andrea Di Menna Wed, 20 Mar 2013 10:17:30 -0700

Hi,

I have been using Stanbol [1] to process DBpedia data files and build a
DBpedia Solr index.
Stanbol uses Jena TDB to load the DBpedia files into a triple store.
Unfortunately, almost all of the DBpedia N-Triples files must be
pre-processed before Jena can import them [2].


The script in [2] runs the following sed command:

sed 's/\\\\/\\u005c\\u005c/g;s/\\\([^u"]\)/\\u005c\1/g'

In effect, each backslash is replaced with its Unicode escape sequence
(\u005c), unless it is followed by u or by a double quote.
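
To make the two substitutions in the sed expression easier to follow, here is
a rough Python equivalent. It is only a sketch of what the command appears to
do, not code taken from Stanbol or DBpedia; the function name
escape_backslashes is made up for illustration, and it assumes it is given one
line at a time, as sed would process it.

```python
import re


def escape_backslashes(line: str) -> str:
    """Hypothetical Python port of the two sed substitutions (a sketch)."""
    # Pass 1: every pair of literal backslashes becomes \u005c\u005c.
    # A function is used as the replacement so that Python does not
    # interpret the backslashes in the replacement text.
    line = re.sub(r'\\\\', lambda m: r'\u005c\u005c', line)
    # Pass 2: any remaining backslash followed by a character other than
    # 'u' or '"' becomes \u005c followed by that character. The newline is
    # excluded as well, because sed never sees it inside a line.
    line = re.sub(r'\\([^u"\n])', lambda m: r'\u005c' + m.group(1), line)
    return line


if __name__ == "__main__":
    # Made-up sample values, not taken from a DBpedia dump.
    for sample in (r'a\\b', r"a\'b", r'caf\u00e9 \"x\"'):
        print(sample, '->', escape_backslashes(sample))
```

Backslashes that start a \u escape or escape a double quote pass through
unchanged, which matches the [^u"] character class in the sed expression.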

Do you think this could (or should) be fixed in
org.dbpedia.extraction.util.TurtleEscaper#escapeTurtle?

Cheers
Andrea

[1] http://stanbol.apache.org/
[2]
http://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/dbpedia-3.8/fetch_data_en_int.sh


