On 22 March 2013 23:21, Andrea Di Menna <ninn...@gmail.com> wrote:
>
> Hi Jona,
>
> thanks for merging the pull request!
>
> Anyway, couldn't we use percent encoding for Unicode code points which are
> not allowed in N-Triples (namely those outside the [#x20,#x7E] range)?
> In this case we should get UTF-8 bytes and percent encode them.
>
> For example, as far as I can see
>
> Marl$00C3$00ADn$002C_$00C3$0081vila
>
> is
>
> <http://dbpedia.org/resource/Marl%C3%83%C2%ADn,_%C3%83%C2%81vila>
>
> where \00C3 is 0xC3 0x83
> \00AD is 0xC2 0xAD
> \0081 is 0xC2 0x81

Oh, by the way, it would be

http://dbpedia.org/resource/Marl%C3%ADn,_%C3%81vila

because that's the UTF-8 percent-encoding for Marlín,_Ávila. The weird
thing is that these Wikipedia page titles in Freebase contain UTF-8-encoded
bytes when they should contain no encoding at all, just plain Unicode code
points. (Of course, the characters and code points are also dollar-escaped
as usual for Freebase, but that's not a problem.)

JC

>
> WDYT?
>
> Cheers
> Andrea
>
> 2013/3/22 Christopher Sahnwaldt <notificati...@github.com>
>>
>> Ok, I got it. It has nothing to do with your platform. These are actually
>> wrong URIs. There's not much we can do about it. I don't know where Freebase
>> got them from, but I assume they may actually be wrong in Wikipedia.
>>
>> Examples:
>>
>> Marl$00C3$00ADn$002C_$00C3$0081vila
>> C3 AD and C3 81 are UTF-8 encodings, but Freebase says [1] that the
>> numbers should be plain Unicode code points, not UTF-8 bytes. 81 is an
>> invalid code point, so we generate an invalid URI.
>>
>> Bene$009A_decrees
>> 9A is the Windows-1252 encoding for š, but 9A is invalid in Unicode.
>>
>> Switzerland$2003
>> 2003, 2029 etc. are valid Unicode code points, but for whitespace
>> characters that are invalid in URIs.
>>
>> In a nutshell: all these characters are invalid in URIs, and it's not our
>> fault. I'll pull your changes in a moment.
>>
>> [1] http://wiki.freebase.com/wiki/MQL_key_escaping

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
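
To make the difference between the two URIs in this thread concrete, here is a rough Python sketch. It is not code from the DBpedia extraction framework; the function names (unescape_key, repair_utf8_as_codepoints, to_resource_uri) are made up for illustration. It assumes that Freebase keys use $XXXX escapes with exactly four hex digits, and that a title whose code points are all below 0x100 and whose byte values happen to form valid UTF-8 was probably mis-encoded. That second assumption is only a heuristic and could misfire on unusual titles.

```python
import re
from urllib.parse import quote

# Freebase MQL keys escape characters as $XXXX (four hex digits, assumed here).
_DOLLAR_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def unescape_key(key: str) -> str:
    """Replace each $XXXX escape with the code point it names."""
    return _DOLLAR_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), key)


def repair_utf8_as_codepoints(text: str) -> str:
    """Heuristic (an assumption, not a Freebase rule): if every code point
    fits in one byte and those bytes are valid UTF-8, decode them as UTF-8.
    Otherwise return the text unchanged."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def to_resource_uri(key: str, repair: bool = True,
                    base: str = "http://dbpedia.org/resource/") -> str:
    """Unescape a key, optionally repair it, then percent-encode as UTF-8."""
    title = unescape_key(key)
    if repair:
        title = repair_utf8_as_codepoints(title)
    # quote() encodes to UTF-8 bytes and percent-encodes anything that is
    # not unreserved; the comma is kept literal to match the URIs above.
    return base + quote(title, safe=",")


if __name__ == "__main__":
    key = "Marl$00C3$00ADn$002C_$00C3$0081vila"
    print(to_resource_uri(key, repair=False))
    print(to_resource_uri(key, repair=True))
```

With repair=False the key from this thread comes out as the double-encoded URI Andrea quoted; with repair=True it comes out as the Marl%C3%ADn,_%C3%81vila form. Keys like Bene$009A_decrees are left as they are by the repair step, because a lone 0x9A byte is not valid UTF-8, so they would still need separate handling.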