Can someone point to the part of the discussion which talks about what the
problem is?  This thread seems to start in mid-stream...

Freebase's MQL key encoding (http://wiki.freebase.com/wiki/MQL_key_escaping)
is a completely private encoding which shouldn't have any effect on
external URIs/IRIs/references/etc.
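
To make the point concrete, here is a rough Python sketch of how I would expect a consumer to handle such keys. It assumes, going only by the examples quoted further down in this thread, that an escape is a dollar sign followed by exactly four hex digits naming a Unicode code point. The function names (decode_mql_key, repair_utf8_bytes, to_resource_path) are made up for illustration and are not part of any Freebase or DBpedia API. The repair step is a heuristic for the broken keys discussed below, not something the MQL documentation describes.

```python
import re
from urllib.parse import quote

# Assumption: an MQL escape is "$" plus exactly four hex digits.
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def decode_mql_key(key: str) -> str:
    """Replace each $XXXX escape with the code point U+XXXX."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), key)


def repair_utf8_bytes(text: str) -> str:
    """Heuristic: if every character fits in one byte and those bytes
    form valid UTF-8, reinterpret them; otherwise return text unchanged."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def to_resource_path(title: str) -> str:
    """Percent-encode the UTF-8 bytes of a title, keeping ',' and '_'."""
    return quote(title, safe=",_")


key = "Marl$00C3$00ADn$002C_$00C3$0081vila"
raw = decode_mql_key(key)
print(to_resource_path(raw))                     # Marl%C3%83%C2%ADn,_%C3%83%C2%81vila
print(to_resource_path(repair_utf8_bytes(raw)))  # Marl%C3%ADn,_%C3%81vila
```

The first line printed is what you get by trusting the key as-is; the second is what you get after treating the decoded code points as stray UTF-8 bytes. Keys like Bene$009A_decrees or Switzerland$2003 pass through the repair step unchanged, so they would still need separate handling.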

On Sun, Mar 24, 2013 at 9:44 PM, Jona Christopher Sahnwaldt <j...@sahnwaldt.de> wrote:

> On 22 March 2013 23:21, Andrea Di Menna <ninn...@gmail.com> wrote:
> >
> > Hi Jona,
> >
> > thanks for merging the pull request!
> >
> > Anyway, couldn't we use percent encoding for Unicode code points which
> > are not allowed in N-Triples (namely those outside the [#x20,#x7E] range)?
> > In this case we should get UTF-8 bytes and percent encode them.
> >
> > For example, as far as I can see
> >
> > Marl$00C3$00ADn$002C_$00C3$0081vila
> >
> > is
> >
> > <http://dbpedia.org/resource/Marl%C3%83%C2%ADn,_%C3%83%C2%81vila>
> >
> > where $00C3 is 0xC3 0x83
> >       $00AD is 0xC2 0xAD
> >       $0081 is 0xC2 0x81
>
> Oh, by the way, it would be
> http://dbpedia.org/resource/Marl%C3%ADn,_%C3%81vila because that's the
> UTF-8-percent-encoding for Marlín,_Ávila.
>
> The weird thing is that these Wikipedia page titles in Freebase
> contain UTF-8-encoded characters when they should contain no encoding
> at all, just plain Unicode code points. (Of course, the characters and
> code points are also dollar-escaped as usual for Freebase, but that's
> not a problem.)
>
>
> JC
>
> >
> > WDYT?
> >
> > Cheers
> > Andrea
> >
> > 2013/3/22 Christopher Sahnwaldt <notificati...@github.com>
> >>
> >> Ok, I got it. It has nothing to do with your platform. These are
> >> actually wrong URIs. There's not much we can do about it. I don't know
> >> where Freebase got them from, but I assume they may actually be wrong
> >> in Wikipedia.
> >>
> >> Examples:
> >>
> >> Marl$00C3$00ADn$002C_$00C3$0081vila
> >> C3 AD and C3 81 are UTF-8 encodings, but Freebase says [1] that the
> >> numbers should be plain Unicode code points, not UTF-8 bytes. 81 is an
> >> invalid code point, so we generate an invalid URI.
> >>
> >> Bene$009A_decrees
> >> 9A is the Windows-1252 encoding for š, but 9A is invalid in Unicode.
> >>
> >> Switzerland$2003
> >> 2003, 2029 etc. are valid Unicode code points, but for whitespace
> >> characters that are invalid in URIs.
> >>
> >> In a nutshell: all these characters are invalid in URIs, and it's not
> >> our fault. I'll pull your changes in a moment.
> >>
> >> [1] http://wiki.freebase.com/wiki/MQL_key_escaping
> >>
> >
> >
>
>
> ------------------------------------------------------------------------------
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_d2d_mar
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>