Hi both,

the JSON specification defaults to UTF-8, that's why you often do not see the encoding specified again in the HTTP headers:
https://en.wikipedia.org/wiki/JSON#Data_portability_issues

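To illustrate the point about the missing charset, here is a minimal Python sketch of how a client might decode a JSON body when the HTTP headers name no encoding. The helper name `decode_json_body`, the sample body and the restored spelling "Wintergrün" are assumptions made for illustration only; this is not how GBIF or any particular client actually does it.

```python
import json
from typing import Any, Optional


def decode_json_body(body: bytes, charset: Optional[str] = None) -> Any:
    """Decode a JSON response body (hypothetical helper).

    Falls back to UTF-8, the JSON default, when the response headers
    did not name a charset.
    """
    text = body.decode(charset or "utf-8")
    return json.loads(text)


# Illustrative body, as it might arrive with "Content-Type: application/json"
# and no charset parameter. The value is an assumed example.
body = '{"vernacularName": "Wintergrün"}'.encode("utf-8")
record = decode_json_body(body)
print(record["vernacularName"])
```

If the bytes on the wire are valid UTF-8, a client that applies this default gets the right characters even though no charset was sent.
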
Still, as you have spotted, this is not correct UTF-8, as you can see on our portal page:
http://api.gbif.org/v1/species/2882753/vernacularNames

The species is from our backbone, which claims it is taken from the GRIN dataset, where you can see the same problem:
http://www.gbif.org/species/101354008
http://www.gbif.org/species/101354008/vernaculars

Also in the verbatim data as it came in:
http://www.gbif.org/species/101354008/verbatim

I'll try to see how that ended up there.

Markus

> On 24 Nov 2015, at 06:11, Guido Sautter <sautter at ipd.uka.de> wrote:
>
> Hi Jorrit,
>> Ok. Sounds like we are on the same page. What do you think would be the most
>> effective way to document this content issue?
> collecting a bunch of links to API responses that include mangled characters
> looks like a good option to me.
>
> Also, you might want to follow the links to the datasets and their providers,
> and all the way back to the dataset source pages (some three links to follow
> or so) and see if the mangled characters show up as well on the pages of the
> original data providers.
>
> If the latter is the case, it's likely the providers' responsibility to fix
> the data. If not, there might be an issue along the transfer routes between
> the original providers and GBIF.
>
> Just a thought,
> Guido
>
>>> On Nov 23, 2015, at 3:35 PM, Guido Sautter <sautter at ipd.uka.de> wrote:
>>>
>>> Hi Jorrit,
>>>> Thanks for your reply.
>>> welcome as can be.
>>>
>>>> Thanks for confirming that there's a character conversion issue happening
>>>> somewhere.
>>>>
>>>> Since the mangled characters appear in both html and json provided by
>>>> GBIF, I'd say it is probably a gbif issue.
>>> Well, what we can say at this point is that GBIF _has_ mangled characters
>>> ... which doesn't mean the mangling necessarily happened at their
>>> facilities.
>>>
>>>> Is there a way to find out whether the invalid character handling occurs
>>>> in a data provider or within GBIF itself?
>>> Sorry to say, no. That's why I stated that characters got mangled "at some
>>> point". All we can say is that it happened upstream from GBIF's API.
>>>
>>> Best,
>>> Guido
>>>
>>>>> On Nov 23, 2015, at 3:14 PM, Guido Sautter <sautter at ipd.uka.de> wrote:
>>>>>
>>>>> That usually happens when, at some point, UTF-8 encoded text is read as
>>>>> ANSI. It only happens if the text contains characters above 127 (0x7F),
>>>>> however.
>>>>>
>>>>> Hope that helps,
>>>>> Guido
>>>>>
>>>>>> Hey y'all:
>>>>>>
>>>>>> I am noticing some funny characters (e.g. "Wintergr??n") for species
>>>>>> available here:
>>>>>>
>>>>>> http://www.gbif.org/species/2882753/vernaculars
>>>>>>
>>>>>> Same is observed using the api:
>>>>>>
>>>>>> http://api.gbif.org/v1/species/2882753/vernacularNames
>>>>>>
>>>>>> I am assuming that the actual common name should be something like
>>>>>> "Wintergrün".
>>>>>>
>>>>>> While I was looking into this, I also noticed that no characterset is
>>>>>> specified in http response headers.
>>>>>>
>>>>>> Please confirm that this is expected behavior.
>>>>>>
>>>>>> thx,
>>>>>> -jorrit
>>>>>>
>>>>>> _______________________________________________
>>>>>> API-users mailing list
>>>>>> API-users at lists.gbif.org
>>>>>> http://lists.gbif.org/mailman/listinfo/api-users
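
For reference, the kind of mangling discussed in this thread (UTF-8 encoded text read as "ANSI") can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: "ANSI" is taken to mean Latin-1 here, the sample name "Wintergrün" is assumed, and the function names `mangle` and `unmangle` are made up for illustration. It does not show where in the pipeline the GBIF data was actually damaged.

```python
def mangle(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Latin-1 (assumed meaning of 'ANSI')."""
    return text.encode("utf-8").decode("latin-1")


def unmangle(text: str) -> str:
    """Reverse a single round of the misreading above.

    Raises UnicodeError if the text was not mangled in exactly this way.
    """
    return text.encode("latin-1").decode("utf-8")


original = "Wintergrün"       # assumed example value
mangled = mangle(original)    # the two UTF-8 bytes of "ü" become two characters
print(mangled)
print(unmangle(mangled))

# Pure ASCII text (code points up to 127, 0x7F) survives unchanged.
print(mangle("Wintergreen"))
```

Each non-ASCII character turns into two or more characters, which matches the doubled garbage seen in the vernacular names, while plain ASCII names pass through untouched.
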
