Hi both,

The JSON specification defaults to UTF-8; that's why you often do not see the
encoding specified again in HTTP:
https://en.wikipedia.org/wiki/JSON#Data_portability_issues
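For a client, this means the response body can be decoded as UTF-8 when no charset is given. A minimal sketch, assuming the body is already in hand as bytes; the helper name decode_json_body, the sample payload, and the spelling "Wintergrün" are illustrative assumptions, not taken from the actual API response:

```python
import json


def decode_json_body(body: bytes, charset=None):
    """Parse a JSON HTTP body, falling back to UTF-8 when no charset is declared."""
    # JSON defaults to UTF-8, so a missing charset parameter is not an error.
    text = body.decode(charset or "utf-8")
    return json.loads(text)


# Hypothetical payload for illustration only.
raw = '{"vernacularName": "Wintergrün"}'.encode("utf-8")
record = decode_json_body(raw)
print(record["vernacularName"])
```

If the name still comes out mangled after decoding this way, the damage is already in the stored data rather than in the HTTP layer.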


Still, as you have spotted, this is not correct UTF-8, as you can see on our
portal page:
http://api.gbif.org/v1/species/2882753/vernacularNames

The species is from our backbone, which claims it is taken from the GRIN
dataset, where you can see the same problem:
http://www.gbif.org/species/101354008
http://www.gbif.org/species/101354008/vernaculars

Also in the verbatim data as it came in:
http://www.gbif.org/species/101354008/verbatim
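The pattern is consistent with Guido's explanation quoted below (UTF-8 bytes read as a single-byte "ANSI" code page). A minimal sketch of that mechanism and a cautious reversal, assuming the code page was Windows-1252 and that the intended name is "Wintergrün"; both are assumptions, and mangle and repair are hypothetical helper names:

```python
def mangle(text: str) -> str:
    """Simulate the fault: UTF-8 bytes misread as Windows-1252 (assumed code page)."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Try to undo the misreading; return the input unchanged if it does not apply."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside cp1252, or the bytes are not
        # valid UTF-8, so it was probably not mangled in this particular way.
        return text


broken = mangle("Wintergrün")  # assumed original spelling
print(broken)
print(repair(broken))
```

Pure ASCII text passes through both functions unchanged, which matches the observation that only characters above 127 are affected.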


I'll try to see how that ended up there.
Markus



> On 24 Nov 2015, at 06:11, Guido Sautter <sautter at ipd.uka.de> wrote:
> 
> Hi Jorrit,
>> Ok. Sounds like we are on the same page. What do you think would be the most 
>> effective way to document this content issue?
> collecting a bunch of links to API responses that include mangled characters 
> looks like a good option to me.
> 
> Also, you might want to follow the links to the datasets and their providers, 
> and all the way back to the dataset source pages (some three links to follow 
> or so) and see if the mangled characters show up as well on the pages of the 
> original data providers.
> 
> If the latter is the case, it's likely the providers' responsibility to fix 
> the data. If not, there might be an issue along the transfer routes between 
> the original providers and GBIF.
> 
> Just a thought,
> Guido
> 
>>> On Nov 23, 2015, at 3:35 PM, Guido Sautter <sautter at ipd.uka.de> wrote:
>>> 
>>> Hi Jorrit,
>>>> Thanks for your reply.
>>> welcome as can be.
>>> 
>>>> Thanks for confirming that there's a character conversion issue happening 
>>>> somewhere. 
>>>> 
>>>> Since the mangled characters appear in both HTML and JSON provided by 
>>>> GBIF, I'd say it is probably a GBIF issue.
>>> Well, what we can say at this point is that GBIF _has_ mangled characters 
>>> ... which doesn't mean the mangling necessarily happened at their 
>>> facilities.
>>> 
>>>> Is there a way to find out whether the invalid character handling occurs 
>>>> in a data provider or within GBIF itself?
>>> Sorry to say, no. That's why I stated that characters got mangled "at some 
>>> point". All we can say is that it happened upstream from GBIF's API.
>>> 
>>> Best,
>>> Guido
>>> 
>>>>> On Nov 23, 2015, at 3:14 PM, Guido Sautter <sautter at ipd.uka.de> wrote:
>>>>> 
>>>>> That usually happens when, at some point, UTF-8 encoded text is read as 
>>>>> ANSI. It only happens if the text contains characters above 127 (0x7F), 
>>>>> however.
>>>>> 
>>>>> Hope that helps,
>>>>> Guido
>>>>> 
>>>>>> Hey y'all:
>>>>>> 
>>>>>> I am noticing some funny characters (e.g. "Wintergr??n") for species 
>>>>>> available here:
>>>>>> 
>>>>>> http://www.gbif.org/species/2882753/vernaculars
>>>>>> 
>>>>>> Same is observed using the api:
>>>>>> 
>>>>>> http://api.gbif.org/v1/species/2882753/vernacularNames
>>>>>> 
>>>>>> I am assuming that the actual common name should be something like 
>>>>>> "Wintergr?n".
>>>>>> 
>>>>>> While I was looking into this, I also noticed that no character set is 
>>>>>> specified in the HTTP response headers.
>>>>>> 
>>>>>> Please confirm that this is expected behavior. 
>>>>>> 
>>>>>> thx,
>>>>>> -jorrit
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> API-users mailing list
>>>>>> API-users at lists.gbif.org
>>>>>> http://lists.gbif.org/mailman/listinfo/api-users
>>>>> 
>>>> 
>>> 
>> 
> 
