On 27 July 2018 at 18:39, Offray Vladimir Luna Cárdenas wrote:
> Hi,
>
> I was ready to show a friend the Pharo web capabilities with the
> classical "myString asUrl retrieveContents", but the friend gave me a
> url that contains non-Latin characters[1] and then I got a
> ZnInvalidUTF8 error.
>
> [1]
> http://www.bidchance.com/freesearch.do?&filetype=&channel=&currentpage=1&searchtype=zb&queryword=%BF%A6%CA%B2&displayStyle=&pstate=&field=&leftday=&province=&bidfile=&project=&heshi=&recommend=&field=&jing=&starttime=&endtime=&attachment=
>
> How can I process web addresses in Pharo that contain non-Latin
> characters like the one in [1]?

Just some blind digging...
A few levels down the stack is a call equivalent to:

    x := '%BF%A6%CA%B2'.
    ZnPercentEncoder new decode: x.

which fails with the same error.
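
A short way to see why the default decode fails, sketched under two assumptions: that ZnPercentEncoder uses UTF-8 when no character encoder is set (consistent with the ZnInvalidUTF8 error), and that ZnUTF8Encoder responds to #decodeBytes: in this image. The escapes %BF%A6%CA%B2 stand for the bytes 191 166 202 178, and 191 (16rBF) has the bit pattern of a UTF-8 continuation byte, so it cannot start a character.

```smalltalk
| bytes result |
"The percent escapes %BF %A6 %CA %B2 written out as raw bytes."
bytes := #[16rBF 16rA6 16rCA 16rB2].    "same as #[191 166 202 178]"

"Decoding those bytes as UTF-8 is expected to signal ZnInvalidUTF8.
 The handler returns the error description instead of opening a debugger."
result := [ ZnUTF8Encoder new decodeBytes: bytes ]
	on: ZnInvalidUTF8
	do: [ :error | error description ].
result
```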
In #decode: we have:

    bytes := #[191 166 202 178].

and browsing around I discovered a useful method:

    encoder := ZnCharacterEncoder detectEncoding: bytes.
    "==> a ZnSimplifiedByteEncoder('iso88591' strict)"

Now the following works:

    (ZnPercentEncoder new characterEncoder: encoder) decode: x.

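
The steps above can be strung together as a fallback: try the default decode first and, only if it signals ZnInvalidUTF8, let detectEncoding: pick an encoder for the raw bytes. This is a sketch, not a fix for "asUrl retrieveContents". The byte array is hard-coded from the example, and detectEncoding: only guesses at the encoding, so the resulting string may not be the text the site intended.

```smalltalk
| x bytes decoded |
x := '%BF%A6%CA%B2'.
bytes := #[191 166 202 178].    "the bytes behind the escapes in x, hard-coded here"

decoded := [ ZnPercentEncoder new decode: x ]
	on: ZnInvalidUTF8
	do: [ :error | | encoder |
		"Fall back to whatever encoding Zinc detects for the raw bytes."
		encoder := ZnCharacterEncoder detectEncoding: bytes.
		(ZnPercentEncoder new characterEncoder: encoder; yourself) decode: x ].
decoded
```

The cascade with yourself makes sure the configured ZnPercentEncoder is the receiver of decode:, whatever characterEncoder: happens to return.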
So maybe that helps explain it,
but I don't know how to join the dots to make it work out of the box
with "asUrl retrieveContents".
cheers -ben