On 27 July 2018 at 18:39, Offray Vladimir Luna Cárdenas
<offray.l...@mutabit.com> wrote:
> Hi,
>
> I was ready to show a friend the Pharo web capabilities with the
> classic "myString asUrl retrieveContents", but the friend gave me a
> URL that contains non-Latin characters[1] and then I got a
> ZnInvalidUTF8 error.
>
> [1]
> http://www.bidchance.com/freesearch.do?&filetype=&channel=&currentpage=1&searchtype=zb&queryword=%BF%A6%CA%B2&displayStyle=&pstate=&field=&leftday=&province=&bidfile=&project=&heshi=&recommend=&field=&jing=&starttime=&endtime=&attachment=
>
> How can I process web addresses in Pharo that contain non-Latin
> characters like the one in [1]?

Just some blind digging...

A few levels down the stack is a call equivalent to...
    x := '%BF%A6%CA%B2'.
    ZnPercentEncoder new decode: x.
which fails with the same error.

In #decode: we have...
    bytes := #[191 166 202 178].

and browsing around I discovered a useful method...
    encoder := ZnCharacterEncoder detectEncoding: bytes.
    "==> a ZnSimplifiedByteEncoder('iso88591' strict)"

now the following works...
    (ZnPercentEncoder new characterEncoder: encoder) decode: x.
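
To see why the default fails and the detected encoder does not, here is a minimal Playground sketch. It assumes that ByteArray>>utf8Decoded decodes strictly (the default, as far as I can tell) and that 'iso-8859-1' is an accepted identifier for ZnCharacterEncoder class>>newForEncoding:. The variable names are only illustrative.

```smalltalk
| bytes utf8Result latin1Text |
"The four bytes that '%BF%A6%CA%B2' percent-decodes to."
bytes := #[191 166 202 178].

"191 is 2r10111111, a UTF-8 continuation byte. It cannot start a
character, so strict UTF-8 decoding signals ZnInvalidUTF8."
utf8Result := [ bytes utf8Decoded ]
	on: ZnInvalidUTF8
	do: [ :error | error class name ].

"ISO-8859-1 maps every byte to exactly one character, so decoding
cannot fail. The text is only meaningful if the site really used Latin-1."
latin1Text := (ZnCharacterEncoder newForEncoding: 'iso-8859-1')
	decodeBytes: bytes.
latin1Text size. "4, one character per byte"
```

Note that a successful Latin-1 decode does not prove the encoding is right: any byte sequence decodes as Latin-1. Given the site, the query may well be in a Chinese multi-byte encoding such as GBK, which is a guess I have not verified.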


So maybe that helps explain it,
but I don't know how to join the dots to make it work out of the box
with "asUrl retrieveContents".

cheers -ben
