Hi Udo,

With a URL/URI there are two representations: the external one (the way they 
are written) and the internal one (what is really meant). ZnUrl follows this 
distinction.

When you say #asUrl (or #asZnUrl) you are actually parsing an external string 
representation. When doing so, percent decoding is done by ZnPercentEncoder. 
This class is strict, in that it does not allow non-safe, non-ASCII characters
in its input. AFAIK this is correct, but I can imagine a less strict
interpretation (like the URL input box of a browser would allow). If you have a
reading of the specs that says otherwise, I would be very interested.
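
To illustrate the two representations, here is a minimal sketch. It assumes
ZnPercentEncoder's defaults (UTF-8 as character encoding), and the comments
describe what I would expect rather than verified output:

```smalltalk
| url |
"Parse the external, percent-encoded form"
url := 'http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt' asUrl.

"Internally the last path segment is held decoded; I would expect 'äöü.txt'"
url pathSegments last.

"Printing goes back to the external, percent-encoded form"
url printString.

"The encoder can also be used on its own, for a single path segment"
ZnPercentEncoder new encode: 'äöü.txt'.
ZnPercentEncoder new decode: '%C3%A4%C3%B6%C3%BC.txt'.
```

Note that the encoder is applied per segment here, not to the whole URL
string, so the separators of the URL itself are not escaped.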

To avoid doing the encoding yourself, construct the URL from its parts
explicitly, like this:

ZnUrl new
  scheme: #http;
  host: 'myhost';
  addPathSegments: #('path' 'with' 'umlaut' 'äöü.txt');
  yourself.

 => http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt
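
The same URL can also be built step by step, for example when the segments
come from somewhere else. This is only a sketch; it assumes #addPathSegment:
takes each segment in its internal (unencoded) form, as #addPathSegments: does
above:

```smalltalk
| url |
url := ZnUrl new
  scheme: #http;
  host: 'myhost';
  yourself.

"Each segment is given as plain text; the encoding happens when printing"
#('path' 'with' 'umlaut' 'äöü.txt') do: [ :each |
  url addPathSegment: each ].

url printString.
```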

Class comments and unit tests should help.

There is also this draft:

  http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/

HTH,

Sven

PS: Incidentally, this does work

  'http://myhost/path/with/umlaut/äöü.txt' asFileReference asUrl.

because #asFileReference works differently.

> On 02 Dec 2014, at 23:32, Udo Schneider <udo.schnei...@homeaddress.de> wrote:
> 
> All,
> 
> What's the expected behavior with non-ASCII characters in URLs? Let's say I
> want to access a file named "äöü.txt". My assumption was that Zinc takes
> care of the UTF-8 -> 7-bit (ASCII) -> escape encoding. But there is either
> something I don't understand or some manual steps I'm missing.
> 
> The "straightforward" way doesn't work:
> 'http://myhost/path/with/umlaut/äöü.txt' asUrl. "ZnCharacterEncodingError: 
> ASCII character expected"
> 
> Although the actual encoding seems to be able to handle it (ignoring the
> escaped slashes for the moment):
> 'http://myhost/path/with/umlaut/äöü.txt' urlEncoded.
> "'http%3A%2F%2Fmyhost%2Fpath%2Fwith%2Fumlaut%2F%C3%A4%C3%B6%C3%BC.txt'"
> 
> Creating a URL from already escaped characters works as well:
> 'http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt' asUrl.
> "http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt"
> 
> As does the decoding of such a URL:
> 'http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt' urlDecoded.
> "'http://myhost/path/with/umlaut/äöü.txt'"
> 
> At the moment I'm manually encoding UTF-8 characters in path segments
> before trying to build the URL. But is this the correct way?
> 
> Best Regards,
> 
> Udo
> 
> 
> 

