On 2006/7/6, Bill de hÓra wrote:

> Thomas Broyer wrote:
>> Slug: Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir

> This is missing the point, if we are talking about slugs. If you go and
> look at what tools actually *do*, you'll see they strip down labels by
> removing certain characters so they can be dropped into URLs with no fuss
> and with consistency. So if you send in this:
>
>    Slug: a-picture-of-my-house
>
> most of the time, your URL is going to look like this:
>
>    */a_picture_of_my_house
>
> along with any other bits the slug code uses to create the link. Some
> examples follow.
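To make the kind of filtering described above concrete, here is a minimal Python sketch. `slugify_label` is a hypothetical helper, and its rules (lowercase, collapse every run of characters that are not ASCII letters or digits into one underscore, trim underscores at the ends) are an assumption, not what any particular tool does:

```python
import re


def slugify_label(label: str) -> str:
    """Hypothetical slug filter; real tools each apply their own rules."""
    lowered = label.lower()
    # Collapse every run of characters outside [a-z0-9] into a single "_".
    collapsed = re.sub(r"[^a-z0-9]+", "_", lowered)
    # Drop underscores left at the start or end of the label.
    return collapsed.strip("_")


print(slugify_label("a-picture-of-my-house"))   # a_picture_of_my_house
print(slugify_label("A Picture of my  House"))  # a_picture_of_my_house
```

A filter like this is idempotent, so a server that runs it again on a value the client already filtered gets the same result back.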

What's the problem with putting that "transformation/filtering" on the
client side? Servers could then transform and filter it again if they
want...

Alternatively, you could just send:

   Slug: Les%20Fran%C3%A7ais%20ont%20gagn%C3%A9%20hier%20soir

which is just the URI-encoded version of "Les Français ont gagné hier
soir" (see below).
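For instance, Python's standard `urllib.parse.quote` produces exactly that value, since it UTF-8-encodes the text by default before %-encoding it. A minimal sketch (the `slug_header_value` name is just for illustration):

```python
from urllib.parse import quote, unquote

title = "Les Français ont gagné hier soir"

# quote() UTF-8-encodes the text, then %-encodes every byte outside the
# unreserved set; safe="" makes sure "/" is escaped as well.
slug_header_value = quote(title, safe="")
print("Slug: " + slug_header_value)
# Slug: Les%20Fran%C3%A7ais%20ont%20gagn%C3%A9%20hier%20soir

# Decoding gives back the original title unchanged.
assert unquote(slug_header_value) == title
```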

> If you type Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir into a Plone site,
> it will come back with:
>
>   /les_fran-c3-a7ais_ont_gagn-c3-a9_hier_soir
>
> If you type Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir into a Movable Type
> blog, it will come back with:
>
> http://www.dehora.net/journal/2006/07/les_franc3a7ais_ont_gagnc3a9_hier_soir.html
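Both results are consistent with servers that treat the Slug as opaque text and never %-decode it. The sketch below reproduces the two outputs; `plone_like` and `movable_type_like` are hypothetical reconstructions inferred only from these two examples, not the actual Plone or Movable Type code:

```python
import re


def plone_like(slug: str) -> str:
    # Assumed rule: lowercase, then replace anything outside [a-z0-9_-] with "-".
    return re.sub(r"[^a-z0-9_-]", "-", slug.lower())


def movable_type_like(slug: str) -> str:
    # Assumed rule: lowercase, then drop anything outside [a-z0-9_].
    return re.sub(r"[^a-z0-9_]", "", slug.lower())


slug = "Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir"
print(plone_like(slug))         # les_fran-c3-a7ais_ont_gagn-c3-a9_hier_soir
print(movable_type_like(slug))  # les_franc3a7ais_ont_gagnc3a9_hier_soir
```

Either way the escape bytes leak into the URL as literal characters, because nothing decodes them first.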

What if you type "Les Français ont gagné hier soir" or
"Les_Français_ont_gagné_hier_soir"?

I was talking about using "segment-nz" (from the URI spec) instead of
"isegment-nz" (from the IRI spec), where the main difference is that
you first have to UTF-8-encode the text, then %-encode the UTF-8-encoded
bytes.
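The two steps (UTF-8-encode, then %-encode the bytes) can be sketched in Python as follows. `to_segment_nz` and `is_segment_nz` are hypothetical helpers. The encoder escapes everything outside the URI spec's "unreserved" set, which is stricter than segment-nz requires but always yields a valid segment-nz. The validator follows `segment-nz = 1*pchar`, with `pchar = unreserved / pct-encoded / sub-delims / ":" / "@"`:

```python
import re

# URI "unreserved" characters, as a set of byte values.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

# segment-nz = 1*pchar: unreserved, sub-delims, ":", "@" or a %XX escape.
SEGMENT_NZ = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+")


def to_segment_nz(text: str) -> str:
    # Step 1: UTF-8-encode the text into bytes.
    data = text.encode("utf-8")
    # Step 2: %-encode every byte that is not unreserved.
    return "".join(chr(b) if b in UNRESERVED else f"%{b:02X}" for b in data)


def is_segment_nz(value: str) -> bool:
    # Raw non-ASCII characters are allowed by isegment-nz but not here.
    return SEGMENT_NZ.fullmatch(value) is not None


encoded = to_segment_nz("Les Français ont gagné hier soir")
print(encoded)  # Les%20Fran%C3%A7ais%20ont%20gagn%C3%A9%20hier%20soir
```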

> -1 if we're going to redefine/expand what slugging actually means or
> inject new requirements on tools by way of spec riders.

Whatever slugging is, a client will have to encode the value if it
contains non-ASCII characters, and URI-encoding:
- is far easier than RFC 2047 encoding
- is appropriate because the slug is designed to end up as a URI segment
- relies on UTF-8, which XML already requires every XML parser to
  support, and XML parsers are needed anyway to do Atom and APP
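A rough side-by-side using only the Python standard library (`email.header` for RFC 2047, `urllib.parse` for URI-encoding). Whether the RFC 2047 value comes out Q- or B-encoded is left to the library, so treat this as an illustration of the extra machinery rather than a benchmark:

```python
from email.header import Header, decode_header
from urllib.parse import quote, unquote

title = "Les Français ont gagné hier soir"

# RFC 2047: an encoded-word that carries its own charset and Q/B tag.
rfc2047_value = Header(title, "utf-8").encode()
# URI-encoding: %XX escapes of the UTF-8 bytes, nothing else.
uri_value = quote(title, safe="")

# Decoding RFC 2047 yields (chunk, charset) pairs that the caller must
# still turn back into text.
decoded_2047 = "".join(
    chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
    for chunk, charset in decode_header(rfc2047_value)
)
# Decoding the URI-encoded form is a single call.
decoded_uri = unquote(uri_value)

print(rfc2047_value)
print(uri_value)
assert decoded_2047 == decoded_uri == title
```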

> A better example would be:
>
>     Slug: A Picture of my  House
>
> and let the server sort it out.

So:

   Slug: A%20Picture%20of%20my%20%20House

and let the server sort it out.

The server first URI-decodes it (far easier than decoding RFC 2047),
then transforms/filters it to generate the URI segment (which will be
URI-encoded if it contains non-ASCII characters).
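A minimal Python sketch of that server-side pipeline. `segment_from_slug` is a hypothetical helper, and the filtering policy in step 2 (lowercase, whitespace runs to `_`, drop other punctuation) is only an example; each server would choose its own:

```python
import re
from urllib.parse import quote, unquote


def segment_from_slug(header_value: str) -> str:
    """Hypothetical server-side handling of a URI-encoded Slug header."""
    # 1. URI-decode: %XX escapes -> UTF-8 bytes -> text (reject bad UTF-8).
    text = unquote(header_value, encoding="utf-8", errors="strict")
    # 2. Transform/filter (example policy): lowercase, turn whitespace runs
    #    into "_", then drop anything that is neither a word character nor "-".
    text = text.lower()
    text = re.sub(r"\s+", "_", text.strip())
    text = re.sub(r"[^\w-]", "", text)
    # 3. URI-encode whatever non-ASCII characters survived the filter.
    return quote(text, safe="")


print(segment_from_slug("A%20Picture%20of%20my%20%20House"))
# a_picture_of_my_house
print(segment_from_slug("Les%20Fran%C3%A7ais%20ont%20gagn%C3%A9%20hier%20soir"))
# les_fran%C3%A7ais_ont_gagn%C3%A9_hier_soir
```

Because Python's `\w` is Unicode-aware, accented letters survive step 2 and are re-encoded in step 3 instead of being mangled into `c3a7`-style debris.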

That's all about encoding, no more, no less.

--
Thomas Broyer
