Re: urlify.js blocks out non-English chars - 2nd try?

Bill de hÓra Fri, 07 Jul 2006 10:20:17 -0700

Malcolm Tredinnick wrote:
> Hi Bill,
> 
> On Fri, 2006-07-07 at 10:06 +0100, Bill de hÓra wrote:
>> Malcolm Tredinnick wrote:
>>
>>> There was reasonable consensus in one of the threads about doing
>>> something similar (but a bit smaller) than what Wordpress does. Now it's
>>> a case of "patches gratefully accepted". A lot of people say this is a
>>> big issue for them, so it's something that will be fixed one day, but
>>> nobody has put in a reasonable patch yet. When that happens, we can
>>> progress.
>> What's the expected scope of the downcoding? Would it be throwing a few 
>> dicts together in the admin js, or a callback to unicodedata.normalize?
> 
> I thought there was some sort of consensus; I didn't claim all the
> details had been settled. Personally, I was kind of hoping whoever wrote
> the patch might think this sort of thing through and give us a concrete
> target to throw ideas at. :-)
> 
> My own misguided thoughts (I *really* don't want to have to write this
> patch): I thought the original design wish was "something that read
> sensibly" here, since slugifying is already a lossy process. If I had to
> write it today, I would do the "dictionary mapping on the client side"
> version. But you're more of an expert here: what does normalization gain
> us without having to move to fully internationalised URLs, which still
> seem to be a phishing vector: if we allow fully international URLs, then
> doing everything properly would make sense. However, is it universally
> supported as "not a security risk" in all common browsers yet?


Normalisation/decomposition gives you greater assurance that you'll
throw away what you think you're throwing away, before you try a
mapping. Unicode provides mappings down to ASCII, but they're not
complete; mapping decisions tend to be localised and controversial.
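
To make that concrete, here is a rough Python sketch of what
decomposition buys you. The helper name strip_marks is made up for
illustration; unicodedata.normalize and unicodedata.combining are the
standard library calls. It is an illustration of the idea, not a
proposed patch.

```python
import unicodedata

def strip_marks(text):
    """Decompose with NFKD, then drop the combining marks (accents)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

precomposed = "caf\u00e9"   # 'é' stored as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# The two spellings look identical but compare unequal as raw strings.
print(precomposed == decomposed)                            # False
# After decomposition and mark stripping they are the same ASCII text.
print(strip_marks(precomposed) == strip_marks(decomposed))  # True

# Characters with no decomposition pass through untouched, which is
# where a hand-written mapping still has to take over.
print(strip_marks("\u00d8resund Stra\u00dfe"))              # Øresund Straße
```

So normalisation deals with the accent-on-a-base-letter cases reliably,
and the leftovers (Ø, ß and friends) are where the mapping arguments
start.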

The phishing problem with Internationalised URLs (IRIs) is in the
internationalised domain name (IDN), where you can get redirected, and
not so much in the path segment where the slug lives. I work on the Atom
protocol, and IRIs are official IETF/W3C goodness these days (funny, we
just went through slugging on the protocol list yesterday). IRIs are
designed to be treated as encoded Unicode (UTF-8 most likely) so they
pass through systems without losing information. Slugging, as I tend to
understand it, is really about dropping down to ASCII and throwing
character information away. I'm thinking that for slugs people want to
have a character replaced with an ASCII equivalent and not /preserve/
character data via encoding.
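
A small Python sketch of the difference, assuming UTF-8 percent-encoding
for the path segment (the sample title is just my own name, picked for
illustration):

```python
import unicodedata
from urllib.parse import quote, unquote

title = "Bill de h\u00d3ra"

# Encoding route: percent-encode the UTF-8 bytes. Nothing is lost,
# because the original text can be recovered exactly.
encoded = quote(title)            # quote() encodes as UTF-8 by default
print(encoded)                    # Bill%20de%20h%C3%93ra
print(unquote(encoded) == title)  # True

# Slug route: decompose, then throw away anything that is not ASCII.
# The accent is gone for good and cannot be recovered.
lossy = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
print(lossy)                      # Bill de hOra
```

The first is what IRIs are for; the second is what I think people mean
when they ask for slugs.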

It really does depend on what people want from this feature. A truly
full downcoding solution needs to go back to the server, I think, do the
whole Unicode bit, and apply whatever custom mappings onto ASCII. Whereas
a good-enough approach would be a set of JS dicts sent to the client; that
keeps the nice JS autofill feature in the admin, and will probably
cover 95% of use cases.
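
For the server-side flavour, something along these lines is what I have
in mind. Everything here is a sketch under assumptions: CUSTOM_MAP,
downcode and slugify are hypothetical names, the mapping table is
deliberately tiny, and its entries are exactly the kind of choice people
will argue about. The same table could just as well be shipped to the
client as a JS dict.

```python
import re
import unicodedata

# Hypothetical mapping for characters that have no Unicode decomposition.
# A real table would be larger and probably locale-specific.
CUSTOM_MAP = {
    "\u00df": "ss",                  # ß
    "\u00d8": "O", "\u00f8": "o",    # Ø ø
    "\u00c6": "AE", "\u00e6": "ae",  # Æ æ
    "\u0141": "L", "\u0142": "l",    # Ł ł
}

def downcode(text):
    """Reduce text to ASCII: normalise, apply the mapping, strip the rest."""
    # Compose first so the mapping keys match however the input was typed.
    composed = unicodedata.normalize("NFC", text)
    mapped = "".join(CUSTOM_MAP.get(ch, ch) for ch in composed)
    # Decompose and drop whatever is still outside ASCII (accents etc.).
    decomposed = unicodedata.normalize("NFKD", mapped)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def slugify(text):
    """Lowercase, drop punctuation, and join words with hyphens."""
    ascii_text = downcode(text).lower()
    ascii_text = re.sub(r"[^a-z0-9\s-]", "", ascii_text)
    return re.sub(r"[\s-]+", "-", ascii_text).strip("-")

print(slugify("Bill de h\u00d3ra"))                # bill-de-hora
print(slugify("Stra\u00dfe nach \u00d8resund"))    # strasse-nach-oresund
```

Anything not covered by the table or by decomposition is silently
dropped, which is the lossy behaviour slugs already have today.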

cheers
Bill



