Antonio Cavedoni wrote:
> On 17 Jul 2006, at 8:25, tsuyuki makoto wrote:
>> We Japanese know that we can't transarate Japanese to ASCII.
>> So I want to do it as follows at least.
>> A letter does not disappear and is restored.
>> #FileField and ImageField have same letters disappear problem.
>>
>> def slug_ja(word) :
>>     try :
>>         unicode(word, 'ASCII')
>>         import re
>>         slug = re.sub('[^\w\s-]', '', word).strip().lower()
>>         slug = re.sub('[-\s]+', '-', slug)
>>         return slug
>>     except UnicodeDecodeError :
>>         from encodings import idna
>>         painful_slug = word.strip().lower().decode('utf-8').encode('IDNA')
>>         return painful_slug
>
> I’m not convinced by this approach, but I would suggest using the
> “punycode” instead of the “idna” encoder anyway. The results don’t
> include the initial “xn--” marks which are only useful in a domain
> name, not in a URI path. Also, the “from encodings […]” line appears
> to be unnecessary on my Python 2.3.5 and 2.4.1 on OSX.
>
> [[[
> >>> p = u"perché"
> >>> from encodings import idna
> >>> p.encode('idna')
> 'xn--perch-fsa'
> >>> p.encode('punycode')
> 'perch-fsa'
> >>> puny = 'perch-fsa'
> >>> puny.decode('punycode')
> u'perch\xe9'
> >>> print puny.decode('punycode')
> perché
> >>> pu = puny.decode('punycode') # it's reversible
> >>> print pu
> perché
> ]]]
>
> More on Punycode: http://en.wikipedia.org/wiki/Punycode

i somehow have the feeling that we lost the original idea here a little. (as far as i understand, with urlify.js we are talking about slug auto-generation; please correct me if i'm wrong.)

we auto-generate slugs when it "makes sense". for example, for english it makes sense to remove all the non-word characters, because what remains can still be read and understood, and generally looks fine as part of the URL. for many other languages (hungarian or the slavic ones), it also "makes sense" to simply drop all the diacritical marks, because the rest can still be read and understood, and looks fine as part of a URL.

but with punycode or whatever-code encoding of japanese, what's the point? what you get will be completely unreadable.

if you only need to preserve the submitted data, you don't need to do anything special. simply take your unicode text, encode it to utf8, url-escape it and use it as part of the url. on the other side you can url-unescape and utf8-decode it, and you're back where you started. any ascii characters in the text will even stay readable.

from my point of view, with the current slug approach, you either can convert your text into ascii that "makes sense" or you cannot. if you can, then enhancing urlify.js makes sense. if you cannot, then it makes no sense. imho.

gabor
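
p.s. a minimal sketch of the two cases above, in case it helps the discussion. it assumes Python 3 (the session quoted above is Python 2), and the function names drop_diacritics, to_url_segment and from_url_segment are made up for illustration; none of this is what urlify.js or django actually does. the first function covers the "makes sense" case of dropping diacritical marks, the other two show the utf8 + url-escape round trip.

```python
import unicodedata
from urllib.parse import quote, unquote


def drop_diacritics(text):
    """Decompose accented letters (NFKD) and discard the combining marks.

    Hypothetical helper: only useful for scripts where the base letters
    remain readable on their own.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_url_segment(text):
    """UTF-8 encode the text and percent-escape it for use in a URL path."""
    return quote(text, safe="", encoding="utf-8")


def from_url_segment(segment):
    """Reverse to_url_segment: percent-unescape, then UTF-8 decode."""
    return unquote(segment, encoding="utf-8")


if __name__ == "__main__":
    print(drop_diacritics("perché"))
    escaped = to_url_segment("日本語")
    print(escaped)
    print(from_url_segment(escaped))
```

the round trip loses nothing, and plain ascii such as "hello-world" passes through the escaping unchanged, which is the "readably-preserved" part.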