triage: Problem for the up-loading of non-ASCII character file name.

2007-02-06 Thread tsuyuki makoto

I'd like to ask for comments on #3119: problems with FileField/ImageField
and multi-byte filenames. Since this problem has two causes,
let me describe them step by step.

Multibyte characters in a filename are lost in get_valid_filename().


As seen in django.db.models.fields, FileField and its subclasses call
django.utils.text.get_valid_filename() to remove all "filename-unsafe"
characters from the given filename. The resulting filename consists only
of ASCII letters, digits, hyphens, underscores and periods. However, this
behaviour has an undesirable effect in countries where multibyte filenames
are common. For example, if the original filename consists entirely of
multibyte characters plus a '.txt' extension (such as 'ファイル.txt'), the
resulting filename becomes '.txt' (no filename body, only the extension).
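
The effect can be reproduced with a small sketch. The function below is
only an approximation of what get_valid_filename() is described as doing
here (the name ascii_only_filename and the exact regular expression are
assumptions for illustration, not Django's code):

```python
import re

def ascii_only_filename(name):
    """Approximate the cleanup described above (illustrative only)."""
    # Spaces become underscores, then every character that is not an
    # ASCII letter, digit, hyphen, underscore or period is dropped.
    name = name.strip().replace(' ', '_')
    return re.sub(r'[^-A-Za-z0-9_.]', '', name)

print(ascii_only_filename('ファイル.txt'))     # body is lost, only '.txt' remains
print(ascii_only_filename('report 2007.txt'))  # ASCII names survive
```

Every multibyte character falls outside the allowed set, so nothing
but the extension is left.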

Underscore-suffix uniquification breaks down easily

Things get worse if we have many such files. FileField appends
underscores to the filename until it becomes unique, so if we upload
['壱号文書.doc', '弐号文書.doc', '参号文書.doc', ...], the stored
filenames become ['.doc', '_.doc', '__.doc', ...].
Once the number of underscores reaches the maxlength of the filename
field (100 or so), FileField starts raising errors because the filename
exceeds the length limit.
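
A minimal simulation of this collapse, assuming the uniquification simply
inserts one underscore before the extension on each collision. The helper
uniquify and the in-memory set standing in for the upload directory are
hypothetical:

```python
def uniquify(filename, existing):
    """Insert '_' before the extension until the name is unused."""
    while filename in existing:
        dot_index = filename.rfind('.')
        if dot_index == -1:
            filename += '_'
        else:
            filename = filename[:dot_index] + '_' + filename[dot_index:]
    existing.add(filename)
    return filename

existing = set()
# All three uploads have already been reduced to '.doc' by the cleanup step.
stored = [uniquify('.doc', existing) for _ in range(3)]
print(stored)
```

Each further upload adds one more underscore, so a name that starts as
the 4-character '.doc' passes 100 characters after fewer than a hundred
uploads.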

Proposed solution: Punycode conversion before calling
django.utils.text.get_valid_filename.

Add STORE_FILENAME_AS_PUNYCODE to global_settings, False by default.
If STORE_FILENAME_AS_PUNYCODE is True, encode the given filename,
except for the extension, as Punycode.
Then generate a clean filename with get_valid_filename and return it.
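
The proposal could be sketched roughly as follows. This is not the patch
itself: clean_filename and ascii_only_filename are hypothetical names, the
use_punycode argument stands in for the proposed
STORE_FILENAME_AS_PUNYCODE setting, and leaving pure-ASCII names
untouched is an extra assumption of this sketch:

```python
import os
import re

def ascii_only_filename(name):
    # Illustrative stand-in for the cleanup done by get_valid_filename().
    name = name.strip().replace(' ', '_')
    return re.sub(r'[^-A-Za-z0-9_.]', '', name)

def clean_filename(name, use_punycode=False):
    """Optionally Punycode-encode the stem, then clean the result."""
    if use_punycode:
        stem, ext = os.path.splitext(name)
        # Assumption: only touch stems that contain non-ASCII characters.
        if any(ord(ch) > 127 for ch in stem):
            stem = stem.encode('punycode').decode('ascii')
        name = stem + ext
    return ascii_only_filename(name)

for original in ['壱号文書.doc', '弐号文書.doc', '参号文書.doc']:
    print(clean_filename(original, use_punycode=True))
```

Because Punycode output uses only ASCII letters, digits and hyphens, the
encoded stem passes through the cleanup unchanged, and different original
names stay different.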

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---



Re: proposal and patch: original_filename with FileField and ImageField

2006-12-08 Thread tsuyuki makoto

Ok, I just added a ticket (http://code.djangoproject.com/ticket/3119).
Forget this patch for now.
Do you have any ideas?





proposal and patch: original_filename with FileField and ImageField

2006-12-04 Thread tsuyuki makoto
Hello Django developers.

Currently, FileField and ImageField store a file-system-safe file name.
Imagine a user uploads a file named é.txt.
The file-system-safe file name then becomes .txt or _.txt.
This is not a special case in Japan.

I know Django says non-dynamic content should be served by an Apache-like server.
On the other hand, some clients insist that a file must keep its original name.

So I made FileField and ImageField able to keep the original file name,
in the same way as ImageField's width_field and height_field.
If original_filename_field is specified, the field encodes and stores
the file name as Punycode.
E.g. for the input développement image.jpg:
   the field stores dveloppement-image-kwa33c.jpg
   original_filename_field stores développement image.jpg

I also made a file download generic view that uses the original file name
if the field has an original file name attribute.

Note: the patch is encoded in UTF-8.

usage:
class TestModel(models.Model):
    afile = models.FileField(upload_to='afile', original_filename_field='orgname')
    orgname = models.CharField(blank=True, maxlength=100)
    class Admin:
        pass

(r'^file/(?P.*)/$', 'django.views.generic.simple.file_download',
 dict(queryset=TestModel.objects.all(), file_field='afile')),




filefield_and_dlgv.diff
Description: Binary data


Re: Re: urlify.js blocks out non-English chars - 2nd try?

2006-07-20 Thread tsuyuki makoto
2006/7/20, Gábor Farkas <[EMAIL PROTECTED]>:
>
> Jeroen Ruigrok van der Werven wrote:
> > On 7/16/06, gabor <[EMAIL PROTECTED]> wrote:
> >> i think we do not need to discuss japanese at all. after all, there's no
> >> transliteration for kanji. so it's imho pointless to argue about
> >> kana-transliteration, when you cannot transliterate kanji.
> >
> > If you mean that you cannot easily deduce whether the kanji for moon 月
> > should be transliterated according to the reading 'tsuki' or 'getsu',
> > then yes, you are correct. But you *can* transliterate them according
> > to their on or kun reading.
> >
>
> yes, you are correct on that.
> but on the other hand, what's the meaning in doing a plain on/kun
> reading-based transliteration? :-)
>
> and also, some kanjis have a lot of on/kun readings... which one will
> you use?
>
> at least for me it seems that a transliteration scheme should at least
> keep the words readable. now take a japanese word with 2 kanjis. how
> would you propose to transliterate it to still keep the meaning?

We cannot automatically choose the right ON or KUN reading for kanji.
There is no exact rule.

And I don't think a slug is just for humans. It's for computers too.
Search engines and other technologies may understand IDNA/Punycode
(thanks Antonio!).
#Google already understands IDNA.
Japanese kanji should be converted to Punycode.

If a slug must keep its meaning for humans, you don't need to care about Japanese.
That is impossible for Japanese.




Re: Re: urlify.js blocks out non-English chars - 2nd try?

2006-07-17 Thread tsuyuki makoto

2006/7/17, gabor <[EMAIL PROTECTED]>:
>
> Jeroen Ruigrok van der Werven wrote:
> > On 7/12/06, Julian 'Julik' Tarkhanov <[EMAIL PROTECTED]> wrote:
> >> This is handled by Unicode standard and is called transliteration.
> >
>  >
> > Also, for Japanese, are you going to follow kunrei-shiki or rather the
> > more widely used hepburn transliteration? Or perhaps even nippon-shiki
> > if you feel like sticking to strictness.
>
> i think we do not need to discuss japanese at all. after all, there's no
> transliteration for kanji. so it's imho pointless to argue about
> kana-transliteration, when you cannot transliterate kanji.

We Japanese know that Japanese can't be transliterated into ASCII.
So at the very least I want the following:
no letter disappears, and the original can be restored.
#FileField and ImageField have the same problem of letters disappearing.

import re

def slug_ja(word):
    # Python 2: `word` is a UTF-8 encoded byte string.
    try:
        # Pure-ASCII input: build an ordinary slug.
        unicode(word, 'ascii')
        slug = re.sub(r'[^\w\s-]', '', word).strip().lower()
        slug = re.sub(r'[-\s]+', '-', slug)
        return slug
    except UnicodeDecodeError:
        # Non-ASCII input: fall back to IDNA (Punycode) so no letter is lost.
        painful_slug = word.strip().lower().decode('utf-8').encode('idna')
        return painful_slug
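
For readers on Python 3, the same idea might look like the sketch below.
The names slug_ja_py3 and unslug_ja are hypothetical; the second function
only illustrates that the original text can be restored from an
IDNA-encoded slug:

```python
import re

def slug_ja_py3(word):
    """Slugify ASCII text normally; fall back to IDNA for anything else."""
    text = word.strip().lower()
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        # Non-ASCII input: the built-in 'idna' codec yields an ASCII
        # label starting with 'xn--'.
        return text.encode('idna').decode('ascii')
    slug = re.sub(r'[^\w\s-]', '', text)
    return re.sub(r'[-\s]+', '-', slug)

def unslug_ja(slug):
    """Restore the original text from an IDNA-encoded slug."""
    return slug.encode('ascii').decode('idna')

slug = slug_ja_py3('日本語')
print(slug)
print(unslug_ja(slug))
```

The slug is not readable for humans, but no letter disappears and the
conversion is reversible.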
