encode short string as filename (unix/windows)
I want to encode/decode an arbitrary short 8-bit string as a safe filename. Is there a good built-in encoding for this (without too much inflation)? Or a re.sub expression? Or which characters are not allowed in filenames on typical OSes?

robert
-- http://mail.python.org/mailman/listinfo/python-list
Re: encode short string as filename (unix/windows)
On 2006-03-27, robert [EMAIL PROTECTED] wrote:

> I want to encode/decode an arbitrary short 8-bit string as a safe filename. Is there a good built-in encoding for this (without too much inflation)? Or a re.sub expression? Or which characters are not allowed in filenames on typical OSes?

Under Unix, / and NUL (the zero byte) are not allowed. There are other characters that are not recommended, but those are the only two that are not allowed.

-- Grant Edwards
Re: encode short string as filename (unix/windows)
> I want to encode/decode an arbitrary short 8-bit string as a safe filename. Is there a good built-in encoding for this (without too much inflation)? Or a re.sub expression? Or which characters are not allowed in filenames on typical OSes?

On Windows, / \ : * ? " < > | are forbidden, and the name can't be empty. Using urlsafe_b64encode/urlsafe_b64decode should work on any platform.
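A minimal sketch of the urlsafe_b64encode suggestion, written for Python 3, where the "8-bit string" is a bytes object. The helper names to_filename and from_filename are made up for illustration and are not part of the base64 module.

```python
import base64

def to_filename(data: bytes) -> str:
    """Encode arbitrary bytes using only A-Z, a-z, 0-9, '-', '_' and '='."""
    return base64.urlsafe_b64encode(data).decode("ascii")

def from_filename(name: str) -> bytes:
    """Invert to_filename()."""
    return base64.urlsafe_b64decode(name.encode("ascii"))

# Round trip a string full of characters that are awkward in filenames.
raw = b"a/b\\c:d*e?f|g\x00h"
name = to_filename(raw)
assert from_filename(name) == raw
```

The result is about one third longer than the input. Two caveats: an empty input gives an empty name, and base64 is case-sensitive, so on a case-insensitive file system two different inputs could end up with names that differ only in case. base64.b32encode avoids the second problem at the cost of more inflation.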
Re: encode short string as filename (unix/windows)
robert wrote:

> I want to encode/decode an arbitrary short 8-bit string as a safe filename. Is there a good built-in encoding for this (without too much inflation)? Or a re.sub expression?

You could use the base64 encoder. The clear disadvantage is that you can't easily read your original text.

Alternatively, there is that encoding that is used by e.g. emails if you have an umlaut in a name. I _think_ it is called punycode, but I'm not sure whether and how you can use it from within Python - google it yourself :)

diez
Re: encode short string as filename (unix/windows)
On Mon, 27 Mar 2006 18:13:17 +0200, Diez B. Roggisch [EMAIL PROTECTED] wrote:

> robert wrote:
>> I want to encode/decode an arbitrary short 8-bit string as a safe filename. Is there a good built-in encoding for this (without too much inflation)? Or a re.sub expression?
>
> You could use the base64 encoder. The clear disadvantage is that you can't easily read your original text.
>
> Alternatively, there is that encoding that is used by e.g. emails if you have an umlaut in a name. I _think_ it is called punycode, but I'm not sure whether and how you can use it from within Python - google it yourself :)

punycode is used by DNS. A commonly used email codec is quoted-printable. Here's an example of each:

    >>> u'Helló world'.encode('utf-8').encode('quopri')
    'Hell=C3=B3=20world'
    >>> u'Helló world'.encode('punycode')
    'Hell world-jbb'

Note the extra trip through utf-8 for quoted-printable: it is implemented in Python as a byte encoding, not a character encoding, so you cannot (safely) apply it to a unicode string.

Jean-Paul
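The session above is Python 2. As a rough Python 3 equivalent (an assumption about how a present-day reader would run it, not something from the thread): bytes objects have no .encode() method there, so the quoted-printable codec is reached through codecs.encode, while punycode is still an ordinary text encoding. The trailing slash in the sample text is added here to show that neither codec escapes the path separator.

```python
import codecs

text = "Helló world/"

# quoted-printable is a bytes-to-bytes codec, so encode to UTF-8 first.
qp = codecs.encode(text.encode("utf-8"), "quopri")
assert codecs.decode(qp, "quopri").decode("utf-8") == text

# punycode is a text encoding, so str.encode() works directly.
pc = text.encode("punycode")
assert pc.decode("punycode") == text

# Neither codec touches '/', so neither output is a safe filename as is.
assert b"/" in qp and b"/" in pc
```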
Re: encode short string as filename (unix/windows)
On Mon, 27 Mar 2006 08:14:07 -0800, nikie wrote:

>> I want to encode/decode an arbitrary short 8-bit string as a safe filename. Is there a good built-in encoding for this (without too much inflation)? Or a re.sub expression? Or which characters are not allowed in filenames on typical OSes?
>
> On Windows, / \ : * ? " < > | are forbidden, and the name can't be empty.

Windows also has a number of reserved names that you can't use. However, in general, it is best to ignore that and just let Windows raise an error if it chooses.

But for completeness, here is the canonical list of prohibited file names and characters for Windows:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/naming_a_file.asp

or http://makeashorterlink.com/?I2B853DDC

-- Steven.
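For anyone who prefers to check up front instead of waiting for the error, here is a small sketch of a reserved-name test. The set below holds the commonly documented device names (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9) and should not be taken as exhaustive; the function name is_reserved_on_windows is made up for illustration.

```python
# Device names that Windows reserves regardless of case or extension.
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)

def is_reserved_on_windows(name: str) -> bool:
    """Return True for names such as 'nul', 'COM1' or 'Lpt3.txt'."""
    # Only the part before the first dot counts; trailing spaces are ignored.
    stem = name.split(".", 1)[0].rstrip(" ")
    return stem.upper() in WINDOWS_RESERVED

assert is_reserved_on_windows("COM1.txt")
assert not is_reserved_on_windows("console")
```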
Re: encode short string as filename (unix/windows)
Jean-Paul Calderone wrote:

> punycode is used by DNS. A commonly used email codec is quoted-printable. Here's an example of each:
>
>     >>> u'Helló world'.encode('utf-8').encode('quopri')
>     'Hell=C3=B3=20world'
>     >>> u'Helló world'.encode('punycode')
>     'Hell world-jbb'
>
> Note the extra trip through utf-8 for quoted-printable: it is implemented in Python as a byte encoding, not a character encoding, so you cannot (safely) apply it to a unicode string.
>
> Jean-Paul

    >>> u'Helló world\\/\x00'.encode('punycode')
    'Hell world\\/\x00-elb'
    >>> u'Helló world\\/\x00'.encode('utf-8').encode('quopri')
    'Hell=C3=B3=20world\\/=00'

That doesn't remove \ or /, and the other base.. things are similar, so in the end I found myself regex'ing :-( , but this does minimal optical damage to common strings:

    import re

    def encode_as_filename(s):
        # Replace each unsafe character with '+' plus two uppercase hex digits.
        # '+' itself is escaped too, so decoding is unambiguous.
        # Backslash, '"', '<' and '>' are included because Windows forbids them.
        def _(m):
            return '+%02X' % ord(m.group(0))
        return re.sub('[\x00/\\\\*?:"<>|+\n]', _, s)

    def decode_from_filename(s):
        # Turn every '+XX' escape back into the character it stands for.
        def _(m):
            return chr(int(m.group(0)[1:], 16))
        return re.sub(r'\+[0-9A-F]{2}', _, s)

    >>> newsletter.encode_as_filename('[EMAIL PROTECTED]/\\+\n\x00:+test')
    '[EMAIL PROTECTED]'
    >>> newsletter.decode_from_filename(_)
    '[EMAIL PROTECTED]/\\+\n\x00:+test'

Robert
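On the original question of a built-in: the percent-encoding in Python 3's urllib.parse behaves much like the regex pair above, leaving letters, digits and _ . - ~ readable and escaping every other byte as %XX. This swaps in a different escape scheme from the +XX one in the message and is only a sketch; the names pct_encode_filename and pct_decode_filename are hypothetical.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

def pct_encode_filename(data: bytes) -> str:
    # safe="" makes quote_from_bytes escape '/' as well as everything
    # else outside letters, digits and '_', '.', '-', '~'.
    return quote_from_bytes(data, safe="")

def pct_decode_filename(name: str) -> bytes:
    return unquote_to_bytes(name)

raw = b"user@example/\\+\n\x00:+test"
assert pct_decode_filename(pct_encode_filename(raw)) == raw
```

Common strings survive mostly intact, while each escaped byte costs three characters. The inputs ".", ".." and the empty string pass through unchanged, so they would still need separate handling.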
Re: encode short string as filename (unix/windows)
Steven D'Aprano wrote:

> On Mon, 27 Mar 2006 08:14:07 -0800, nikie wrote:
>
>>> I want to encode/decode an arbitrary short 8-bit string as a safe filename. Is there a good built-in encoding for this (without too much inflation)? Or a re.sub expression? Or which characters are not allowed in filenames on typical OSes?
>>
>> On Windows, / \ : * ? " < > | are forbidden, and the name can't be empty.
>
> Windows also has a number of reserved names that you can't use. However, in general, it is best to ignore that and just let Windows raise an error if it chooses.
>
> But for completeness, here is the canonical list of prohibited file names and characters for Windows:
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/naming_a_file.asp
>
> or http://makeashorterlink.com/?I2B853DDC

Thanks. In fact, to avoid COMx etc. I also have to prepend a character like _ on encode and remove it on decode, in addition to what I just posted.

Robert
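A minimal sketch of that prefix idea, meant to wrap whichever encoder is in use. The names GUARD, add_guard and strip_guard are made up for illustration.

```python
GUARD = "_"

def add_guard(encoded: str) -> str:
    """Prefix an already-encoded name so it can never equal COM1, NUL, etc."""
    return GUARD + encoded

def strip_guard(name: str) -> str:
    """Remove the prefix again before decoding."""
    if not name.startswith(GUARD):
        raise ValueError("not a guarded filename: %r" % name)
    return name[len(GUARD):]

assert add_guard("COM1") == "_COM1"
assert strip_guard(add_guard("COM1")) == "COM1"
```

A side effect is that an empty input no longer produces an empty filename.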