encode short string as filename (unix/windows)

2006-03-27 Thread robert
want to encode/decode an arbitrary short 8-bit string as a safe filename.
is there a good built-in encoding to do this (without too much
inflation)? or a re.sub expression?

or which characters are not allowed in filenames on typical OS?

robert
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: encode short string as filename (unix/windows)

2006-03-27 Thread Grant Edwards
On 2006-03-27, robert [EMAIL PROTECTED] wrote:
> want to encode/decode an arbitrary short 8-bit string as a safe filename.
> is there a good built-in encoding to do this (without too much
> inflation)? or a re.sub expression?
>
> or which characters are not allowed in filenames on typical OS?

Under Unix, / and NUL (the zero byte) are not allowed.

There are other characters that are not recommended, but those
are the only two that are not allowed.
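
As a quick illustration of that rule, a check for a single Unix path
component might look like the sketch below (Python 3). The function name
is made up for this example, and rejecting the empty name, "." and ".."
is an addition of mine: those are not forbidden characters, but they
cannot name a new file either.

```python
def is_valid_unix_component(name: bytes) -> bool:
    """Return True if name could be used as one path component on Unix."""
    # The empty name and the special entries "." and ".." cannot name a new file.
    if name in (b"", b".", b".."):
        return False
    # The only two forbidden bytes are the slash and NUL.
    return b"/" not in name and b"\x00" not in name
```

Everything else, including newlines, spaces and shell metacharacters,
passes this check, which is exactly why such names are legal but not
recommended.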

-- 
Grant Edwards (grante at visi.com)
Yow!  .. the MYSTERIANS are in here with my CORDUROY SOAP DISH!!


Re: encode short string as filename (unix/windows)

2006-03-27 Thread nikie
> want to encode/decode an arbitrary short 8-bit string as a safe filename.
> is there a good built-in encoding to do this (without too much
> inflation)? or a re.sub expression?
>
> or which characters are not allowed in filenames on typical OS?

On Windows, / \ : * ? " < > | are forbidden, and the name can't be
empty.

Using urlsafe_b64encode/...decode should work on any platform.
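
A minimal sketch of that suggestion in Python 3 follows. The two wrapper
names are hypothetical; only base64.urlsafe_b64encode and
base64.urlsafe_b64decode come from the standard library.

```python
import base64

def encode_as_filename(data: bytes) -> str:
    # The URL-safe alphabet uses '-' and '_' instead of '+' and '/',
    # so the result never contains a path separator.
    return base64.urlsafe_b64encode(data).decode("ascii")

def decode_from_filename(name: str) -> bytes:
    return base64.urlsafe_b64decode(name.encode("ascii"))

original = b"a/b\\c:*?\x00\xff"
name = encode_as_filename(original)
assert decode_from_filename(name) == original
```

Two caveats: the output is about 4/3 the size of the input, and base64 is
case-sensitive, so on a case-insensitive filesystem two different strings
can end up with names that differ only in case. An empty input also
encodes to an empty name, which is not allowed.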



Re: encode short string as filename (unix/windows)

2006-03-27 Thread Diez B. Roggisch
robert wrote:

> want to encode/decode an arbitrary short 8-bit string as a safe filename.
> is there a good built-in encoding to do this (without too much
> inflation)? or a re.sub expression?

You could use the base64-encoder. Disadvantage is clearly that you can't
easily read your original text. Alternatively, there is that encoding that
is used by e.g. emails if you have an umlaut in a name. I _think_ it is
called puny-code, but I'm not sure how and if you can use that from within
python - google yourself :)

diez


Re: encode short string as filename (unix/windows)

2006-03-27 Thread Jean-Paul Calderone
On Mon, 27 Mar 2006 18:13:17 +0200, Diez B. Roggisch [EMAIL PROTECTED] 
wrote:
> robert wrote:
>
>> want to encode/decode an arbitrary short 8-bit string as a safe filename.
>> is there a good built-in encoding to do this (without too much
>> inflation)? or a re.sub expression?
>
> You could use the base64-encoder. Disadvantage is clearly that you can't
> easily read your original text. Alternatively, there is that encoding that
> is used by e.g. emails if you have an umlaut in a name. I _think_ it is
> called puny-code, but I'm not sure how and if you can use that from within
> python - google yourself :)

punycode is used by dns.  A commonly used email codec is quoted-printable.  
Here's an example of each:

>>> u'Helló world'.encode('utf-8').encode('quopri')
'Hell=C3=B3=20world'
>>> u'Helló world'.encode('punycode')
'Hell world-jbb'
 

Note the extra trip through utf-8 for quoted-printable, as it is not 
implemented in Python as a character encoding, but a byte encoding, so you 
cannot (safely) apply it to a unicode string.
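
For readers on Python 3, where bytes objects have no .encode method, roughly
the same two encodings can be tried with the quopri module and the
punycode codec. This is only a sketch; the sample string with a trailing
backslash and slash is my own choice, to see what happens to path
separators.

```python
import quopri

text = "Helló world\\/"
raw = text.encode("utf-8")

# quoted-printable works on bytes; quotetabs=True also escapes spaces and tabs.
qp = quopri.encodestring(raw, quotetabs=True)
assert quopri.decodestring(qp) == raw

# punycode is a character encoding, so it applies to str directly.
puny = text.encode("punycode")
assert puny.decode("punycode") == text

# Neither codec escapes the path separators.
assert b"/" in qp and b"\\" in qp
assert b"/" in puny and b"\\" in puny
```

Both round-trip cleanly, but neither is enough on its own to produce a
safe filename, because / and \ pass through unchanged.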

Jean-Paul

Re: encode short string as filename (unix/windows)

2006-03-27 Thread Steven D'Aprano
On Mon, 27 Mar 2006 08:14:07 -0800, nikie wrote:

>> want to encode/decode an arbitrary short 8-bit string as a safe filename.
>> is there a good built-in encoding to do this (without too much
>> inflation)? or a re.sub expression?
>>
>> or which characters are not allowed in filenames on typical OS?
>
> On Windows, / \ : * ? " < > | are forbidden, and the name can't be
> empty.

Windows also has a number of reserved names that you can't use. However,
in general, it is best to ignore that and just let Windows raise an error
if it chooses. But for completeness, here is the canonical list of
prohibited file names and characters for Windows:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/naming_a_file.asp

or http://makeashorterlink.com/?I2B853DDC
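
If you do want to check up front, a rough sketch in Python 3 could look
like this. The function name is hypothetical, and the set below is my
reading of the device names in that Microsoft document (CON, PRN, AUX,
NUL, COM1 to COM9, LPT1 to LPT9); treat it as an assumption and check the
list yourself.

```python
# Assumed list of reserved device names; verify against the Microsoft page.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)

def is_reserved_windows_name(filename: str) -> bool:
    # The match is case-insensitive, and adding an extension does not help:
    # "nul.txt" is still treated as the NUL device.
    stem = filename.split(".", 1)[0]
    return stem.rstrip(" ").upper() in RESERVED_NAMES
```

For example, is_reserved_windows_name("com1.txt") is true, while
"console.log" and "COM10" are not in the set and pass.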



-- 
Steven.



Re: encode short string as filename (unix/windows)

2006-03-27 Thread robert
Jean-Paul Calderone wrote:


> punycode is used by dns.  A commonly used email codec is
> quoted-printable.  Here's an example of each:
>
> >>> u'Helló world'.encode('utf-8').encode('quopri')
> 'Hell=C3=B3=20world'
> >>> u'Helló world'.encode('punycode')
> 'Hell world-jbb'
>
> Note the extra trip through utf-8 for quoted-printable, as it is not
> implemented in Python as a character encoding, but a byte encoding, so
> you cannot (safely) apply it to a unicode string.
>
> Jean-Paul
 

>>> u'Helló world\\/\x00'.encode('punycode')
'Hell world\\/\x00-elb'
>>> u'Helló world\\/\x00'.encode('utf-8').encode('quopri')
'Hell=C3=B3=20world\\/=00'
 


that doesn't remove \ or /
the other base.. encodings behave similarly

so in the end I found myself regex'ing :-( , but this does minimal visual
damage to common strings ...


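import re

def encode_as_filename(s):
    # Replace NUL, / \ * ? : | newline, and '+' itself with +XX (hex code).
    def _(m): return "+%02X" % ord(m.group(0))
    return re.sub(r'[\x00/\\*?:|+\n]', _, s)

def decode_from_filename(s):
    # Turn every +XX escape back into the character it stands for.
    def _(m): return chr(int(m.group(0)[1:], 16))
    return re.sub(r"\+[\dA-F]{2}", _, s)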


>>> newsletter.encode_as_filename('[EMAIL PROTECTED]/\\+\n\x00:+test')
'[EMAIL PROTECTED]'
>>> newsletter.decode_from_filename(_)
'[EMAIL PROTECTED]/\\+\n\x00:+test'
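
For reference, a Python 3 sketch of the same +XX scheme working on bytes.
Here I also escape the remaining Windows-forbidden characters (" < >) and
all control characters, and map the result to str via latin-1; both
choices are my assumptions, not part of the code above. The address
user@example.com is a placeholder.

```python
import re

# '+' is the escape marker, so it has to be escaped as well.
_UNSAFE = re.compile(rb'[\x00-\x1f"*/:<>?\\|+]')
_ESCAPE = re.compile(rb"\+([0-9A-F]{2})")

def encode_as_filename(data: bytes) -> str:
    # Each unsafe byte becomes '+' followed by two uppercase hex digits.
    escaped = _UNSAFE.sub(lambda m: b"+%02X" % m.group(0)[0], data)
    # latin-1 maps every byte to exactly one character, so nothing is lost.
    return escaped.decode("latin-1")

def decode_from_filename(name: str) -> bytes:
    raw = name.encode("latin-1")
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)

sample = b"user@example.com/\\+\n\x00:+test"
name = encode_as_filename(sample)
assert name == "user@example.com+2F+5C+2B+0A+00+3A+2Btest"
assert decode_from_filename(name) == sample
```

Because every literal '+' is itself written as +2B, any '+' in an encoded
name is guaranteed to start an escape, which keeps decoding unambiguous.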
 


Robert



Re: encode short string as filename (unix/windows)

2006-03-27 Thread robert
Steven D'Aprano wrote:

> On Mon, 27 Mar 2006 08:14:07 -0800, nikie wrote:
>
>>> want to encode/decode an arbitrary short 8-bit string as a safe filename.
>>> is there a good built-in encoding to do this (without too much
>>> inflation)? or a re.sub expression?
>>>
>>> or which characters are not allowed in filenames on typical OS?
>>
>> On Windows, / \ : * ? " < > | are forbidden, and the name can't be
>> empty.
>
> Windows also has a number of reserved names that you can't use. However,
> in general, it is best to ignore that and just let Windows raise an error
> if it chooses. But for completeness, here is the canonical list of
> prohibited file names and characters for Windows:
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/naming_a_file.asp
>
> or http://makeashorterlink.com/?I2B853DDC
 

thanks. in fact, to avoid COMx etc. I also have to prepend a char like _
on encode and strip it again on decode, in addition to what I just posted
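
roughly like this (a Python 3 sketch; the helper names and the choice of
"_" as the marker are just for illustration):

```python
PREFIX = "_"  # hypothetical marker; any character that is safe in filenames works

def protect_name(encoded: str) -> str:
    # "COM1" becomes "_COM1", which no longer matches a reserved device name.
    return PREFIX + encoded

def unprotect_name(filename: str) -> str:
    if not filename.startswith(PREFIX):
        raise ValueError("not a protected filename: %r" % filename)
    return filename[len(PREFIX):]

assert protect_name("COM1") == "_COM1"
assert unprotect_name(protect_name("COM1")) == "COM1"
```

as a side effect the empty string also gets a non-empty filename ("_").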

Robert
