RE: making a valid file name...

2006-10-18 Thread Matthew Warren
 
 
 Hi I'm writing a python script that creates directories from user
 input.
 Sometimes the user inputs characters that aren't valid characters for a
 file or directory name.
 Here are the characters that I consider to be valid characters...

 valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

 if I have a string called fname I want to go through each character in
 the filename and if it is not a valid character, then I want to replace
 it with a space.

 This is what I have:

 def fixfilename(fname):
     valid = ':.\,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
     for i in range(len(fname)):
         if valid.find(fname[i]) < 0:
             fname[i] = ' '
     return fname
 
 Anyone think of a simpler solution?
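
For reference, the loop quoted above cannot run as written: Python strings are immutable, so the assignment fname[i] = ' ' raises a TypeError. Below is a minimal sketch of the same character-by-character idea that builds a new string instead. The function name and the valid set come from the post; going through a list is only one reasonable fix, not necessarily what the poster ended up using.

```python
VALID = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

def fixfilename(fname, valid=VALID):
    # Strings are immutable, so edit a list of characters and join it back.
    chars = list(fname)
    for i, c in enumerate(chars):
        if c not in valid:
            chars[i] = ' '
    return ''.join(chars)

print(fixfilename('my*file?.txt'))  # prints: my file .txt
```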
 

I got;

>>> import re
>>> badfilename='£%^£^£$^ihgeroighroeig3645^£$^knovin98u4#346#1461461'
>>> valid=':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>> goodfilename=re.sub('[^'+valid+']',' ',badfilename)
>>> goodfilename
'   ^  ^   ^ihgeroighroeig3645^  ^ knovin98u4 346 1461461'



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: making a valid file name...

2006-10-18 Thread Fredrik Lundh
Matthew Warren wrote:

 import re
 badfilename='£%^£^£$^ihgeroighroeig3645^£$^knovin98u4#346#1461461'
 valid=':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
 goodfilename=re.sub('[^'+valid+']',' ',badfilename)

to create arbitrary character sets, it's usually best to run the character
string through re.escape() before passing it to the RE engine.
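
A minimal sketch of that advice, assuming Python 3. Inside a [...] class, characters such as ^, ], \ and - can carry special meaning depending on position, so escaping the whitelist keeps each character literal. The names invalid_re and clean are hypothetical.

```python
import re

valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

# re.escape() backslash-escapes regex metacharacters, so every character
# in `valid` is taken literally inside the negated [^...] class.
invalid_re = re.compile('[^' + re.escape(valid) + ']')

def clean(name):
    # Replace each character outside the whitelist with a space.
    return invalid_re.sub(' ', name)

print(clean('report#1[final].txt'))  # prints: report 1 final .txt
```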

/F 




Re: making a valid file name...

2006-10-18 Thread Fabio Chelly
You should use s.translate().
It's 100x faster:

# Create the translation table (Python 2 str API)
ValidChars = ":./,^0123456789abcdefghijklmnopqrstuvwxyz"
InvalidChars = "".join([chr(i) for i in range(256)
                        if chr(i).lower() not in ValidChars])
TranslationTable = "".join([chr(i) for i in range(256)])  # identity table

def valid_filename(fname):
    # Python 2 only: the second argument lists the characters to delete
    return fname.translate(TranslationTable, InvalidChars)
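
The two-argument translate(table, deletechars) call above is the Python 2 str API. In Python 3, str.translate() takes a single mapping from code points to replacements. Here is a rough sketch of an equivalent that replaces invalid characters with a space, as the original question asked, rather than deleting them. The class name SpaceForInvalid is hypothetical, and the whitelist is the one from the original post.

```python
import string

VALID_CHARS = ":./,^ " + string.digits + string.ascii_letters

class SpaceForInvalid(dict):
    """Translation table mapping any character outside VALID_CHARS to a space."""

    def __missing__(self, codepoint):
        # str.translate() looks up each code point; unseen ones land here.
        char = chr(codepoint)
        replacement = char if char in VALID_CHARS else " "
        self[codepoint] = replacement  # cache the answer for later lookups
        return replacement

TABLE = SpaceForInvalid()

def valid_filename(fname):
    return fname.translate(TABLE)

print(valid_filename("a$b#c.txt"))  # prints: a b c.txt
```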

 valid =
 ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
 
 if I have a string called fname I want to go through each character in
 the filename and if it is not a valid character, then I want 
 to replace
 it with a space.



Re: making a valid file name...

2006-10-18 Thread bearophileHUGS
Tim Chase:
 In practice, however, for such small strings as the given
 whitelist, the underlying find() operation likely doesn't put a
 blip on the radar.  If your whitelist were some huge document
 that you were searching repeatedly, it could have worse
 performance.  Additionally, the find() in the underlying C code
 is likely about as bare-metal as it gets, whereas the set
 membership aspect of things may go through some more convoluted
 setup/teardown/hashing and spend a lot more time further from the
 processor's op-codes.

With this specific test (half good, half bad), on Py2.5, on my PC, sets
start to be faster than the string search when the string "good" is
about 5-6 chars long (this means sets are quite fast, I presume).

from random import choice, seed
from time import clock

def main(choice=choice):
    seed(1)
    n = 10

    for good in ("ab", "abc", "abcdef", "abcdefgh",
                 "abcdefghijklmnopqrstuvwxyz"):
        poss = good + good.upper()
        data = [choice(poss) for _ in xrange(n)] * 10
        print "len(good) = ", len(good)

        t = clock()
        for c in data:
            c in good
        print round(clock()-t, 2)

        t = clock()
        sgood = set(good)
        for c in data:
            c in sgood
        print round(clock()-t, 2), "\n"

main()
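
The script above is Python 2 (print statement, xrange, time.clock). Here is a rough Python 3 sketch of the same comparison using timeit. The function name compare and its parameters are hypothetical. Unlike the script above, it builds the set outside the timed region, so it measures membership tests only. Timings depend on the machine and interpreter, so no expected numbers are given.

```python
import random
import timeit

def compare(good, n=100_000, seed=1):
    """Time `c in good` for a str and for a set built from it."""
    rng = random.Random(seed)
    poss = good + good.upper()        # about half hits, half misses
    data = [rng.choice(poss) for _ in range(n)]
    sgood = set(good)                 # built once, outside the timed region

    def scan(container):
        for c in data:
            c in container            # membership test, result discarded

    t_str = timeit.timeit(lambda: scan(good), number=1)
    t_set = timeit.timeit(lambda: scan(sgood), number=1)
    return t_str, t_set

for good in ("ab", "abc", "abcdef", "abcdefgh", "abcdefghijklmnopqrstuvwxyz"):
    t_str, t_set = compare(good)
    print(f"len(good) = {len(good):2d}  str: {t_str:.4f}s  set: {t_set:.4f}s")
```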


Bye,
bearophile



Re: making a valid file name...

2006-10-18 Thread Neil Cerutti
On 2006-10-18, bearophileHUGS wrote:
 Tim Chase:
 In practice, however, for such small strings as the given
 whitelist, the underlying find() operation likely doesn't put a
 blip on the radar.  If your whitelist were some huge document
 that you were searching repeatedly, it could have worse
 performance.  Additionally, the find() in the underlying C code
 is likely about as bare-metal as it gets, whereas the set
 membership aspect of things may go through some more convoluted
 setup/teardown/hashing and spend a lot more time further from the
 processor's op-codes.

 With this specific test (half good half bad), on Py2.5, on my PC, sets
 start to be faster than the string search when the string good is
 about 5-6 chars long (this means set are quite fast, I presume).

 from random import choice, seed
 from time import clock

 def main(choice=choice):
     seed(1)
     n = 10

     for good in ("ab", "abc", "abcdef", "abcdefgh",
                  "abcdefghijklmnopqrstuvwxyz"):
         poss = good + good.upper()
         data = [choice(poss) for _ in xrange(n)] * 10
         print "len(good) = ", len(good)

         t = clock()
         for c in data:
             c in good
         print round(clock()-t, 2)

         t = clock()
         sgood = set(good)
         for c in data:
             c in sgood
         print round(clock()-t, 2), "\n"

 main()

On my Python2.4 for Windows, they are often still neck-and-neck
for len(good) = 26. set's disadvantage of having to be
constructed is heavily amortized over 100,000 membership
tests. Without knowing the usage pattern, it'd be hard to choose
between them.

-- 
Neil Cerutti


Re: making a valid file name...

2006-10-17 Thread Jerry
I would suggest something like string.maketrans
http://docs.python.org/lib/node41.html.  I don't remember exactly how
it works, but I think it's something like

>>> import string
>>> invalid_chars = "abc"
>>> replace_chars = "123"
>>> char_map = string.maketrans(invalid_chars, replace_chars)
>>> filename = "abc123.txt"
>>> filename.translate(char_map)
'123123.txt'
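
string.maketrans is the Python 2 spelling. In Python 3 the same idea is available as the static method str.maketrans. Here is a short sketch that maps a fixed list of unwanted characters to spaces. The particular invalid_chars list is only an illustration, not a recommendation for any given filesystem.

```python
invalid_chars = '*?"<>|'

# Both arguments must have the same length; each listed character maps to a space.
char_map = str.maketrans(invalid_chars, ' ' * len(invalid_chars))

filename = 'what?*.txt'
print(filename.translate(char_map))
```

Note that this is a blacklist: anything not listed passes through unchanged, which is the opposite of the whitelist approach in the original question.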

--
Jerry



Re: making a valid file name...

2006-10-17 Thread Jon Clements

SpreadTooThin wrote:

 Hi I'm writing a python script that creates directories from user
 input.
 Sometimes the user inputs characters that aren't valid characters for a
 file or directory name.
 Here are the characters that I consider to be valid characters...

 valid =
 ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

 if I have a string called fname I want to go through each character in
 the filename and if it is not a valid character, then I want to replace
 it with a space.

 This is what I have:

 def fixfilename(fname):
     valid = ':.\,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
     for i in range(len(fname)):
         if valid.find(fname[i]) < 0:
             fname[i] = ' '
     return fname

 Anyone think of a simpler solution?

If you want to strip 'em:

>>> valid=':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>> filename = '!£!£$$££$%$£%$£lasfjalsfjdlasfjasfd()()()somethingelse.dat'
>>> stripped = ''.join(c for c in filename if c in valid)
>>> stripped
'lasfjalsfjdlasfjasfdsomethingelse.dat'

If you want to replace them with something, be careful of the regex
string being built (i.e. a space character).
>>> import re
>>> re.sub(r'[^%s]' % valid,' ',filename)
' lasfjalsfjdlasfjasfd  somethingelse.dat'


Jon.



Re: making a valid file name...

2006-10-17 Thread Tim Chase
 Sometimes the user inputs characters that aren't valid 
 characters for a file or directory name. Here are the
 characters that I consider to be valid characters...
 
 valid =
 ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

Just a caveat, as colons and slashes can give grief on various 
operating systems...combined with periods, it may be possible to 
cause trouble too...
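
To make the caveat concrete, here is a minimal sketch of a check for a single file or directory name. The function name and the exact character list are assumptions for illustration. The real rules vary by operating system and filesystem, so treat this as a starting point only.

```python
def is_safe_component(name):
    """Return True if `name` looks usable as a single file or directory name."""
    if name in ('', '.', '..'):
        return False  # empty, current directory, parent directory
    # '/' and '\\' separate path components; ':' marks drives on Windows.
    return not any(c in '/\\:' for c in name)

for candidate in ('notes.txt', '..', 'a/b', 'C:file'):
    print(repr(candidate), is_safe_component(candidate))
```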

 This is what I have:
 
 def fixfilename(fname):
     valid = ':.\,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
     for i in range(len(fname)):
         if valid.find(fname[i]) < 0:
             fname[i] = ' '
     return fname
 
 Anyone think of a simpler solution?

I don't know if it's simpler, but you can use

>>> fname = "this is a test  it ain't expen$ive.py"
>>> ''.join(c in valid and c or ' ' for c in fname)
'this is a test   it ain t expen ive.py'

It does use the "it's almost a ternary operator, but not quite"
method concurrently being discussed/lambasted in another thread.
Treat accordingly, with all that may entail.  Should be good in
this case though.
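
The and/or idiom is safe here because c is always a one-character, and therefore truthy, string. It goes wrong when the "true" value is falsy. Python 2.5 and later offer a real conditional expression. A minimal sketch, reusing the whitelist from the original post:

```python
valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
fname = "it ain't expen$ive.py"

# Conditional expression: keep c when it is valid, otherwise use a space.
cleaned = ''.join(c if c in valid else ' ' for c in fname)
print(cleaned)  # prints: it ain t expen ive.py

# The and/or idiom picks the fallback when the "true" value is falsy:
print(repr(True and '' or 'fallback'))  # prints: 'fallback'
```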

If you're doing it on a time-critical basis, it might help to
make "valid" a set, which should have O(1) membership testing,
rather than using the "in" test with a string.  I don't know how
well the find() method of a string performs in relationship to
"in" testing of a set.  Test and see, if it's important.

-tkc





Re: making a valid file name...

2006-10-17 Thread Edgar Matzinger
Hi,

On 10/17/2006 06:22:45 PM, SpreadTooThin wrote:
 valid =
 ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
 

   Without specifying the OS platform, these are not all the characters
that may occur in a filename: '[]{}-=, etc. And '/' is NOT valid
on a unix platform. It should also be easy to scan the filename and
check every character against the 'valid-string'.

HTH, cu l8r, Edgar.
-- 
\|||/
(o o) Just curious...
ooO-(_)-Ooo-


Re: making a valid file name...

2006-10-17 Thread Neil Cerutti
On 2006-10-17, Tim Chase wrote:
 If you're doing it on a time-critical basis, it might help to
 make "valid" a set, which should have O(1) membership testing,
 rather than using the "in" test with a string.  I don't know
 how well the find() method of a string performs in relationship
 to "in" testing of a set.  Test and see, if it's important.

The find method of (8-bit) strings is really, really fast. My
guess is that set can't beat it. I tried to beat it recently with
a binary search function. Even after applying psyco find was
still faster (though I could beat the bisect functions by a
little bit by replacing a divide with a shift).

-- 
Neil Cerutti
This is not a book to be put down lightly. It should be thrown
with great force. --Dorothy Parker


Re: making a valid file name...

2006-10-17 Thread Tim Chase
 If you're doing it on a time-critical basis, it might help to
 make "valid" a set, which should have O(1) membership testing,
 rather than using the "in" test with a string.  I don't know
 how well the find() method of a string performs in relationship
 to "in" testing of a set.  Test and see, if it's important.
 
 The find method of (8-bit) strings is really, really fast. My
 guess is that set can't beat it. I tried to beat it recently with
 a binary search function. Even after applying psyco find was
 still faster (though I could beat the bisect functions by a
 little bit by replacing a divide with a shift).

In theory (you know...that little town in west Texas where 
everything goes right), a set-membership test should be O(1).  A 
binary search function would be O(log N).  A linear search of a 
string for a member should be O(N).
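
The three strategies can be sketched side by side as follows. The helper names are hypothetical, and the complexity notes in the comments are the textbook figures, not measurements.

```python
from bisect import bisect_left

valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

valid_set = set(valid)        # hash lookup, O(1) on average
valid_sorted = sorted(valid)  # sorted list of characters for binary search

def in_sorted(c):
    # Binary search, O(log N): find the insertion point, then compare.
    i = bisect_left(valid_sorted, c)
    return i < len(valid_sorted) and valid_sorted[i] == c

def in_string(c):
    # Linear scan of the string, O(N).
    return c in valid

for c in 'a$':
    print(repr(c), c in valid_set, in_sorted(c), in_string(c))
```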

In practice, however, for such small strings as the given 
whitelist, the underlying find() operation likely doesn't put a 
blip on the radar.  If your whitelist were some huge document 
that you were searching repeatedly, it could have worse 
performance.  Additionally, the find() in the underlying C code 
is likely about as bare-metal as it gets, whereas the set 
membership aspect of things may go through some more convoluted 
setup/teardown/hashing and spend a lot more time further from the 
processor's op-codes.

And I know that a number of folks have done some hefty 
optimization of Python's string-handling abilities.  There's 
likely a tradeoff point where it's better to use one over the 
other depending on the size of the whitelist.  YMMV

-tkc









Re: making a valid file name...

2006-10-17 Thread Neil Cerutti
On 2006-10-17, Edgar Matzinger wrote:
 Hi,

 On 10/17/2006 06:22:45 PM, SpreadTooThin wrote:
 valid =
 ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
 

 Without specifying the OS platform, these are not all the
 characters that may occur in a filename: '[]{}-=, etc. And '/'
 is NOT valid on a unix platform. It should also be easy to
 scan the filename and check every character against the
 'valid-string'.

In the interactive fiction world where I come from, a portable
filename is only 8 chars long and matches the regex
[A-Z][A-Z0-9]*, i.e., capital letters and numbers, with no
extension. That way it'll work on old DOS machines and on
Risc-OS. Wait... is there Python for Risc-OS?
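
A minimal sketch of that rule as a check, reading "8 chars long" as "at most 8 characters" (an assumption; the post does not say whether shorter names are allowed). The names PORTABLE_RE and is_portable are hypothetical.

```python
import re

# One capital letter followed by up to seven capital letters or digits.
PORTABLE_RE = re.compile(r'[A-Z][A-Z0-9]{0,7}')

def is_portable(name):
    # fullmatch() requires the whole name to match, not just a prefix.
    return PORTABLE_RE.fullmatch(name) is not None

for name in ('ZORK1', 'zork1', 'ADVENTURE', '1ABC'):
    print(name, is_portable(name))
```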


-- 
Neil Cerutti


 HTH, cu l8r, Edgar.