Re: making a valid file name...

2006-10-18 Thread Neil Cerutti
On 2006-10-18, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Tim Chase:
>> In practice, however, for such small strings as the given
>> whitelist, the underlying find() operation likely doesn't put a
>> blip on the radar.  If your whitelist were some huge document
>> that you were searching repeatedly, it could have worse
>> performance.  Additionally, the find() in the underlying C code
>> is likely about as bare-metal as it gets, whereas the set
>> membership aspect of things may go through some more convoluted
>> setup/teardown/hashing and spend a lot more time further from the
>> processor's op-codes.
>
> With this specific test (half good, half bad), on Py2.5, on my PC, sets
> start to be faster than the string search when the string "good" is
> about 5-6 chars long (this means sets are quite fast, I presume).
>
> from random import choice, seed
> from time import clock
>
> def main(choice=choice):
>     seed(1)
>     n = 10
>
>     for good in ("ab", "abc", "abcdef", "abcdefgh",
>                  "abcdefghijklmnopqrstuvwxyz"):
>         poss = good + good.upper()
>         data = [choice(poss) for _ in xrange(n)] * 10
>         print "len(good) = ", len(good)
>
>         t = clock()
>         for c in data:
>             c in good
>         print round(clock()-t, 2)
>
>         t = clock()
>         sgood = set(good)
>         for c in data:
>             c in sgood
>         print round(clock()-t, 2), "\n"
>
> main()

On my Python2.4 for Windows, they are often still neck-and-neck
for len(good) = 26. set's disadvantage of having to be
constructed is heavily amortized over 100,000 membership
tests. Without knowing the usage pattern, it'd be hard to choose
between them.
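
For anyone who wants to repeat the comparison on a current interpreter, here is a minimal sketch. It assumes Python 3 and uses timeit instead of time.clock(), which no longer exists in recent Python versions. The name compare_membership and the repeat count are made up for illustration, and the numbers will vary by machine and Python version.

```python
import timeit

def compare_membership(good, probes, repeat=1000):
    """Time `c in good` (string scan) against `c in set(good)` (hash lookup)."""
    as_set = set(good)  # built once, outside the timed loops
    t_str = timeit.timeit(lambda: [c in good for c in probes], number=repeat)
    t_set = timeit.timeit(lambda: [c in as_set for c in probes], number=repeat)
    return t_str, t_set

if __name__ == "__main__":
    good = "abcdefghijklmnopqrstuvwxyz"
    probes = list(good + good.upper())  # half hits, half misses
    t_str, t_set = compare_membership(good, probes)
    print("str: %.4fs  set: %.4fs" % (t_str, t_set))
```

Building the set outside the timed loop isolates the lookup cost. If the set had to be rebuilt for every filename, that construction cost would belong inside the measurement.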

-- 
Neil Cerutti
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: making a valid file name...

2006-10-18 Thread bearophileHUGS
Tim Chase:
> In practice, however, for such small strings as the given
> whitelist, the underlying find() operation likely doesn't put a
> blip on the radar.  If your whitelist were some huge document
> that you were searching repeatedly, it could have worse
> performance.  Additionally, the find() in the underlying C code
> is likely about as bare-metal as it gets, whereas the set
> membership aspect of things may go through some more convoluted
> setup/teardown/hashing and spend a lot more time further from the
> processor's op-codes.

With this specific test (half good, half bad), on Py2.5, on my PC, sets
start to be faster than the string search when the string "good" is
about 5-6 chars long (this means sets are quite fast, I presume).

from random import choice, seed
from time import clock

def main(choice=choice):
    seed(1)
    n = 10

    for good in ("ab", "abc", "abcdef", "abcdefgh",
                 "abcdefghijklmnopqrstuvwxyz"):
        poss = good + good.upper()
        data = [choice(poss) for _ in xrange(n)] * 10
        print "len(good) = ", len(good)

        t = clock()
        for c in data:
            c in good
        print round(clock()-t, 2)

        t = clock()
        sgood = set(good)
        for c in data:
            c in sgood
        print round(clock()-t, 2), "\n"

main()


Bye,
bearophile



Re: making a valid file name...

2006-10-18 Thread Fabio Chelly
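You should use s.translate(). It's 100x faster:

# Creates the translation table
ValidChars = ":./,^0123456789abcdefghijklmnopqrstuvwxyz"
# Every 8-bit character whose lowercase form is not in ValidChars
InvalidChars = "".join([chr(i) for i in range(256)
                        if chr(i).lower() not in ValidChars])
# Identity table: characters that are kept map to themselves
TranslationTable = "".join([chr(i) for i in range(256)])

def valid_filename(fname):
    # Python 2 str.translate: deletes InvalidChars (it does not
    # replace them with spaces)
    return fname.translate(TranslationTable, InvalidChars)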

>> valid =
>> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>> 
>> if I have a string called fname I want to go through each character in
>> the filename and if it is not a valid character, then I want 
>> to replace
>> it with a space.
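
Since the quoted request is to replace invalid characters with a space rather than delete them, here is a rough sketch of a translate()-based version that does that. It assumes Python 3, where str.translate() takes a single mapping keyed by code point. The SpaceForUnknown class and the TABLE name are hypothetical helpers, not part of any library.

```python
VALID = ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "

class SpaceForUnknown(dict):
    """Translation table: any code point without an entry maps to a space."""
    def __missing__(self, codepoint):
        return " "

# Valid characters map to themselves; everything else falls through
# to __missing__ and becomes a space.
TABLE = SpaceForUnknown((ord(c), c) for c in VALID)

def valid_filename(fname):
    return fname.translate(TABLE)

print(valid_filename("report #7 (final).txt"))
```

The result always has the same length as the input, because each character is mapped to exactly one character.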



Re: making a valid file name...

2006-10-18 Thread Fredrik Lundh
Matthew Warren wrote:

>>> import re
>>> badfilename='£"%^"£^"£$^ihgeroighroeig3645^£$^"knovin98u4#346#1461461'
>>> valid=':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>> goodfilename=re.sub('[^'+valid+']',' ',badfilename)

To create arbitrary character sets, it's usually best to run the
character string through re.escape() before passing it to the RE
engine.
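
A minimal sketch of that advice, assuming Python 3 and the whitelist from the original post. The names VALID, INVALID_RE and fix_filename are chosen here for illustration only.

```python
import re

VALID = ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "

# re.escape() neutralises characters that are special inside [...],
# such as '^', '-', ']' and '\', so the whitelist is taken literally.
INVALID_RE = re.compile("[^" + re.escape(VALID) + "]")

def fix_filename(fname):
    # Replace every character that is not in VALID with a space.
    return INVALID_RE.sub(" ", fname)

print(fix_filename("a#b^c.txt"))
```

Without the escaping, a whitelist that happened to contain ']' or a '-' between two other characters would silently change the meaning of the character class.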

 




RE: making a valid file name...

2006-10-18 Thread Matthew Warren
 
> 
> Hi I'm writing a python script that creates directories from user
> input.
> Sometimes the user inputs characters that aren't valid 
> characters for a
> file or directory name.
> Here are the characters that I consider to be valid characters...
> 
> valid =
> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
> 
> if I have a string called fname I want to go through each character in
> the filename and if it is not a valid character, then I want 
> to replace
> it with a space.
> 
> This is what I have:
> 
> def fixfilename(fname):
>     valid = ':.\,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>     for i in range(len(fname)):
>         if valid.find(fname[i]) < 0:
>             fname[i] = ' '
>     return fname
> 
> Anyone think of a simpler solution?
> 

I got:

>>> import re
>>> badfilename='£"%^"£^"£$^ihgeroighroeig3645^£$^"knovin98u4#346#1461461'
>>> valid=':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>> goodfilename=re.sub('[^'+valid+']',' ',badfilename)
>>> goodfilename
'   ^  ^   ^ihgeroighroeig3645^  ^ knovin98u4 346 1461461'





Re: making a valid file name...

2006-10-17 Thread Neil Cerutti
On 2006-10-17, Edgar Matzinger <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On 10/17/2006 06:22:45 PM, SpreadTooThin wrote:
>> valid =
>> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>> 
>
> not specifying the OS platform, these are not all the
> characters that may occur in a filename: '[]{}-=", etc. And '/'
> is NOT valid.  On a unix platform. And it should be easy to
> scan the filename and check every character against the
> 'valid-string'.

In the interactive fiction world where I come from, a portable
filename is only 8 chars long and matches the regex
[A-Z][A-Z0-9]*, i.e., capital letters and numbers, with no
extension. That way it'll work on old DOS machines and on
Risc-OS. Wait... is there Python for Risc-OS?
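
That rule can be checked with a short sketch like the one below, assuming Python 3 (re.fullmatch) and reading "only 8 chars long" as "at most 8". The name is_portable is hypothetical.

```python
import re

# One capital letter, then up to seven more capitals or digits:
# at most 8 characters, no extension.
PORTABLE_NAME = re.compile(r"[A-Z][A-Z0-9]{0,7}")

def is_portable(name):
    return PORTABLE_NAME.fullmatch(name) is not None

for candidate in ("ZORK1", "zork1", "ADVENTURE", "ZORK.DAT"):
    print(candidate, is_portable(candidate))
```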


-- 
Neil Cerutti



Re: making a valid file name...

2006-10-17 Thread Tim Chase
>> If you're doing it on a time-critical basis, it might help to
>> make "valid" a set, which should have O(1) membership testing,
>> rather than using the "in" test with a string.  I don't know
>> how well the find() method of a string performs in relationship
>> to "in" testing of a set.  Test and see, if it's important.
> 
> The find method of (8-bit) strings is really, really fast. My
> guess is that set can't beat it. I tried to beat it recently with
> a binary search function. Even after applying psyco find was
> still faster (though I could beat the bisect functions by a
> little bit by replacing a divide with a shift).

In "theory" (you know...that little town in west Texas where 
everything goes right), a set-membership test should be O(1).  A 
binary search function would be O(log N).  A linear search of a 
string for a member should be O(N).

In practice, however, for such small strings as the given 
whitelist, the underlying find() operation likely doesn't put a 
blip on the radar.  If your whitelist were some huge document 
that you were searching repeatedly, it could have worse 
performance.  Additionally, the find() in the underlying C code 
is likely about as bare-metal as it gets, whereas the set 
membership aspect of things may go through some more convoluted 
setup/teardown/hashing and spend a lot more time further from the 
processor's op-codes.

And I know that a number of folks have done some hefty 
optimization of Python's string-handling abilities.  There's 
likely a tradeoff point where it's better to use one over the 
other depending on the size of the whitelist.  YMMV
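
To make the three complexity classes concrete, here is a small sketch of the corresponding membership tests, assuming Python 3. The function names are made up for illustration, and nothing here says which one wins for a given whitelist size; that still has to be measured.

```python
from bisect import bisect_left

VALID = ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "

def in_linear(c, chars):
    # O(N): scans the string from the start.
    return c in chars

def in_sorted(c, sorted_chars):
    # O(log N): binary search; sorted_chars must be a sorted list.
    i = bisect_left(sorted_chars, c)
    return i < len(sorted_chars) and sorted_chars[i] == c

def in_set(c, char_set):
    # O(1) on average: hash lookup.
    return c in char_set

SORTED_VALID = sorted(VALID)
VALID_SET = frozenset(VALID)

for c in "a#":
    print(c, in_linear(c, VALID), in_sorted(c, SORTED_VALID), in_set(c, VALID_SET))
```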

-tkc









Re: making a valid file name...

2006-10-17 Thread Neil Cerutti
On 2006-10-17, Tim Chase <[EMAIL PROTECTED]> wrote:
> If you're doing it on a time-critical basis, it might help to
> make "valid" a set, which should have O(1) membership testing,
> rather than using the "in" test with a string.  I don't know
> how well the find() method of a string performs in relationship
> to "in" testing of a set.  Test and see, if it's important.

The find method of (8-bit) strings is really, really fast. My
guess is that set can't beat it. I tried to beat it recently with
a binary search function. Even after applying psyco find was
still faster (though I could beat the bisect functions by a
little bit by replacing a divide with a shift).

-- 
Neil Cerutti
This is not a book to be put down lightly. It should be thrown
with great force. --Dorothy Parker


Re: making a valid file name...

2006-10-17 Thread Edgar Matzinger
Hi,

On 10/17/2006 06:22:45 PM, SpreadTooThin wrote:
> valid =
> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
> 

Since you don't specify the OS platform: these are not all the
characters that may occur in a filename ('[]{}-=", etc.), and '/' is
NOT valid on a Unix platform. It should be easy to scan the filename
and check every character against the 'valid' string.

HTH, cu l8r, Edgar.
-- 
\|||/
(o o) Just curious...
ooO-(_)-Ooo-


Re: making a valid file name...

2006-10-17 Thread Tim Chase
> Sometimes the user inputs characters that aren't valid 
> characters for a file or directory name. Here are the
> characters that I consider to be valid characters...
> 
> valid =
> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

Just a caveat, as colons and slashes can give grief on various 
operating systems...combined with periods, it may be possible to 
cause trouble too...

> This is what I have:
> 
> def fixfilename(fname):
>     valid = ':.\,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>     for i in range(len(fname)):
>         if valid.find(fname[i]) < 0:
>             fname[i] = ' '
>     return fname
> 
> Anyone think of a simpler solution?

I don't know if it's simpler, but you can use

 >>> fname = "this is a test & it ain't expen$ive.py"
 >>> ''.join(c in valid and c or ' ' for c in fname)
'this is a test   it ain t expen ive.py'

It does use the "it's almost a ternary operator, but not quite"
method concurrently being discussed/lambasted in another thread.
Treat accordingly, with all that may entail.  Should be good in
this case though.

If you're doing it on a time-critical basis, it might help to 
make "valid" a set, which should have O(1) membership testing, 
rather than using the "in" test with a string.  I don't know how 
well the find() method of a string performs in relationship to 
"in" testing of a set.  Test and see, if it's important.
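
As a sketch of both suggestions together, assuming Python 2.5 or later (which has a real conditional expression), the same join can be written against a frozenset. The names VALID and fix_filename are illustrative. Note that the loop in the original post cannot work as written: strings are immutable, so the assignment fname[i] = ' ' raises TypeError. Building a new string sidesteps that.

```python
VALID = frozenset(
    ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ")

def fix_filename(fname):
    # Keep valid characters, substitute a space for everything else.
    return "".join(c if c in VALID else " " for c in fname)

print(fix_filename("this is a test & it ain't expen$ive.py"))
```

The and/or form is safe here only because a one-character string is always true; the conditional expression does not depend on that.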

-tkc





Re: making a valid file name...

2006-10-17 Thread Jon Clements

SpreadTooThin wrote:

> Hi I'm writing a python script that creates directories from user
> input.
> Sometimes the user inputs characters that aren't valid characters for a
> file or directory name.
> Here are the characters that I consider to be valid characters...
>
> valid =
> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>
> if I have a string called fname I want to go through each character in
> the filename and if it is not a valid character, then I want to replace
> it with a space.
>
> This is what I have:
>
> def fixfilename(fname):
>     valid = ':.\,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>     for i in range(len(fname)):
>         if valid.find(fname[i]) < 0:
>             fname[i] = ' '
>     return fname
>
> Anyone think of a simpler solution?

If you want to strip 'em:

>>> valid=':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>> filename = '!"£!£$"$££$%$£%$£lasfjalsfjdlasfjasfd()()()somethingelse.dat'
>>> stripped = ''.join(c for c in filename if c in valid)
>>> stripped
'lasfjalsfjdlasfjasfdsomethingelse.dat'

If you want to replace them with something, be careful of the regex
string being built (i.e. a space character).
>>> import re
>>> re.sub(r'[^%s]' % valid,' ',filename)
' lasfjalsfjdlasfjasfd  somethingelse.dat'


Jon.



Re: making a valid file name...

2006-10-17 Thread Jerry
I would suggest something like string.maketrans
http://docs.python.org/lib/node41.html.  I don't remember exactly how
it works, but I think it's something like

>>> import string
>>> invalid_chars = "abc"
>>> replace_chars = "123"
>>> char_map = string.maketrans(invalid_chars, replace_chars)
>>> filename = "abc123.txt"
>>> filename.translate(char_map)
'123123.txt'

--
Jerry
