Re: making a valid file name...
On 2006-10-18, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Tim Chase:
>> In practice, however, for such small strings as the given
>> whitelist, the underlying find() operation likely doesn't put a
>> blip on the radar. [snip]
>
> With this specific test (half good half bad), on Py2.5, on my PC, sets
> start to be faster than the string search when the string "good" is
> about 5-6 chars long (this means sets are quite fast, I presume).
>
> [snip benchmark code]

On my Python 2.4 for Windows, they are often still neck-and-neck for
len(good) = 26. The set's disadvantage of having to be constructed is
heavily amortized over 100,000 membership tests. Without knowing the
usage pattern, it'd be hard to choose between them.

--
Neil Cerutti
--
http://mail.python.org/mailman/listinfo/python-list
Re: making a valid file name...
Tim Chase:
> In practice, however, for such small strings as the given
> whitelist, the underlying find() operation likely doesn't put a
> blip on the radar. If your whitelist were some huge document
> that you were searching repeatedly, it could have worse
> performance. Additionally, the find() in the underlying C code
> is likely about as bare-metal as it gets, whereas the set
> membership aspect of things may go through some more convoluted
> setup/teardown/hashing and spend a lot more time further from the
> processor's op-codes.

With this specific test (half good, half bad), on Py2.5, on my PC, sets
start to be faster than the string search when the string "good" is
about 5-6 chars long (this means sets are quite fast, I presume).

from random import choice, seed
from time import clock

def main(choice=choice):
    seed(1)
    n = 10000

    for good in ("ab", "abc", "abcdef", "abcdefgh",
                 "abcdefghijklmnopqrstuvwxyz"):
        poss = good + good.upper()
        data = [choice(poss) for _ in xrange(n)] * 10
        print "len(good) = ", len(good)

        t = clock()
        for c in data:
            c in good
        print round(clock()-t, 2)

        t = clock()
        sgood = set(good)
        for c in data:
            c in sgood
        print round(clock()-t, 2), "\n"

main()

Bye,
bearophile
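For anyone repeating this test on a current interpreter, here is a rough Python 3 port. It is a sketch, not bearophile's code: time.clock and xrange no longer exist, so it uses the standard timeit module, and the function name compare_membership and the repeats parameter are my own choices. Absolute numbers will depend on the machine and Python version.

```python
import random
import timeit


def compare_membership(good, n=10000, repeats=5):
    """Return (string_seconds, set_seconds) for membership tests on good.

    About half of the test characters come from good (lowercase) and half
    from good.upper() (not in good), like the original "half good, half bad"
    test. Each figure is the best of `repeats` runs over n * 10 characters.
    """
    rng = random.Random(1)                 # fixed seed, as in the original
    poss = good + good.upper()
    data = [rng.choice(poss) for _ in range(n)] * 10
    sgood = set(good)                      # built once, outside the timing

    def scan_string():
        for c in data:
            c in good                      # scan of a short str

    def scan_set():
        for c in data:
            c in sgood                     # hash lookup

    t_str = min(timeit.repeat(scan_string, number=1, repeat=repeats))
    t_set = min(timeit.repeat(scan_set, number=1, repeat=repeats))
    return t_str, t_set


if __name__ == "__main__":
    for good in ("ab", "abc", "abcdef", "abcdefgh",
                 "abcdefghijklmnopqrstuvwxyz"):
        t_str, t_set = compare_membership(good)
        print(f"len(good) = {len(good):2d}  str {t_str:.4f}s  set {t_set:.4f}s")
```

Unlike the original, the set is built outside the timed loop; move set(good) into scan_set if you want its construction cost included, which is the amortization point Neil raises.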
Re: making a valid file name...
You should use s.translate(); it's 100x faster:

# Create the translation table
# (trailing space keeps spaces, as in the original whitelist)
ValidChars = ":./,^0123456789abcdefghijklmnopqrstuvwxyz "
InvalidChars = "".join([chr(i) for i in range(256)
                        if chr(i).lower() not in ValidChars])
TranslationTable = "".join([chr(i) for i in range(256)])

def valid_filename(fname):
    # Note: this deletes invalid characters rather than
    # replacing them with spaces
    return fname.translate(TranslationTable, InvalidChars)

>> valid =
>> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>
>> if I have a string called fname I want to go through each character in
>> the filename and if it is not a valid character, then I want to replace
>> it with a space.

--
This is an automatic signature from MesNews.
Site: http://www.mesnews.net
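The two-argument translate(table, deletechars) form above is Python 2 only. In Python 3, str.translate() takes a single table indexed by code point, and a 256-entry table no longer covers every possible character. One possible port, sketched under the assumption that you still want deletion: a dict subclass whose __missing__ method decides on each unseen code point. The class name DeleteInvalid and the constant names are hypothetical.

```python
# Whitelist from the original question (note the trailing space).
VALID_CHARS = frozenset(
    ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "
)


class DeleteInvalid(dict):
    """Translation table for str.translate() that deletes invalid characters.

    str.translate() looks each code point up with table[codepoint]; a dict
    subclass gets __missing__ called for unseen keys, so the whitelist
    decision is made once per code point and then cached.
    """

    def __missing__(self, codepoint):
        # Mapping to None deletes the character; mapping to itself keeps it.
        result = codepoint if chr(codepoint) in VALID_CHARS else None
        self[codepoint] = result
        return result


TABLE = DeleteInvalid()


def valid_filename(fname):
    return fname.translate(TABLE)
```

Like the original, this strips characters instead of replacing them with a space; return ord(" ") instead of None in __missing__ if you want replacement.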
Re: making a valid file name...
Matthew Warren wrote:

> >>> import re
> >>> badfilename='£"%^"£^"£$^ihgeroighroeig3645^£$^"knovin98u4#346#1461461'
> >>> valid=':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
> >>> goodfilename=re.sub('[^'+valid+']',' ',badfilename)

To create arbitrary character sets, it's usually best to run the
character string through re.escape() before passing it to the RE
engine.
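A minimal Python 3 sketch of that advice, building the negated character class from an escaped whitelist. The helper names invalid_char_pattern and fix_filename are hypothetical; re.escape() and re.compile() are the standard library calls.

```python
import re

VALID = ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "


def invalid_char_pattern(valid_chars):
    # re.escape() backslash-escapes characters that are special inside a
    # character class, such as ']', '\\', '-' and '^', so each character
    # of valid_chars is matched literally.
    return re.compile("[^" + re.escape(valid_chars) + "]")


def fix_filename(fname, valid_chars=VALID, replacement=" "):
    # Replace every character not in valid_chars with `replacement`.
    return invalid_char_pattern(valid_chars).sub(replacement, fname)
```

The escaping matters as soon as the whitelist contains '-' or ']': unescaped, a whitelist of "a-c" would be read as the range a..c and silently allow "b".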
RE: making a valid file name...
> Hi I'm writing a python script that creates directories from user
> input.
> Sometimes the user inputs characters that aren't valid characters for
> a file or directory name.
> Here are the characters that I consider to be valid characters...
>
> valid =
> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>
> if I have a string called fname I want to go through each character in
> the filename and if it is not a valid character, then I want to replace
> it with a space.
>
> This is what I have:
>
> def fixfilename(fname):
>     valid =
> ':.\,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>     for i in range(len(fname)):
>         if valid.find(fname[i]) < 0:
>             fname[i] = ' '
>     return fname
>
> Anyone think of a simpler solution?

I got:

>>> import re
>>> badfilename='£"%^"£^"£$^ihgeroighroeig3645^£$^"knovin98u4#346#1461461'
>>> valid=':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>> goodfilename=re.sub('[^'+valid+']',' ',badfilename)
>>> goodfilename
' ^ ^ ^ihgeroighroeig3645^ ^ knovin98u4 346 1461461'
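As posted, the original fixfilename cannot work: Python strings are immutable, so fname[i] = ' ' raises TypeError. (Its inline whitelist also differs from the list above it: '\,' keeps a literal backslash and drops '/'.) A minimal corrected version of the same loop, sketched in Python 3 with the first whitelist assumed, copies the characters into a list, edits the list, and joins it back.

```python
VALID = ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "


def fixfilename(fname):
    # Strings are immutable, so collect the characters in a list instead
    # of assigning to fname[i], then join them back into a new string.
    chars = list(fname)
    for i, c in enumerate(chars):
        if c not in VALID:          # same test as valid.find(c) < 0
            chars[i] = " "
    return "".join(chars)
```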
Re: making a valid file name...
On 2006-10-17, Edgar Matzinger <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On 10/17/2006 06:22:45 PM, SpreadTooThin wrote:
>> valid =
>> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>
>
> not specifying the OS platform, these are not all the
> characters that may occur in a filename: '[]{}-=", etc. And '/'
> is NOT valid. On a unix platform. And it should be easy to
> scan the filename and check every character against the
> 'valid-string'.

In the interactive fiction world where I come from, a portable
filename is at most 8 chars long and matches the regex
[A-Z][A-Z0-9]*, i.e., capital letters and numbers, with no
extension. That way it'll work on old DOS machines and on
Risc-OS. Wait... is there Python for Risc-OS?

--
Neil Cerutti

>
> HTH, cu l8r, Edgar.
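The regex [A-Z][A-Z0-9]* on its own does not enforce the 8-character limit. A small Python 3 sketch that checks both rules in one pattern, assuming "portable" means exactly the rule of thumb stated above; the names PORTABLE_NAME and is_portable are hypothetical.

```python
import re

# A capital letter followed by up to 7 capitals or digits: at most 8
# characters in total, no extension. {0,7} supplies the length limit.
PORTABLE_NAME = re.compile(r"[A-Z][A-Z0-9]{0,7}")


def is_portable(name):
    # fullmatch() requires the whole name to match, not just a prefix.
    return PORTABLE_NAME.fullmatch(name) is not None
```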
Re: making a valid file name...
>> If you're doing it on a time-critical basis, it might help to
>> make "valid" a set, which should have O(1) membership testing,
>> rather than using the "in" test with a string. I don't know
>> how well the find() method of a string performs in relationship
>> to "in" testing of a set. Test and see, if it's important.
>
> The find method of (8-bit) strings is really, really fast. My
> guess is that set can't beat it. I tried to beat it recently with
> a binary search function. Even after applying psyco find was
> still faster (though I could beat the bisect functions by a
> little bit by replacing a divide with a shift).

In "theory" (you know...that little town in west Texas where
everything goes right), a set-membership test should be O(1) on
average. A binary search function would be O(log N). A linear
search of a string for a member should be O(N).

In practice, however, for such small strings as the given
whitelist, the underlying find() operation likely doesn't put a
blip on the radar. If your whitelist were some huge document
that you were searching repeatedly, it could have worse
performance. Additionally, the find() in the underlying C code
is likely about as bare-metal as it gets, whereas the set
membership aspect of things may go through some more convoluted
setup/teardown/hashing and spend a lot more time further from the
processor's op-codes.

And I know that a number of folks have done some hefty
optimization of Python's string-handling abilities. There's
likely a tradeoff point where it's better to use one over the
other depending on the size of the whitelist. YMMV.

-tkc
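For concreteness, here are the three strategies side by side as a Python 3 sketch, all answering "is this character in the whitelist?". The function and constant names are hypothetical, and the complexity notes are the textbook ones from the paragraph above; for a whitelist this small, constant factors decide which is actually fastest.

```python
from bisect import bisect_left

WHITELIST = ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "
SORTED_CHARS = "".join(sorted(set(WHITELIST)))   # for binary search
CHAR_SET = frozenset(WHITELIST)                   # for hashing


def in_string(c):
    # O(N): linear scan, what `c in s` and s.find(c) do for one character
    return c in WHITELIST


def in_sorted(c):
    # O(log N): binary search over the sorted, de-duplicated characters
    i = bisect_left(SORTED_CHARS, c)
    return i < len(SORTED_CHARS) and SORTED_CHARS[i] == c


def in_set(c):
    # O(1) on average: a single hash lookup
    return c in CHAR_SET
```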
Re: making a valid file name...
On 2006-10-17, Tim Chase <[EMAIL PROTECTED]> wrote:
> If you're doing it on a time-critical basis, it might help to
> make "valid" a set, which should have O(1) membership testing,
> rather than using the "in" test with a string. I don't know
> how well the find() method of a string performs in relationship
> to "in" testing of a set. Test and see, if it's important.

The find method of (8-bit) strings is really, really fast. My
guess is that set can't beat it. I tried to beat it recently with
a binary search function. Even after applying psyco, find was
still faster (though I could beat the bisect functions by a
little bit by replacing a divide with a shift).

--
Neil Cerutti
This is not a book to be put down lightly. It should be thrown
with great force. --Dorothy Parker
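The binary search function itself isn't shown, so the following is only a hypothetical reconstruction of the idea in Python 3: a hand-written binary search over a sorted string that computes the midpoint with a right shift instead of a division. The name binary_contains is my own.

```python
def binary_contains(sorted_chars, c):
    """Return True if c occurs in sorted_chars (a sorted string)."""
    lo, hi = 0, len(sorted_chars)
    while lo < hi:
        mid = (lo + hi) >> 1        # shift instead of (lo + hi) // 2
        if sorted_chars[mid] < c:
            lo = mid + 1
        else:
            hi = mid
    # lo is now the leftmost position where c could be inserted.
    return lo < len(sorted_chars) and sorted_chars[lo] == c
```

In CPython the shift and the floor division of small non-negative ints give the same result; whether one is measurably faster is version-dependent, so time it before relying on it.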
Re: making a valid file name...
Hi,

On 10/17/2006 06:22:45 PM, SpreadTooThin wrote:
> valid =
> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>

Since you don't specify the OS platform: these are not all the
characters that may occur in a filename ([]{}-=" etc. are also
allowed). And '/' is NOT valid on a Unix platform. It should be
easy to scan the filename and check every character against the
'valid' string.

HTH, cu l8r, Edgar.
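Which characters are actually forbidden depends on the platform, so an alternative to a whitelist is a per-platform blacklist. A Python 3 sketch under these assumptions: POSIX forbids only '/' and NUL in a single file name, while Windows also forbids <>:"\|?* and the control characters 0-31. The names are hypothetical, and the check is not exhaustive (Windows also reserves names such as CON and NUL, and "." and ".." are special everywhere).

```python
import os

# Characters that cannot appear in a single file or directory name.
POSIX_FORBIDDEN = frozenset("/\0")
WINDOWS_FORBIDDEN = frozenset('<>:"/\\|?*') | {chr(i) for i in range(32)}


def bad_chars(name, windows=(os.name == "nt")):
    # Return the offending characters, sorted; an empty list means OK.
    forbidden = WINDOWS_FORBIDDEN if windows else POSIX_FORBIDDEN
    return sorted(set(name) & forbidden)
```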
Re: making a valid file name...
> Sometimes the user inputs characters that aren't valid
> characters for a file or directory name. Here are the
> characters that I consider to be valid characters...
>
> valid =
> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

Just a caveat, as colons and slashes can give grief on various
operating systems...combined with periods (think "../"), it may be
possible to cause trouble too...

> This is what I have:
>
> def fixfilename(fname):
>     valid =
> ':.\,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>     for i in range(len(fname)):
>         if valid.find(fname[i]) < 0:
>             fname[i] = ' '
>     return fname
>
> Anyone think of a simpler solution?

I don't know if it's simpler, but you can use

>>> fname = "this is a test & it ain't expen$ive.py"
>>> ''.join(c in valid and c or ' ' for c in fname)
'this is a test   it ain t expen ive.py'

It does use the "it's almost a ternary operator, but not quite"
method concurrently being discussed/lambasted in another thread.
Treat accordingly, with all that may entail. Should be good in
this case though, since a single character is never false.

If you're doing it on a time-critical basis, it might help to
make "valid" a set, which should have O(1) membership testing,
rather than using the "in" test with a string. I don't know
how well the find() method of a string performs in relation
to "in" testing of a set. Test and see, if it's important.

-tkc
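Since Python 2.5 the same one-liner can use a real conditional expression instead of the and/or idiom, and the whitelist can be a set as suggested. A Python 3 sketch; the parameter names valid and replacement are my own additions for flexibility.

```python
VALID = frozenset(
    ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "
)


def fixfilename(fname, valid=VALID, replacement=" "):
    # Conditional expression instead of `c in valid and c or ' '`;
    # the set makes each membership test a hash lookup.
    return "".join(c if c in valid else replacement for c in fname)
```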
Re: making a valid file name...
SpreadTooThin wrote:
> Hi I'm writing a python script that creates directories from user
> input.
> Sometimes the user inputs characters that aren't valid characters for a
> file or directory name.
[snip]
> if I have a string called fname I want to go through each character in
> the filename and if it is not a valid character, then I want to replace
> it with a space.
[snip]
> Anyone think of a simpler solution?

If you want to strip 'em:

>>> valid=':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>> filename = '!"£!£$"$££$%$£%$£lasfjalsfjdlasfjasfd()()()somethingelse.dat'
>>> stripped = ''.join(c for c in filename if c in valid)
>>> stripped
'lasfjalsfjdlasfjasfdsomethingelse.dat'

If you want to replace them with something, be careful of the regex
string being built (i.e. a space character).

>>> import re
>>> re.sub(r'[^%s]' % valid,' ',filename)
' lasfjalsfjdlasfjasfd somethingelse.dat'

Jon.
Re: making a valid file name...
I would suggest something like string.maketrans
http://docs.python.org/lib/node41.html. I don't remember exactly how
it works, but I think it's something like

>>> import string
>>> invalid_chars = "abc"
>>> replace_chars = "123"
>>> char_map = string.maketrans(invalid_chars, replace_chars)
>>> filename = "abc123.txt"
>>> filename.translate(char_map)
'123123.txt'

--
Jerry
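In Python 3 the same idea is spelled str.maketrans, and str.translate takes only the table. A small sketch of the example above plus a hypothetical helper, replace_table, that maps every character of a blacklist to a single replacement character. The Windows blacklist is an assumed example, not part of the original question.

```python
# Python 3 spelling of the example above: maketrans is a str method.
char_map = str.maketrans("abc", "123")
renamed = "abc123.txt".translate(char_map)   # '123123.txt'


def replace_table(bad_chars, replacement=" "):
    """Build a str.translate() table mapping each of bad_chars to replacement.

    str.maketrans(x, y) requires len(x) == len(y), so the single
    replacement character is repeated once per bad character.
    """
    if len(replacement) != 1:
        raise ValueError("replacement must be a single character")
    return str.maketrans(bad_chars, replacement * len(bad_chars))


# Hypothetical blacklist: characters Windows rejects in file names.
WINDOWS_BAD = '<>:"/\\|?*'
TABLE = replace_table(WINDOWS_BAD)
```

Usage: "a<b>c".translate(TABLE) gives "a b c".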