Re: Basic Python V3 Search Tool using RE module
On Fri, 27 Mar 2015 04:11 am, Gregg Dotoli wrote:

>> Thanks for your help and patience. I'm new with Python.

No problems! If you hang around here, pay attention to the constructive criticism you are given, and ignore the troll over on the "test1" thread, you'll learn a lot.

Let's look at your code:

>> import os
>> import re
>> # From the Root
>> topdir = "."

Typically, "." is not considered the root, so that comment may be a bit misleading. On Linux systems, "/" is the root. On Windows, each drive has its own root, e.g. "C:/". You should consider a more descriptive comment.

>> # Regex Pattern
>> pattern="DECRYPT_I"
>> regexp=re.compile(pattern)

As given, using a regex to search for a fixed substring is rather like firing up a nuclear-powered bulldozer to crack open a peanut. I will assume that later you will add more complicated regexes with wildcards. If not, you are literally wasting time here: substring matching with the "in" operator will be significantly faster than matching using a regex.

>> for dirpath,dirnames, files in os.walk(topdir):
>>     for name in files:
>>         result=regexp.search(name)
>>         print(os.path.join(dirpath,name))
>>         print (result)

All this does is check whether the string "DECRYPT_I" is in the file name.

> I posted this because I thought it may be of help to others. This does
> grep through all the files

It absolutely does not.

> and is very fast because the regex is compiled
> in Python , rather than sitting in some directory as an external command.
> That is where the optimization comes in.

Please take this with the intention I give it: constructive advice.

    "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity."
    -- W.A. Wulf

This is a great example. You have been too focused on optimizing your code and not focused enough on getting it to actually work correctly.

It's fast, *not* because "the regex is compiled in Python", but because it doesn't do the work you think it does. If you had tested this code, by creating a file called "FOUND IT" containing the string "xDECRYPT_Ix" (for example), you would have discovered for yourself that your search tool does not in fact search correctly.

Write your code first. Get it working. Make sure it is working. Then, and only then, should you try to optimize it.

Test your code: if you haven't tested it, you don't know if it works or not. Testing means checking that it works the way it needs to work with files containing the string *as well as* files not containing the string. It's trivial to check that the program doesn't find DECRYPT files when there are no DECRYPT files, but if it fails to find them when they are actually there, that's a pretty big bug.

And one more quote:

    "The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet."
    -- Michael A. Jackson

(No, not Michael Jackson the dead pop singer.)

--
Steven

--
https://mail.python.org/mailman/listinfo/python-list
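The test described above (a file named "FOUND IT" whose contents include "xDECRYPT_Ix") can be sketched roughly as follows. This is only an illustration, not code from the thread: the function names names_matching, files_containing and run_check are made up here, and the content search reads each file whole, which is assumed to be acceptable for a small test tree.

```python
import os
import re
import tempfile


def names_matching(topdir, pattern):
    """Behaviour of the posted script: match the pattern against file NAMES only."""
    regexp = re.compile(pattern)
    return [os.path.join(dirpath, name)
            for dirpath, dirnames, files in os.walk(topdir)
            for name in files
            if regexp.search(name)]


def files_containing(topdir, needle):
    """Hypothetical alternative: open each file and look for the bytes inside it."""
    hits = []
    for dirpath, dirnames, files in os.walk(topdir):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()  # whole file in memory; fine for a small test
            except OSError:
                continue  # unreadable file: skip it
            if needle in data:
                hits.append(path)
    return hits


def run_check():
    """Build a throwaway tree with one infected-looking file and one clean file."""
    with tempfile.TemporaryDirectory() as top:
        with open(os.path.join(top, "FOUND IT"), "wb") as f:
            f.write(b"xDECRYPT_Ix")
        with open(os.path.join(top, "clean.txt"), "wb") as f:
            f.write(b"nothing to see here")
        by_name = [os.path.basename(p)
                   for p in names_matching(top, "DECRYPT_I")]
        by_content = [os.path.basename(p)
                      for p in files_containing(top, b"DECRYPT_I")]
    return by_name, by_content


if __name__ == "__main__":
    by_name, by_content = run_check()
    print("matched by name:   ", by_name)
    print("matched by content:", by_content)
```

The name-only search finds nothing in this tree, because neither file name contains the pattern, while the content search reports the "FOUND IT" file.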
Re: Basic Python V3 Search Tool using RE module
On 03/26/2015 01:11 PM, Gregg Dotoli wrote:
> On Wednesday, March 25, 2015 at 3:43:38 PM UTC-4, Gregg Dotoli wrote:
>> This basic script will help to find evidence of CryptoWall on a slave drive. Although it is just a string, more complex regex patterns can be replaced with the string. It is incredible how fast Python is and how easy it has helped in quickly assessing a pool of slave drives. I'm improving it as we speak.
>>
>> Thanks for your help and patience. I'm new with Python.
>>
>> import os
>> import re
>> # From the Root
>> topdir = "."
>>
>> # Regex Pattern
>> pattern="DECRYPT_I"
>> regexp=re.compile(pattern)
>> for dirpath,dirnames, files in os.walk(topdir):
>>     for name in files:
>>         result=regexp.search(name)
>>         print(os.path.join(dirpath,name))
>>         print (result)
>>
>> Gregg Dotoli
>
> I posted this because I thought it may be of help to others. This does grep through all the files and is very fast because the regex is compiled in Python , rather than sitting in some directory as an external command. That is where the optimization comes in.
>
> Let's close this thread.

It "grep"s through all the filenames, but there's no open() call or equivalent there at all. It does not look inside a single file.

We can stop posting to the thread, but that won't fix the bug in the code.

--
DaveA
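For illustration, one possible shape for a version that does call open() and looks inside each file is sketched below. It is an assumption about what the script was meant to do, not the original poster's code: grep_tree is a made-up name, files are read in binary mode so that non-text files do not raise decoding errors, and the pattern is therefore a bytes regex.

```python
import os
import re


def grep_tree(topdir, pattern):
    """Yield (path, line_number) for each line that matches a bytes regex.

    Hypothetical helper: walks topdir, opens every file in binary mode and
    searches its contents line by line instead of searching the file name.
    """
    regexp = re.compile(pattern)
    for dirpath, dirnames, files in os.walk(topdir):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    # Lines are split on b"\n"; a binary file with no
                    # newlines is treated as one long line.
                    for lineno, line in enumerate(f, start=1):
                        if regexp.search(line):
                            yield path, lineno
            except OSError:
                continue  # unreadable file: skip it and carry on


if __name__ == "__main__":
    for path, lineno in grep_tree(".", rb"DECRYPT_I"):
        print(f"{path}:{lineno}")
```

Reading every file is naturally much slower than looking only at names, which is consistent with the observation that the posted script is fast because it does less work.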
Re: Basic Python V3 Search Tool using RE module
On Wednesday, March 25, 2015 at 3:43:38 PM UTC-4, Gregg Dotoli wrote:
> This basic script will help to find
> evidence of CryptoWall on a slave drive. Although it is
> just a string, more complex regex patterns can be
> replaced with the string. It is incredible how fast Python is and
> how easy it has helped in quickly assessing a pool of slave drives.
> I'm improving it as we speak.
>
> Thanks for your help and patience. I'm new with Python.
>
> import os
> import re
> # From the Root
> topdir = "."
>
> # Regex Pattern
> pattern="DECRYPT_I"
> regexp=re.compile(pattern)
> for dirpath,dirnames, files in os.walk(topdir):
>     for name in files:
>         result=regexp.search(name)
>         print(os.path.join(dirpath,name))
>         print (result)
>
> Gregg Dotoli

I posted this because I thought it may be of help to others. This does grep through all the files and is very fast because the regex is compiled in Python , rather than sitting in some directory as an external command. That is where the optimization comes in.

Let's close this thread.

Gregg
Re: Basic Python V3 Search Tool using RE module
> Gregg Dotoli

Are you reminding everyone of having a PC running DOS 2.x-3.x in 1990? It was really a pain at that time that a hard disk for an Intel/MS-based PC sold for hundreds of dollars, and another pain was that the buyer had to use the disabled dir command in DOS after buying a hard disk.
Re: Basic Python V3 Search Tool using RE module
On 2015-03-25 21:20, Dave Angel wrote:
>> pattern="DECRYPT_I"
>> regexp=re.compile(pattern)
>
> That could explain why it's so fast.

While I might have missed it in the thread, it also seems that regexpen are overkill for this. Why not just test for

    if pattern in name:
        ...

-tkc
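The substring test suggested above could be dropped into the directory walk roughly like this. It is a sketch only: names_containing is a made-up name, the sample file name in the timing section is invented, and the timing comparison is included just so the "in versus regex" claim can be checked locally. No particular numbers are implied.

```python
import os
import re
import timeit


def names_containing(topdir, pattern):
    """Yield paths of files whose NAME contains the plain substring `pattern`."""
    for dirpath, dirnames, files in os.walk(topdir):
        for name in files:
            if pattern in name:  # plain substring test, no regex involved
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in names_containing(".", "DECRYPT_I"):
        print(path)

    # Rough timing of the two approaches on one invented file name.
    sample = "xDECRYPT_Ix.txt"
    regexp = re.compile("DECRYPT_I")
    t_in = timeit.timeit(lambda: "DECRYPT_I" in sample, number=100_000)
    t_re = timeit.timeit(lambda: regexp.search(sample), number=100_000)
    print(f"in operator: {t_in:.4f}s  compiled regex: {t_re:.4f}s")
```

Like the posted script, this still examines file names only, so it does not address the separate problem of searching file contents.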
Re: Basic Python V3 Search Tool using RE module
On 03/25/2015 03:43 PM, Gregg Dotoli wrote:
> This basic script will help to find evidence of CryptoWall on a slave drive. Although it is just a string, more complex regex patterns can be replaced with the string. It is incredible how fast Python is and how easy it has helped in quickly assessing a pool of slave drives. I'm improving it as we speak.
>
> Thanks for your help and patience. I'm new with Python.
>
> import os
> import re
> # From the Root
> topdir = "."
>
> # Regex Pattern
> pattern="DECRYPT_I"
> regexp=re.compile(pattern)
> for dirpath,dirnames, files in os.walk(topdir):
>     for name in files:
>         result=regexp.search(name)
>         print(os.path.join(dirpath,name))
>         print (result)

Any reason you started a new thread?

And I thought (from the other thread) that you were trying to search the contents of the files. Right now you're just looking for a file name containing the pattern.

That could explain why it's so fast.

--
DaveA
Basic Python V3 Search Tool using RE module
This basic script will help to find evidence of CryptoWall on a slave drive. Although it is just a string, more complex regex patterns can be replaced with the string. It is incredible how fast Python is and how easy it has helped in quickly assessing a pool of slave drives. I'm improving it as we speak.

Thanks for your help and patience. I'm new with Python.

import os
import re
# From the Root
topdir = "."

# Regex Pattern
pattern="DECRYPT_I"
regexp=re.compile(pattern)
for dirpath,dirnames, files in os.walk(topdir):
    for name in files:
        result=regexp.search(name)
        print(os.path.join(dirpath,name))
        print (result)

Gregg Dotoli