Re: Basic Python V3 Search Tool using RE module

2015-03-26 Thread Steven D'Aprano
On Fri, 27 Mar 2015 04:11 am, Gregg Dotoli wrote:

>> Thanks for your help and patience. I'm new with Python.

No problems! If you hang around here, pay attention to the constructive
criticism you are given, and ignore the troll over on the "test1" thread,
you'll learn a lot.

Let's look at your code:


>> import os
>> import re
>> # From the Root
>> topdir = "."

Typically, "." is not considered the root, so that comment may be a bit
misleading. On Linux systems, "/" is the root. On Windows, each drive has
its own root, e.g. "C:/". You should consider a more descriptive comment.
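
To make the distinction concrete, here is a minimal sketch that prints both locations. The names current_dir and root_dir are only illustrative and are not part of the original script.

```python
import os

# "." is the current working directory, not the filesystem root.
current_dir = os.path.abspath(".")

# Root of the drive (Windows) or filesystem (POSIX) holding that directory.
drive, _ = os.path.splitdrive(current_dir)
root_dir = drive + os.sep

print("current directory:", current_dir)
print("root directory:   ", root_dir)
```

Unless the script happens to be started from the root itself, the two lines will differ, which is why the "# From the Root" comment can mislead.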


>> # Regex Pattern
>> pattern="DECRYPT_I"
>> regexp=re.compile(pattern)

As given, using a regex to search for a fixed substring is rather like
firing up a nuclear-powered bulldozer to crack open a peanut. I will assume
that later you will add more complicated regexes with wildcards. If not,
you are literally wasting time here: substring matching with the "in"
operator will be significantly faster than matching using a regex.
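
A rough way to check this on your own machine is sketched below. The file names in the list are made up for illustration, and the timings will vary by system, so treat the numbers as indicative only. The two approaches agree here only because the pattern contains no regex metacharacters.

```python
import re
import timeit

pattern = "DECRYPT_I"
regexp = re.compile(pattern)

# Hypothetical file names, invented for this comparison.
names = ["HELP_DECRYPT_INSTRUCTION.txt", "notes.txt", "DECRYPT_I", "decrypt_i.html"]

# Both approaches select the same names for a fixed substring.
by_regex = [name for name in names if regexp.search(name)]
by_substring = [name for name in names if pattern in name]
print("same results:", by_regex == by_substring)

# Time each approach over many repetitions.
regex_time = timeit.timeit(lambda: [regexp.search(n) for n in names], number=10000)
in_time = timeit.timeit(lambda: [pattern in n for n in names], number=10000)
print("regex search: %.4f s" % regex_time)
print("'in' operator: %.4f s" % in_time)
```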


>> for dirpath,dirnames, files in os.walk(topdir):
>>     for name in files:
>>         result=regexp.search(name)
>>         print(os.path.join(dirpath,name))
>>         print (result)


All this does is check whether the string "DECRYPT_I" is in the file name.


> I posted this because I thought it may be of help to others. This does
> grep through all the files 

It absolutely does not.


> and is very fast because the regex is compiled
> in Python, rather than sitting in some directory as an external command.
> That is where the optimization comes in.

Please take this with the intention I give it: constructive advice.

"More computing sins are committed in the name of efficiency 
(without necessarily achieving it) than for any other single
reason — including blind stupidity." — W.A. Wulf

This is a great example. You have been too focused on optimizing your code
and not focused enough on getting it to actually work correctly. It's fast,
*not* because "the regex is compiled in Python", but because it doesn't do
the work you think it does.

If you had tested this code, by creating a file called "FOUND IT" containing
the string "xDECRYPT_Ix" (for example), you would have discovered
for yourself that your search tool does not in fact search correctly.

Write your code first. Get it working. Make sure it is working. Then, and
only then, should you try to optimize it. Test your code: if you haven't
tested it, you don't know if it works or not.

Testing means asking: does it work the way it needs to with files containing
the string *as well as* files not containing the string? It's trivial to check
that the program doesn't find DECRYPT files when there are no DECRYPT
files, but if it fails to find them when they are actually there, that's a
pretty big bug.
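
Such a test can be sketched in a few lines. This is an adaptation, not the original script: the loop is wrapped in a hypothetical function search_names() that collects matches instead of printing every file, and the test files are created in a temporary directory.

```python
import os
import re
import tempfile

pattern = "DECRYPT_I"
regexp = re.compile(pattern)

def search_names(topdir):
    """Same matching logic as the posted script: file names only."""
    hits = []
    for dirpath, dirnames, files in os.walk(topdir):
        for name in files:
            if regexp.search(name):
                hits.append(os.path.join(dirpath, name))
    return hits

with tempfile.TemporaryDirectory() as topdir:
    # A file whose contents contain the string but whose name does not.
    with open(os.path.join(topdir, "FOUND IT"), "w") as f:
        f.write("xDECRYPT_Ix")
    # A control file with nothing of interest in it.
    with open(os.path.join(topdir, "clean.txt"), "w") as f:
        f.write("nothing to see here")

    name_hits = search_names(topdir)

print(name_hits)  # prints [] because "FOUND IT" is never opened
```

The empty result is the bug: the file that should have been reported is missed.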


And one more quote:

"The First Rule of Program Optimization: Don't do it. 
The Second Rule of Program Optimization (for experts only!): 
Don't do it yet." — Michael A. Jackson

(No, not Michael Jackson the dead pop singer.)



-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Basic Python V3 Search Tool using RE module

2015-03-26 Thread Dave Angel

On 03/26/2015 01:11 PM, Gregg Dotoli wrote:

On Wednesday, March 25, 2015 at 3:43:38 PM UTC-4, Gregg Dotoli wrote:

This basic script will help to find
evidence of CryptoWall on a slave drive. Although the pattern is
just a string, it can be replaced with more complex regex
patterns. It is incredible how fast Python is and
how much it has helped in quickly assessing a pool of slave drives.
I'm improving it as we speak.


Thanks for your help and patience. I'm new with Python.


import os
import re
# From the Root
topdir = "."

# Regex Pattern
pattern="DECRYPT_I"
regexp=re.compile(pattern)
for dirpath,dirnames, files in os.walk(topdir):
    for name in files:
        result=regexp.search(name)
        print(os.path.join(dirpath,name))
        print (result)





Gregg Dotoli


I posted this because I thought it may be of help to others. This does grep
through all the files and is very fast because the regex is compiled in Python,
rather than sitting in some directory as an external command.
That is where the optimization comes in.

Let's close this thread.




It "grep"s through all the filenames, but there's no open() call or
equivalent there at all. It does not look inside a single file.


We can stop posting to the thread, but that won't fix the bug in the code.
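
One possible fix, as a minimal sketch only: open each file and search its contents. The function name search_contents() is hypothetical, files are read in binary mode so that undecodable bytes don't raise errors, and reading line by line means a match that spans a newline would be missed.

```python
import os
import re
import tempfile

def search_contents(topdir, regexp):
    """Return paths under topdir whose contents match regexp (a bytes pattern)."""
    hits = []
    for dirpath, dirnames, files in os.walk(topdir):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    # Line by line keeps memory use low on large files.
                    if any(regexp.search(line) for line in f):
                        hits.append(path)
            except OSError:
                # Unreadable file (permissions, broken link): skip it.
                continue
    return hits

regexp = re.compile(rb"DECRYPT_I")

# Small self-check using throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as topdir:
    with open(os.path.join(topdir, "FOUND IT"), "wb") as f:
        f.write(b"xDECRYPT_Ix")
    with open(os.path.join(topdir, "clean.txt"), "wb") as f:
        f.write(b"nothing here")
    found = [os.path.basename(p) for p in search_contents(topdir, regexp)]

print(found)  # ['FOUND IT']
```

It will be slower than the filename-only version, because this time it actually reads the files.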

--
DaveA


Re: Basic Python V3 Search Tool using RE module

2015-03-26 Thread Gregg Dotoli
On Wednesday, March 25, 2015 at 3:43:38 PM UTC-4, Gregg Dotoli wrote:
> This basic script will help to find
> evidence of CryptoWall on a slave drive. Although the pattern is
> just a string, it can be replaced with more complex regex
> patterns. It is incredible how fast Python is and
> how much it has helped in quickly assessing a pool of slave drives.
> I'm improving it as we speak.
> 
> 
> Thanks for your help and patience. I'm new with Python.
> 
> 
> import os
> import re
> # From the Root
> topdir = "."
> 
> # Regex Pattern
> pattern="DECRYPT_I"
> regexp=re.compile(pattern)
> for dirpath,dirnames, files in os.walk(topdir):
>     for name in files:
>         result=regexp.search(name)
>         print(os.path.join(dirpath,name))
>         print (result)
> 
> 
> 
> 
> 
> Gregg Dotoli

I posted this because I thought it may be of help to others. This does grep
through all the files and is very fast because the regex is compiled in Python,
rather than sitting in some directory as an external command.
That is where the optimization comes in.

Let's close this thread.



Gregg


Re: Basic Python V3 Search Tool using RE module

2015-03-26 Thread CHIN Dihedral
> Gregg Dotoli

Are you reminding everyone of those who had a PC running DOS 2.x-3.x in 1990?

It was really a pain at that time
that a hard disk for an Intel/MS-based PC sold for hundreds of dollars, and
another pain was that the buyer had to use the disabled
dir in DOS after buying a HD.




Re: Basic Python V3 Search Tool using RE module

2015-03-25 Thread Tim Chase
On 2015-03-25 21:20, Dave Angel wrote:
>> pattern="DECRYPT_I"
>> regexp=re.compile(pattern)
>
> That could explain why it's so fast.

While I might have missed it in the thread, it also seems that
regexpen are overkill for this.  Why not just test for

  if pattern in name:
      ...

-tkc




Re: Basic Python V3 Search Tool using RE module

2015-03-25 Thread Dave Angel

On 03/25/2015 03:43 PM, Gregg Dotoli wrote:


This basic script will help to find
evidence of CryptoWall on a slave drive. Although the pattern is
just a string, it can be replaced with more complex regex
patterns. It is incredible how fast Python is and
how much it has helped in quickly assessing a pool of slave drives.
I'm improving it as we speak.


Thanks for your help and patience. I'm new with Python.


import os
import re
# From the Root
topdir = "."

# Regex Pattern
pattern="DECRYPT_I"
regexp=re.compile(pattern)
for dirpath,dirnames, files in os.walk(topdir):
    for name in files:
        result=regexp.search(name)
        print(os.path.join(dirpath,name))
        print (result)


Any reason you started a new thread?

And I thought (from the other thread) that you were trying to search the 
contents of the files.  Right now you're just looking for a file name 
containing the pattern.


That could explain why it's so fast.

--
DaveA


Basic Python V3 Search Tool using RE module

2015-03-25 Thread Gregg Dotoli

This basic script will help to find
evidence of CryptoWall on a slave drive. Although the pattern is
just a string, it can be replaced with more complex regex
patterns. It is incredible how fast Python is and
how much it has helped in quickly assessing a pool of slave drives.
I'm improving it as we speak.


Thanks for your help and patience. I'm new with Python.


import os
import re
# From the Root
topdir = "."

# Regex Pattern
pattern="DECRYPT_I"
regexp=re.compile(pattern)
for dirpath,dirnames, files in os.walk(topdir):
    for name in files:
        result=regexp.search(name)
        print(os.path.join(dirpath,name))
        print (result)





Gregg Dotoli