Re: Strategy/ Advice for How to Best Attack this Problem?

Dave Angel Fri, 03 Apr 2015 05:06:57 -0700

On 04/02/2015 07:43 PM, Saran A wrote:
   <snip>


I debugged and rewrote everything. Here is the full version. Feel free to tear 
this apart. The homework assignment is not due until tomorrow, so I am 
currently also experimenting with pyinotify as well. I do have questions 
regarding how to make this function compatible with the ProcessEvent Class. I 
will create another post for this.

What would you advise in regards to renaming the inaptly named dirlist?

Asked and answered. You called it path at one point. dir_name wouldalso be good. The point is that if you have a good name, you're lesslikely to have unreasonable code processing that name. For example,


# # # Without data to examine here, I can only guess based on this 
requirement's language that
# # fixed records are in the input.  If so, here's a slight revision to the 
helper functions that I wrote earlier which
# # takes the function fileinfo as a starting point and demonstrates calling a 
function from within a function.
# I tested this little sample on a small set of files created with MD5 
checksums.  I wrote the Python in such a way as it
# would work with Python 2.x or 3.x (note the __future__ at the top).

# # # There are so many wonderful ways of failure, so, from a development 
standpoint, I would probably spend a bit
# # more time trying to determine which failure(s) I would want to report to 
the user, and how (perhaps creating my own Exceptions)

# # # The only other comments I would make are about safe-file handling.

# # #   #1:  Question: After a user has created a file that has failed (in
# # #        processing),can the user create a file with the same name?
# # #        If so, then you will probably want to look at some sort
# # #        of file-naming strategy to avoid overwriting evidence of
# # #        earlier failures.

# # # File naming is a tricky thing.  I referenced the tempfile module [1] and 
the Maildir naming scheme to see two different
# # types of solutions to the problem of choosing a unique filename.

## I am assuming that all of my files are going to be specified in unicode


## Utilized Spyder's Scientific Computing IDE to debug, check for indentation 
errors and test function suite

from __future__ import print_function

import os.path
import time
import logging


def initialize_logger(output_dir):

           <snip>

I didn't ever bother to read the body of this function, since a simple

      print(mydata, file=mylog_file)

will suffice to add data to the chosen text file. Why is it constantlygetting more complex, to solve a problem that was simple in the beginning?



#Returns filename, rootdir and filesize

def fileinfo(f):
     filename = os.path.basename(f)
     rootdir = os.path.dirname(f)
     filesize = os.path.getsize(f)
     return filename, rootdir, filesize

#returns length of file
def file_len(f):
     with open(f) as f:
         for i, l in enumerate(f):
             pass
             return i + 1

Always returns 1 or None. Check the indentation, and test to see whatit does for empty file, for a file with one line, and for a file withmore than one line.

#attempts to copy file and move file to it's directory
def copy_and_move_file(src, dest):

Which is it, are you trying to copy it, or move it? Pick one and make agood function name that shows your choice.

     try:
         os.rename(src, dest)

Why are you using rename, when you're trying to move the file? Take acloser look at shutil, and see if it has a function that does it saferthan rename. The function you need uses rename, when it'll work, anddoes it other ways when rename will not.

         # eg. src and dest are the same file
     except IOError as e:
         print('Error: %s' % e.strerror)

A try/except that doesn't try to correct the problem is not generallyuseful. Figure out what could be triggering the exception, and howyou're going to handle it. If it cannot be handled, terminate the program.

For example, what if you don't have permissions to modify one of thespecified directories? You can't get any useful work done, so youshould notify the user and exit. An alternative is to produce 50thousand reports of each file you've got, telling how it succeeded orfailed, over and over.


path = "."
dirlist = os.listdir(path)

def main(dirlist):
     before = dict([(f, 0) for f in dirlist])

Since dirlist is a path, it's a string. So you're looping through thecharacters of the name of the path. I still don't have a clue what thedict is supposed to mean here.

     while True:
         time.sleep(1) #time between update check


That loop goes forever, so the following code will never run.

     after = dict([(f, None) for f in dirlist])

Once again, you're looping through the letters of the directory name.Or if dirlist is really a list, and you're deciding that's what itshould be, then of course after will be identical to before.

     added = [f for f in after if not f in before]
     if added:
         f = ''.join(added)
         print('Sucessfully added %s file - ready to validate') %()
         return validate_files(f)
     else:
         return move_to_failure_folder_and_return_error_file(f)



def validate_files(f):

You completely changed the scope of this function. Why does it stillhave an s in its name?

     creation = time.ctime(os.path.getctime(f))
     lastmod = time.ctime(os.path.getmtime(f))


What do timestamps have to do with anything?

     if creation == lastmod and file_len(f) > 0:
         return move_to_success_folder_and_read(f)


How have you decided that the records are all the same length?

     if file_len < 0 and creation != lastmod:

file_len cannot be safely compared to 0, it's a function. If this werePython 3, it would give you a runtime error for trying. Even if you hadsuccessfully called the function here, how is it possible for it toreturn a negative value?

         return move_to_success_folder_and_read(f)
     else:
         return move_to_failure_folder_and_return_error_file(f)


#Potential Additions/Substitutions

def move_to_failure_folder_and_return_error_file():
     filename, rootdir, lastmod, creation, filesize = fileinfo(file)


fileinfo() returns 3 things, and you're stuffing them in 5 variables.

     os.mkdir('Failure')

What do you do the second time through this function, when thatdirectory is already existing?

     copy_and_move_file( 'Failure')

The function takes two arguments, neither of which is likely to be thatstring.

     initialize_logger('rootdir/Failure')
     logging.error("Either this file is empty or there are no lines")


def move_to_success_folder_and_read():
     filename, rootdir, lastmod, creation, filesize = fileinfo(file)
     os.mkdir('Success')
     copy_and_move_file(rootdir, 'Success') #file name
     print("Success", file)
     return file_len(file)



if __name__ == '__main__':
    main(dirlist)

So now you're passing a list of filenames to main()? How then is itgoing to know when new files arrive? You were originally going to tellit a directory name. That's why I suggested that you call the parameter'path' or 'dirname'.




--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list

Re: Strategy/ Advice for How to Best Attack this Problem?

Reply via email to