On 04/02/2015 07:43 PM, Saran A wrote:
   <snip>

I debugged and rewrote everything. Here is the full version. Feel free to tear 
this apart. The homework assignment is not due until tomorrow, so I am 
currently also experimenting with pyinotify as well. I do have questions 
regarding how to make this function compatible with the ProcessEvent Class. I 
will create another post for this.

What would you advise in regards to renaming the inaptly named dirlist?

Asked and answered. You called it path at one point. dir_name would also be good. The point is that if you have a good name, you're less likely to have unreasonable code processing that name. For example,


# # # Without data to examine here, I can only guess based on this 
requirement's language that
# # fixed records are in the input.  If so, here's a slight revision to the 
helper functions that I wrote earlier which
# # takes the function fileinfo as a starting point and demonstrates calling a 
function from within a function.
# I tested this little sample on a small set of files created with MD5 
checksums.  I wrote the Python in such a way as it
# would work with Python 2.x or 3.x (note the __future__ at the top).

# # # There are so many wonderful ways of failure, so, from a development 
standpoint, I would probably spend a bit
# # more time trying to determine which failure(s) I would want to report to 
the user, and how (perhaps creating my own Exceptions)

# # # The only other comments I would make are about safe-file handling.

# # #   #1:  Question: After a user has created a file that has failed (in
# # #        processing),can the user create a file with the same name?
# # #        If so, then you will probably want to look at some sort
# # #        of file-naming strategy to avoid overwriting evidence of
# # #        earlier failures.

# # # File naming is a tricky thing.  I referenced the tempfile module [1] and 
the Maildir naming scheme to see two different
# # types of solutions to the problem of choosing a unique filename.
## I am assuming that all of my files are going to be specified in unicode

## Utilized Spyder's Scientific Computing IDE to debug, check for indentation 
errors and test function suite

from __future__ import print_function

import os.path
import time
import logging


def initialize_logger(output_dir):
           <snip>

I didn't ever bother to read the body of this function, since a simple

      print(mydata, file=mylog_file)

will suffice to add data to the chosen text file. Why is it constantly getting more complex, to solve a problem that was simple in the beginning?



#Returns filename, rootdir and filesize

def fileinfo(f):
     filename = os.path.basename(f)
     rootdir = os.path.dirname(f)
     filesize = os.path.getsize(f)
     return filename, rootdir, filesize

#returns length of file
def file_len(f):
     with open(f) as f:
         for i, l in enumerate(f):
             pass
             return i + 1

Always returns 1 or None. Check the indentation, and test to see what it does for empty file, for a file with one line, and for a file with more than one line.


#attempts to copy file and move file to it's directory
def copy_and_move_file(src, dest):

Which is it, are you trying to copy it, or move it? Pick one and make a good function name that shows your choice.

     try:
         os.rename(src, dest)

Why are you using rename, when you're trying to move the file? Take a closer look at shutil, and see if it has a function that does it safer than rename. The function you need uses rename, when it'll work, and does it other ways when rename will not.

         # eg. src and dest are the same file
     except IOError as e:
         print('Error: %s' % e.strerror)

A try/except that doesn't try to correct the problem is not generally useful. Figure out what could be triggering the exception, and how you're going to handle it. If it cannot be handled, terminate the program.

For example, what if you don't have permissions to modify one of the specified directories? You can't get any useful work done, so you should notify the user and exit. An alternative is to produce 50 thousand reports of each file you've got, telling how it succeeded or failed, over and over.


path = "."
dirlist = os.listdir(path)

def main(dirlist):
     before = dict([(f, 0) for f in dirlist])

Since dirlist is a path, it's a string. So you're looping through the characters of the name of the path. I still don't have a clue what the dict is supposed to mean here.

     while True:
         time.sleep(1) #time between update check

That loop goes forever, so the following code will never run.

     after = dict([(f, None) for f in dirlist])

Once again, you're looping through the letters of the directory name. Or if dirlist is really a list, and you're deciding that's what it should be, then of course after will be identical to before.

     added = [f for f in after if not f in before]
     if added:
         f = ''.join(added)
         print('Sucessfully added %s file - ready to validate') %()
         return validate_files(f)
     else:
         return move_to_failure_folder_and_return_error_file(f)



def validate_files(f):

You completely changed the scope of this function. Why does it still have an s in its name?

     creation = time.ctime(os.path.getctime(f))
     lastmod = time.ctime(os.path.getmtime(f))

What do timestamps have to do with anything?

     if creation == lastmod and file_len(f) > 0:
         return move_to_success_folder_and_read(f)

How have you decided that the records are all the same length?

     if file_len < 0 and creation != lastmod:

file_len cannot be safely compared to 0, it's a function. If this were Python 3, it would give you a runtime error for trying. Even if you had successfully called the function here, how is it possible for it to return a negative value?

         return move_to_success_folder_and_read(f)
     else:
         return move_to_failure_folder_and_return_error_file(f)


#Potential Additions/Substitutions

def move_to_failure_folder_and_return_error_file():
     filename, rootdir, lastmod, creation, filesize = fileinfo(file)

fileinfo() returns 3 things, and you're stuffing them in 5 variables.

     os.mkdir('Failure')

What do you do the second time through this function, when that directory is already existing?

     copy_and_move_file( 'Failure')

The function takes two arguments, neither of which is likely to be that string.

     initialize_logger('rootdir/Failure')
     logging.error("Either this file is empty or there are no lines")



def move_to_success_folder_and_read():
     filename, rootdir, lastmod, creation, filesize = fileinfo(file)
     os.mkdir('Success')
     copy_and_move_file(rootdir, 'Success') #file name
     print("Success", file)
     return file_len(file)



if __name__ == '__main__':
    main(dirlist)

So now you're passing a list of filenames to main()? How then is it going to know when new files arrive? You were originally going to tell it a directory name. That's why I suggested that you call the parameter 'path' or 'dirname'.



--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to