On Tue, 21 Dec 2010 14:17:40 -0500, Matty Sarro wrote:

> Hey everyone.
> I'm in the midst of writing a parser to clean up incoming files, remove
> extra data that isn't needed, normalize some values, etc. The base files
> will be uploaded via FTP.
> How does one go about scanning a directory for new files? For now we're
> looking to run it as a cron job but eventually would like to move away
> from that into making it a service running in the background.
>
Make sure the files are initially uploaded using a name that the parser isn't looking for, and rename them when the upload is finished. This way the parser won't try to process a partially uploaded file.

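As a rough sketch of the naming idea above, something like the following could work. The ".part" suffix and the function names are made up for illustration, and the rename is simulated locally with os.rename; over FTP the uploading client would normally issue the rename itself (Python's ftplib exposes this as FTP.rename).

```python
import os

# Hypothetical marker for files that are still being uploaded.
PARTIAL_SUFFIX = ".part"


def finish_upload(partial_path):
    """Uploader side: drop the marker suffix once the transfer is complete."""
    if not partial_path.endswith(PARTIAL_SUFFIX):
        raise ValueError("not a partial upload: %r" % partial_path)
    final_path = partial_path[: -len(PARTIAL_SUFFIX)]
    os.rename(partial_path, final_path)
    return final_path


def ready_files(directory):
    """Parser side: names of regular files that are not marked as partial."""
    names = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and not name.endswith(PARTIAL_SUFFIX):
            names.append(name)
    return names
```

A parser started from cron could call ready_files() on the upload directory on each run and process whatever it returns; files still carrying the marker suffix are simply skipped until a later run.
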
If you are uploading to a *nix machine, the rename can move the file between directories, provided both directories are in the same filesystem. Under those conditions rename is always an atomic operation with no copying involved. This would allow you to, say, upload the file to "temp/myfile" and rename it to "uploaded/myfile", with your parser scanning only the uploaded directory and, presumably, renaming processed files to move them to a third directory ready for further processing.

I've used this technique reliably with files arriving via FTP at quite high rates.

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |