Hi all! I'm implementing one of my first multithreaded apps, and have gotten to a point where I think I'm going off track from a standard idiom. Wondering if anyone can point me in the right direction.
The script will run as a daemon and watch a given directory for new files. Once it determines that a file has finished moving into the watch folder, it will kick off a process on that file. Several of these could be running at any given time, up to a maximum number of threads.

Here's how I have it designed so far. The main thread starts a Watch(threading.Thread) class that loops and searches a directory for files. It has been passed a Queue.Queue() object (watch_queue), and as it finds new files in the watch folder, it adds the file name to the queue. The main thread then grabs an item off the watch_queue and kicks off processing on that file using another class, Worker(threading.Thread).

My problem is communicating between the threads about which files are currently being processed or are already in the watch_queue, so that the Watch thread does not keep adding files that are already taken care of. For example, Watch() finds a file to be processed and adds it to the queue. The main thread sees the file on the queue, pops it off and begins processing. Now the file has been removed from the watch_queue, and the Watch() thread has no way of knowing that a Worker() thread is processing it and shouldn't pick it up again. So it sees the file as new and adds it to the queue again.

PS: The file is deleted from the watch folder after it has finished processing, so that's how I'll know which files to process in the long term.

I made definite progress by creating two queues, watch_queue and processing_queue, and then using lists within the classes to store which files are being processed or watched. I think I could pull it off, but it got very confusing very quickly, trying to keep each thread's list and the queues in sync with one another. The easiest solution I can see is if my threads could read an item from the queue without removing it, and only remove it when I tell them to.
Then the Watch() thread could just follow which items are on the watch_queue to know which files to add, and the Worker() thread could intentionally remove the item from the watch_queue once it has finished processing it. Now that I'm writing this out, I see a solution: overriding or wrapping Queue.Queue().get() to give me the behavior I describe above. I've noticed .join() and .task_done(), but I'm not sure how to use them properly.

Any suggestions would be greatly appreciated.

~Sean
-- 
http://mail.python.org/mailman/listinfo/python-list
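The "only remove it when I tell it to" idea can be sketched as a small wrapper around the queue. The wrapper also shows how .task_done() and .join() pair up: every get() must be matched by exactly one task_done(), and join() blocks until the count of unfinished items drops to zero. Because task_done()/join() only track a count, not which items are outstanding, the wrapper keeps its own set of known items guarded by a lock. This is a minimal sketch, assuming Python 3 (where the Queue module is named queue). TrackingQueue, done(), scan_once(), worker() and the process_file callback are hypothetical names. A fixed pool of worker threads with a stop event replaces the original design's main thread starting one Worker per file.

```python
import os
import queue
import threading


class TrackingQueue:
    """Hypothetical wrapper around queue.Queue that remembers each item
    from put() until done(), so the same file cannot be queued twice
    while it is waiting or being processed."""

    def __init__(self):
        self._q = queue.Queue()
        self._known = set()            # items queued or in progress
        self._lock = threading.Lock()  # guards _known

    def put(self, item):
        """Enqueue item unless it is already known; return True if added."""
        with self._lock:
            if item in self._known:
                return False
            self._known.add(item)
        self._q.put(item)
        return True

    def get(self, block=True, timeout=None):
        # The item leaves the underlying queue but stays in _known.
        return self._q.get(block, timeout)

    def done(self, item):
        """Call exactly once per get(), after processing has finished."""
        with self._lock:
            self._known.discard(item)
        self._q.task_done()            # one unfinished task fewer for join()

    def __contains__(self, item):
        with self._lock:
            return item in self._known

    def join(self):
        # Blocks until done() has been called for every item put() so far.
        self._q.join()


def scan_once(watch_dir, tq):
    """One pass of the Watch loop: offer every regular file to the queue."""
    for name in sorted(os.listdir(watch_dir)):
        path = os.path.join(watch_dir, name)
        if os.path.isfile(path):
            tq.put(path)               # ignored if already queued or in progress


def worker(tq, process_file, stop):
    """Process paths until the stop event is set."""
    while not stop.is_set():
        try:
            path = tq.get(timeout=0.1)  # wake up periodically to check stop
        except queue.Empty:
            continue
        try:
            process_file(path)         # assumed to delete the file when done
        finally:
            tq.done(path)              # forget the path and call task_done()
```

For example, the main thread could start a handful of threading.Thread(target=worker, args=(tq, process_file, stop)) threads and then call scan_once() in a loop with a short sleep. Each file is offered to the queue on every pass but only queued once, and the set forgets it only after a worker has finished (and, in this design, deleted) it. If process_file raises before deleting the file, the path is still forgotten, so the next scan re-queues it as a retry.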