Hi Bruce,

Excuse me if I'm a little blunt below. I'm ill and grumpy...

bruce wrote:
hi nigel...

using any kind of file locking process requires that i essentially have a
gatekeeper, allowing a single process at a time to enter and access the
files...

I don't believe this is a necessary condition. That would only be the case if you allowed yourself a single lock.

i can easily set up a file read/write lock process where a client app
gets/locks a file, and then copies/moves the required files from the initial
dir to a tmp dir. after the move/copy, the lock is released, and the client
can go ahead and do whatever with the files in the tmp dir.. this process
allows multiple clients to operate in a pseudo-parallel manner...

i'm trying to figure out if there's a much better/faster approach that might
be available.. which is where the academic/research issue was raised..

I'm really not sure why you want to move the files around. Here are two approaches, different from the one I initially gave you, that deal perfectly well with a directory where files are constantly being added.

In both approaches we are going to try to avoid OS-specific locking mechanisms (advisory locking, flock, etc.), so they should work everywhere as long as you also have write access to the filesystem you're on.


Approach 1 - Constant Number of Processes

This requires no central manager, but each file lock costs a few OS calls.

Start up N processes with the same working directory WORK_DIR.

Each process then follows this algorithm (there's a rough code sketch after the example below):

- sleep for some small random period.

- scan the WORK_DIR for a FILE that does not have a corresponding LOCK_FILE

- open LOCK_FILE in append mode and write our PID into it.

- close LOCK_FILE

- open LOCK_FILE

- read first line from LOCK_FILE and compare to our PID

- if the PID we just read from the LOCK_FILE matches ours, we may process the corresponding FILE; otherwise another process beat us to it.

- repeat

After processing a file completely you can remove it and the lockfile at the same time.

As long as filenames follow some pattern, you can simply say that the LOCK_FILE for FILE is called FILE.lock

e.g.

WORK_DIR  : /home/wiggly/var/work
FILE      : /home/wiggly/var/work/data_2354272.dat
LOCK_FILE : /home/wiggly/var/work/data_2354272.dat.lock
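
Here's a rough, untested Python sketch of that loop. Take process() as a
stand-in for whatever work you actually do per file:

import os
import random
import time

WORK_DIR = "/home/wiggly/var/work"

def process(path):
    pass  # stand-in for the real per-file work

def try_lock(path):
    # Claim 'path' by appending our PID to its lock file, then reading
    # the lock back: whichever PID is on the first line owns the file.
    lock = path + ".lock"
    pid = "%d\n" % os.getpid()
    f = open(lock, "a")        # open LOCK_FILE in append mode, write PID
    f.write(pid)
    f.close()
    f = open(lock)             # re-open and read the first line back
    winner = f.readline()
    f.close()
    return winner == pid

def worker():
    while True:
        time.sleep(random.random())            # small random back-off
        for name in os.listdir(WORK_DIR):
            if name.endswith(".lock"):
                continue
            path = os.path.join(WORK_DIR, name)
            if os.path.exists(path + ".lock"):
                continue                       # someone already claimed it
            if try_lock(path):
                process(path)
                os.remove(path)                # remove file and lock together
                os.remove(path + ".lock")

if __name__ == "__main__":
    worker()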


Approach 2 - Managed Processes

Here we have a single main process that spawns children. The children listen for filenames on a pipe that the parent has open to them.

The parent constantly scans the WORK_DIR for new files to process and, as it finds each one, sends the filename to a child process.

You can either be clever about the children and ensure they tell the parent when they're free or just pass them work in a round-robin fashion.
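
Again a rough, untested sketch, this time using the stdlib multiprocessing
module (Python 2.6+) and taking the dumb round-robin option; process() is a
stand-in as before:

import os
import time
from multiprocessing import Pipe, Process

WORK_DIR = "/home/wiggly/var/work"
NUM_CHILDREN = 4

def process(path):
    pass  # stand-in for the real per-file work

def child(conn):
    # Block on the pipe and handle whatever filenames the parent sends.
    while True:
        path = conn.recv()
        if path is None:           # sentinel from the parent: shut down
            break
        process(path)
        os.remove(path)

def parent():
    pipes = []
    for _ in range(NUM_CHILDREN):
        ours, theirs = Pipe()
        Process(target=child, args=(theirs,)).start()
        pipes.append(ours)

    handed_out = set()             # note: grows forever in this sketch
    n = 0
    while True:
        for name in sorted(os.listdir(WORK_DIR)):
            path = os.path.join(WORK_DIR, name)
            if path not in handed_out:
                handed_out.add(path)
                pipes[n % NUM_CHILDREN].send(path)  # round-robin hand-off
                n += 1
        time.sleep(0.5)            # poll the directory for new arrivals

if __name__ == "__main__":
    parent()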

I hope the two descriptions above make sense; let me know if they don't.

   n


the issue that i'm looking at is analogous to a FIFO, where i have lots of
files being shoved in a dir from different processes.. on the other end, i
want to allow multiple client processes to access unique groups of these
files as fast as possible.. access being fetch/gather/process/delete the
files. each file is only handled by a single client process.

thanks..



-----Original Message-----
From: python-list-bounces+bedouglas=earthlink....@python.org
[mailto:python-list-bounces+bedouglas=earthlink....@python.org] On Behalf
Of Nigel Rantor
Sent: Sunday, March 01, 2009 2:00 AM
To: koranthala
Cc: python-list@python.org
Subject: Re: file locking...


koranthala wrote:
On Mar 1, 2:28 pm, Nigel Rantor <wig...@wiggly.org> wrote:
bruce wrote:
Hi.
Got a bit of a question/issue that I'm trying to resolve. I'm asking
this of a few groups so bear with me.
I'm considering a situation where I have multiple processes running,
and each process is going to access a number of files in a dir. Each
process accesses a unique group of files, and then writes the group
of files to another dir. I can easily handle this by using a form of
locking, where I have the processes lock/read a file and only access
the group of files in the dir based on the open/free status of the
lockfile.
However, the issue with the approach is that it's somewhat
synchronous. I'm looking for something that might be more
asynchronous/parallel, in that I'd like to have multiple processes
each access a unique group of files from the given dir as fast as
possible.
I don't see how this is synchronous if you have a lock per file. Perhaps
you've missed something out of your description of your problem.

So.. Any thoughts/pointers/comments would be greatly appreciated. Any
pointers to academic research, etc.. would be useful.
I'm not sure you need academic papers here.

One trivial solution to this problem is to have a single process
determine the complete set of files that require processing, then fork
off children, each with a different set of files to process.

The parent then just waits for them to finish and does any
post-processing required.
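
A rough, untested sketch of that, POSIX-only since it uses os.fork;
process_files is a stand-in for the real work:

import os

def process_files(files):
    pass  # stand-in for the real per-file work

def run(all_files, n_children=4):
    # Split the complete set of files into one chunk per child.
    chunks = [all_files[i::n_children] for i in range(n_children)]
    for chunk in chunks:
        if os.fork() == 0:         # child: do our chunk and exit
            process_files(chunk)
            os._exit(0)
    for _ in chunks:               # parent: wait for every child
        os.wait()
    # ...then do any post-processing here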

A more concrete problem statement may of course change the solution...

   n
Using twisted might also be helpful.
Then you can avoid the problems associated with threading too.

No one mentioned threads.

I can't see how Twisted in this instance isn't like using a sledgehammer
to crack a nut.

   n