On 7/4/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

I have an application that creates and writes to an output file I need
to process. I need to process the file when it is completely written
to. I do not initially know how big the file will be in the end.
Further, the application does NOT put a write lock on the file while
it is writing it. Because of the buffering, the program writes to the
file in random chunks, not continuously. And what is worse, the file
format itself could vary, so there is nothing in the actual file that
signals the end of it. Everything is on a Linux server.

What's the most efficient way of checking this? One way is perhaps an
infinite loop checking mtime until it is stable for a certain amount
of time? I am not sure.

That's probably the right road to choose. Choose some time interval
that's long enough to be sure the file is done, but not so long that
it results in undue impatience in whoever is waiting for the end
results.
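
A minimal sketch of that polling loop, in Python for brevity (Perl's
built-in stat and sleep would do the same job). The function name and
both default intervals are assumptions chosen for illustration; pick
values that suit how the writing application actually behaves.

```python
import os
import time


def wait_until_stable(path, quiet_seconds=60.0, poll_interval=5.0):
    """Block until path's size and mtime have not changed for quiet_seconds.

    Returns the final (size, mtime) pair. Both intervals are illustrative.
    """
    last_seen = None                 # (size, mtime) from the previous poll
    stable_since = time.monotonic()  # when the current snapshot first appeared
    while True:
        try:
            st = os.stat(path)
            snapshot = (st.st_size, st.st_mtime)
        except FileNotFoundError:
            snapshot = None          # not created yet; keep waiting
        now = time.monotonic()
        if snapshot != last_seen:
            last_seen = snapshot
            stable_since = now       # something changed: restart the clock
        elif snapshot is not None and now - stable_since >= quiet_seconds:
            return snapshot
        time.sleep(poll_interval)
```

Checking the size as well as the mtime guards against filesystems with
coarse timestamps. A quiet period only suggests the writer is done; it
cannot prove it.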

An alternative might be if there is some way to spy on the process
doing the writing. This requires new interactions between the two
programs and the OS, making everything more fragile. But if you can
determine that the other process has finished execution, or in any
other way has closed its equivalent of a filehandle, you can probably
be certain that the writing is finished. Probably.
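
One way to do that kind of spying on Linux, sketched under some
assumptions: /proc is mounted, and you run as the same user as the
writer (or as root), since other users' descriptors are not readable.
The function name is hypothetical.

```python
import os


def pids_with_file_open(path):
    """Return the PIDs of visible processes that hold path open (Linux only)."""
    target = os.path.realpath(path)
    holders = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue                      # skip non-process entries
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue                      # process exited, or no permission
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    holders.append(int(pid))
                    break
            except OSError:
                continue                  # descriptor closed during the scan
    return holders
```

An empty list means nobody visible has the file open right now. It does
not rule out a writer that closes the file between chunks and reopens
it later, which is one reason for the "Probably" above.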

Still, I'd prefer polling, using the longest interval I could justify.

Is there absolutely no clue available, though? For example, you speak
of buffering; if a file's size isn't a multiple of the buffer size,
does that mean that it's finished? It may be that your application
will be happiest with nearly all data at the earliest possible moment,
even though one file out of 8192 will be delayed by an extra hour to
be sure that it's really finished. But it would be bad, even fatal,
for some applications to get data out-of-order. (Or does it write just
one file at a time, so that you know the first is done when the
second is starting?)
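
If the buffer-size clue does hold for your application, the check is
tiny. This sketch assumes a fixed 8192-byte buffer, which is a guess
and must be confirmed against the real writer; the function name is
hypothetical.

```python
import os


def looks_finished(path, buffer_size=8192):
    """Heuristic: a non-empty file whose size is not a whole number of
    buffers has probably received its final, partial flush."""
    size = os.path.getsize(path)
    return size > 0 and size % buffer_size != 0
```

A file whose true length happens to be an exact multiple of the buffer
size never passes this test, so it still needs the slow fallback of
waiting for the file to stay unchanged.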

One frill you could add would reduce impatience by increasing the
polling frequency. It would take a lot of trouble, and maybe cause a
lot of trouble, so it's probably not worth it. But you could set the
polling interval to be arbitrarily quick, adding some code that would
recognize when some file has been updated and announcing "This
supersedes file #42", or whatever. It all depends upon whether clients
will be more troubled by missing information that is slow to come in,
or incomplete information that is quick to be updated (although
possibly still incomplete).
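
A rough sketch of that frill: poll as often as you like, hand each
changed file to clients right away, and say which earlier version it
replaces. The function name, the state dictionary, and the message
wording are all made up for illustration.

```python
import os


def poll_once(paths, state):
    """Return announcements for files that changed since the last call.

    state maps each path to (snapshot, version) and is updated in place.
    """
    messages = []
    for path in paths:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            continue                          # not written yet
        snapshot = (st.st_size, st.st_mtime_ns)
        old_snapshot, version = state.get(path, (None, 0))
        if snapshot == old_snapshot:
            continue                          # unchanged since last poll
        version += 1
        state[path] = (snapshot, version)
        if version == 1:
            messages.append(f"{path}: version 1 available")
        else:
            messages.append(
                f"{path}: version {version} supersedes version {version - 1}"
            )
    return messages
```

Call it in a loop with a short sleep and the same state dictionary each
time. Every version it announces may still be incomplete, so clients
must be prepared to throw away what they already processed.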

One last alternative comes to mind: Determine the supplier of the
mystery application, and use any means necessary to have appropriate
file locking or equivalent behavior added to their source code. If
your clients are impatient, it's transitive: they're impatient with
this other software, really.

In the end, the road you take depends upon where your clients'
impatience drives you.

Good luck with it!

--Tom Phoenix
Stonehenge Perl Training
