On 7/4/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
I have an application that creates and writes to an output file I need to process. I need to process the file once it has been completely written. I do not initially know how big the file will be in the end. Further, the application does NOT put a write lock on the file while it is writing it. Because of the buffering, the program writes to the file in random chunks, not continuously. And what is worse, the file format itself could vary, so there is nothing in the actual file that signals the end of it. Everything is on a Linux server. What's the most efficient way of checking this? One way is perhaps an infinite loop checking the mtime until it is stable for a certain amount of time? I am not sure.
That's probably the right road to choose. Choose some time interval that's long enough to be sure the file is done, but not so long that it results in undue impatience in whoever is waiting for the end results.

An alternative might be if there is some way to spy on the process doing the writing. This requires new interactions between the two programs and the OS, making everything more fragile. But if you can determine that the other process has finished execution, or in any other way has closed its equivalent of a filehandle, you can probably be certain that the writing is finished. Probably. Still, I'd prefer polling, using the longest interval I could justify.

Is there absolutely no clue available, though? For example, you speak of buffering; if a file's size isn't a multiple of the buffer size, does that mean that it's finished? It may be that your application will be happiest with nearly all data at the earliest possible moment, even though one file out of 8192 will be delayed by an extra hour to be sure that it's really finished. But it would be bad, even fatal, for some applications to get data out-of-order. (Or does it write just one file at a time, so that you know the first is done when the second is starting?)

One frill you could add would reduce impatience by increasing the polling frequency. It would take a lot of trouble, and maybe cause a lot of trouble, so it's probably not worth it. But you could set the polling interval to be arbitrarily quick, adding some code that would recognize when some file has been updated and announcing "This supersedes file #42", or whatever. It all depends upon whether clients will be more troubled by missing information that is slow to come in, or incomplete information that is quick to be updated (although possibly still incomplete).
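To make the polling idea concrete, here is a minimal sketch in Python of waiting until a file's modification time and size have stopped changing for a chosen quiet period. The function name wait_until_stable and its parameters (quiet_seconds, poll_interval, timeout) are made up for illustration; nothing here comes from the application in question. Note that a quiet file is only a heuristic: a writer that stalls for longer than the quiet period will look finished when it isn't.

```python
import os
import time


def wait_until_stable(path, quiet_seconds=60.0, poll_interval=5.0, timeout=None):
    """Poll `path` until its mtime and size stay unchanged for `quiet_seconds`.

    Returns True once the file has been quiet for that long, or False if
    `timeout` seconds pass first. All names and defaults are illustrative.
    """
    start = time.monotonic()
    last_signature = None   # (mtime in ns, size) seen on the previous poll
    last_change = start     # when the signature last changed

    while True:
        now = time.monotonic()
        try:
            st = os.stat(path)
            signature = (st.st_mtime_ns, st.st_size)
        except FileNotFoundError:
            signature = None  # the writer has not created the file yet

        if signature != last_signature:
            # The file appeared, grew, or was touched: restart the quiet clock.
            last_signature = signature
            last_change = now
        elif signature is not None and now - last_change >= quiet_seconds:
            return True

        if timeout is not None and now - start >= timeout:
            return False

        time.sleep(poll_interval)


if __name__ == "__main__":
    # Hypothetical path; substitute the real output file.
    if wait_until_stable("/tmp/output.dat", quiet_seconds=120, poll_interval=10):
        print("file looks complete; safe to process")
```

The quiet period is the knob discussed above: make it as long as you can justify, since a short one trades certainty for speed. Comparing size as well as mtime guards against filesystems with coarse timestamp resolution.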
One last alternative comes to mind: Determine the supplier of the mystery application, and use any means necessary to have appropriate file locking or equivalent behavior added to their source code. If your clients are impatient, it's transitive: they're impatient with this other software, really.

In the end, the road you take depends upon where your clients' impatience drives you. Good luck with it!

--Tom Phoenix
Stonehenge Perl Training