I have a script that runs as a cron job every minute (on Ubuntu 10.10 and R
2.11.1), querying a database for new data. Most of the time it takes a few
seconds to run, but once in a while it takes more than a minute, and the next
run starts (on the same data) before the previous one has finished. In extreme
cases this fills up memory with a large number of runs of the same script on
the same data. My 'solution' has been to create a process id file for the
currently running script, first checking whether another process id file
exists and whether that process is still running. I use the following code:
if (file.exists("/var/run/myscript.pid")) {
    pid <- read.table("/var/run/myscript.pid")[[1]]
    # "ps -p <pid>" prints a header line plus one line per live process,
    # so two lines of output mean the previous run is still alive
    if (length(system(paste("ps -p", pid), intern = TRUE)) == 2) {
        stop("Myscript is already running in another process.")
    }
}
# pgrep's output comes back as a character vector, so convert
# before taking the max
pid <- max(as.integer(system("pgrep -x R", intern = TRUE)))
write(pid, "/var/run/myscript.pid")
....my script .....
file.remove("/var/run/myscript.pid")
#The End
The trouble here is that I also have other R scripts running on the same
system, so while taking the max of system("pgrep -x R", intern = TRUE) will
almost always give me the right pid, it is not guaranteed to. (Note that
intern = TRUE returns a character vector, so max() compares the pids as
strings unless they are first converted with as.integer().) There are two
situations where it could fail: when the process id numbers wrap around at
their maximum (around 32000) and start over again, and when another R process
starts up at the same moment, in which case the two process ids could get
swapped.
Is there a way to query for the process id of the specific R script, rather
than all R processes?
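In case it helps to see what I'm after: if there were a function returning
the running script's own pid — base R's Sys.getpid() looks like a candidate,
though I may be misreading its purpose — the check would simplify to roughly
the sketch below. (I use a tempdir() path here so the sketch runs without
root; the real cron job would keep using /var/run/myscript.pid.)

```r
# Sketch, assuming Sys.getpid() returns this R process's own pid.
# Hypothetical path stand-in for /var/run/myscript.pid:
pidfile <- file.path(tempdir(), "myscript.pid")

if (file.exists(pidfile)) {
    old_pid <- as.integer(readLines(pidfile)[1])
    # two lines from "ps -p" (header + process) mean the old run is alive
    if (length(system(paste("ps -p", old_pid), intern = TRUE)) == 2) {
        stop("Myscript is already running in another process.")
    }
}

writeLines(as.character(Sys.getpid()), pidfile)
# ... script body ...
file.remove(pidfile)
```

This would avoid racing against pgrep entirely, since the script records its
own pid rather than guessing which R process it is.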
Mikkel
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.