Hi Folks,

We've got a batch program in Java that does bib overlay/merging (we had to make a few mods to the Java interface, and we'll have patches for those shortly). Essentially, it just reads a MARC file and then makes REQUEST calls to OpenSRF to do the XML updates and bib merges.

The trouble is that it does a couple of things we don't much like:

1) It dies periodically (arrgh) because the OpenSRF/Open-ILS indexing/metabib update stuff can get ahead of the database and so the cstore calls can time out when the busy DB doesn't respond rapidly enough; and
2) It can really hammer the DB. Although the program could in principle run while the system is in use (since it touches only one record at a time), as it stands it tends to hog all the cycles on the DB host.

Now, we've investigated some options for fixing this:

Option 1): Add a configurable wait (the program just pauses for n milliseconds every m records).
It takes some tuning to find settings that don't overload the DB, so each run is a bit of trial and error. But it has the advantage of being dead simple.
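For concreteness, Option 1 amounts to something like the following minimal sketch. ThrottledBatch and run are hypothetical names, and the Consumer stands in for the per-record OpenSRF REQUEST call, which isn't shown.

```java
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of "pause n milliseconds every m records".
 * processRecord stands in for the real per-record OpenSRF request.
 */
public final class ThrottledBatch {
    private final int recordsPerPause; // m: records processed between pauses
    private final long pauseMillis;    // n: length of each pause

    public ThrottledBatch(int recordsPerPause, long pauseMillis) {
        if (recordsPerPause <= 0 || pauseMillis < 0) {
            throw new IllegalArgumentException("need recordsPerPause > 0 and pauseMillis >= 0");
        }
        this.recordsPerPause = recordsPerPause;
        this.pauseMillis = pauseMillis;
    }

    /** Processes the records in order and returns how many pauses were taken. */
    public <T> int run(List<T> records, Consumer<T> processRecord) {
        int pauses = 0;
        int count = 0;
        for (T record : records) {
            processRecord.accept(record);
            count++;
            // Pause after every m-th record, but not after the very last one.
            if (count % recordsPerPause == 0 && count < records.size()) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop the batch early.
                    Thread.currentThread().interrupt();
                    return pauses;
                }
                pauses++;
            }
        }
        return pauses;
    }
}
```

The tuning problem is visible here: m and n are fixed up front, so nothing in the loop reacts to how busy the DB actually is.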

Option 2): Retry when there's a cstore error.
This led to something we didn't expect: after a certain number of errors, the Evergreen login becomes invalid, so you can't even restart the program with the same login. Inconvenient! (This lock-out may be on a timer, but we've just resorted to restarting all processes to clear it.) The existence of such a feature makes some sense, but it wasn't a result we'd anticipated. So far, we haven't found where this lock-out could be adjusted, and we don't want to have to install a custom version of OpenSRF or Open-ILS to keep the errors from triggering it. And it doesn't help much with the hammering-the-DB issue. So this isn't looking like a terribly good option.
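The retry in Option 2 is roughly the following: a bounded number of attempts with exponential backoff between them. This is a minimal sketch with hypothetical names (BoundedRetry, call); the Callable stands in for one cstore-backed request, and it retries on any exception, which a real version would want to narrow. Capping and spacing out attempts is a general pattern we're assuming here; it does nothing about the login lock-out.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch: retry one call up to maxAttempts times,
 * doubling the wait between attempts. Rethrows the last failure.
 */
public final class BoundedRetry {
    private BoundedRetry() {}

    public static <T> T call(Callable<T> action, int maxAttempts, long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts: give up and rethrow below
                }
                Thread.sleep(delay); // back off before trying again
                delay *= 2;          // exponential backoff
            }
        }
        throw last; // non-null here: we only reach this after a failure
    }
}
```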

So, now to the question: should we use some other technique to ensure that we're not spawning cstore tasks to handle each record, but are instead using a single process synchronously? If that's the way to go, would it be a stateful connection? (That doesn't seem to be supported by the Java API at present, but we could certainly add it.) Or, perhaps more elegantly, is there a way to let a number of asynchronous processes do the DB updates, but cap them at some given number?
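On the client side, the "cap the number of asynchronous processes" idea could look something like this minimal sketch, assuming a plain java.util.concurrent Semaphore and fixed thread pool. BoundedDispatcher and runAll are hypothetical names, and the Consumer stands in for one OpenSRF request. This only limits how many requests our program has outstanding; it says nothing about how OpenSRF itself schedules cstore workers.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: dispatch records asynchronously, never with more
 * than maxInFlight requests outstanding at once.
 */
public final class BoundedDispatcher {
    private BoundedDispatcher() {}

    /** Processes every record; returns the peak number of concurrent requests observed. */
    public static <T> int runAll(List<T> records, Consumer<T> processRecord, int maxInFlight)
            throws InterruptedException {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be >= 1");
        }
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
        try {
            for (T record : records) {
                permits.acquire(); // blocks the reader while maxInFlight requests are outstanding
                pool.execute(() -> {
                    peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    try {
                        processRecord.accept(record);
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release(); // let the reader dispatch the next record
                    }
                });
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS); // arbitrary upper bound for the sketch
        }
        return peak.get();
    }
}
```

The semaphore provides back-pressure: the MARC-reading loop waits instead of queueing every record in memory, and with maxInFlight set to 1 it degenerates to the single synchronous process mentioned above.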

Thanks for any recommendations.

John



