Jacques,

I had to review some of my notes to remember what we were trying to do with the JobSandbox. Here are my replies to your questions:
1. Did you use the purge-job-days setting in serviceengine.xml and the related purgeOldJobs? If not, was there a reason?

We were not using the purgeOldJobs service, probably because we didn't understand how the service worked. We may have thought the service was specific to order jobs only, which would not have worked for us; our jobs are custom service jobs for the particular application we are developing.

One problem we had with most jobs that hit the JobSandbox (including the poller) was that they appeared to be doing full table scans instead of indexed scans. These caused problems for us once the JobSandbox grew larger, especially during heavy production days: we would often see transaction locks on the JobSandbox and I/O bottlenecks on the server in general due to the scans. The purgeOldJobs service may be a good solution for that if we could keep the JobSandbox at a reasonable number of records (there is a sketch of the serviceengine.xml setting further down).

I created issue OFBIZ-3855 on this a couple of years ago when we tried to use the JobSandbox as a batch process service for multiple application servers. We were filling the JobSandbox with 100k records over a short period of time, and the poller was getting transaction timeouts before it could change the status of the next available job to process. I created a patch to allow a user to customize the transaction timeout for the poller. I thought I had submitted that patch, but looking at the Jira issue it doesn't look like it was ever submitted. In the end we changed how we did our data warehouse processing; increasing the transaction timeout didn't really solve the problem either, it just made it possible to extend the timeout length, which can have other consequences in the system. If the community is still interested in the patch I can submit it to Jira for a recent version from trunk.

2. Configuring the service engine to run with multiple job pools.

As I look at my notes, I believe the problem with configuring the service engine with multiple job pools was that there wasn't an API to run a service (async or sync) against a specific job service pool. You could schedule a job to run against a particular pool. For example, in the serviceengine.xml file you can configure a job to run in a particular job pool like the following:

    <startup-service name="testScv" runtime-data-id="9900" runtime-delay="0" run-in-pool="pool"/>

You can also use the LocalDispatcher.schedule() method to schedule a job to run in a particular pool (there is a rough example of that call further down as well).

What we needed was a way to configure our app servers to service different service pools while still allowing every app server to request the services dynamically. That would let us limit the number of concurrent services run in our system. The default service engine lets all the app servers service the JobSandbox, which doesn't scale well for us during heavy production days.

This is one of the reasons we liked the idea of a JMS integration with the service engine. Then we could start up processes that listen to specific queues, and our application could write to the different queues. This would allow us to control the number of concurrent services processed at a time (a sketch of that kind of listener is below too).
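For question 1, to make the setting concrete: purge-job-days sits on the <thread-pool> element in serviceengine.xml. A minimal sketch with illustrative values; the remaining attributes are omitted, and the send-to-pool attribute and run-from-pool child shown here are from memory of the stock file, so double-check the names against your release:

    <thread-pool purge-job-days="4" min-threads="2" max-threads="5"
                 send-to-pool="pool"> <!-- other attributes omitted -->
        <run-from-pool name="pool"/>
    </thread-pool>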
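For question 2, this is roughly what the LocalDispatcher.schedule() call with a pool name looks like. The exact overload and parameter order differ between releases, and the service parameters here are placeholders, so treat it as a sketch rather than the definitive signature:

    import java.util.HashMap;
    import java.util.Map;

    import org.ofbiz.service.GenericServiceException;
    import org.ofbiz.service.LocalDispatcher;

    public class PoolScheduleExample {
        // Schedule the testScv service from the example above to run once, as soon
        // as possible, in the "pool" job pool.
        public static void scheduleInPool(LocalDispatcher dispatcher) throws GenericServiceException {
            Map<String, Object> context = new HashMap<String, Object>();
            context.put("exampleParam", "exampleValue"); // placeholder service parameters
            long startTime = System.currentTimeMillis();
            // The arguments after startTime (frequency, interval, count, endTime, maxRetry)
            // control recurrence and retries; the values below are placeholders, so check
            // the LocalDispatcher javadoc for the overload in your version.
            dispatcher.schedule("pool", "testScv", context, startTime, -1, 0, 1, 0, -1);
        }
    }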
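And to make the JMS idea a bit more concrete, this is the shape of the consumer we pictured each app server running. It is plain javax.jms, not anything that exists in the service engine today; the queue name and the dispatch step are invented for illustration:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageListener;
    import javax.jms.Queue;
    import javax.jms.Session;

    public class JobQueueListener {
        // Each app server subscribes only to the queues it has been assigned, which
        // caps how many of these jobs it will process concurrently.
        public static void listen(ConnectionFactory factory, String queueName) throws JMSException {
            Connection connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue(queueName);
            MessageConsumer consumer = session.createConsumer(queue);
            consumer.setMessageListener(new MessageListener() {
                public void onMessage(Message message) {
                    // Read the service name and parameters from the message and hand
                    // them to the local service engine (details omitted in this sketch).
                }
            });
            connection.start();
        }
    }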
Let me know if you need any more information.

Thanks,

Brett

On Mon, Aug 6, 2012 at 1:10 AM, Jacques Le Roux <jacques.le.r...@les7arts.com> wrote:

> Hi Brett,
>
> From: "Brett Palmer" <brettgpal...@gmail.com>
>
>> Adrian,
>>
>> Thanks for the update. Here are some feedback points on your listed items:
>>
>> 1. JobPoller gets out-of-memory errors. We've seen this a lot in production
>> servers when the JobSandbox table is not constantly pruned of old records.
>> It would be nice if the poller restricted its search to only active
>> records it could process.
>
> Did you use the purge-job-days setting in serviceengine.xml and the
> related purgeOldJobs? If not, was there a reason?
>
>> 2. Queue for capturing missing records would be good. From item 1 above we
>> have had locks on the table when the poller is busy doing a scan and new
>> jobs cannot be added or time out.
>
> +1
>
>> Other wish items:
>>
>> - Ability to assign different service engines to process specific job
>> types. We often run multiple application servers but want to limit how many
>> concurrent jobs are run. For example, if I had 4 app servers connected to
>> the same DB I may only want one app server to service particular jobs. I
>> thought this feature was possible, but when I tried to implement it by
>> changing some of the configuration files it never worked correctly.
>
> Last time I used this it was with R4.0 and it worked. Which problems did
> you run into exactly (if you remember)?
>
> Thanks
>
>> - JMS support for the service engine. It would be nice if there was a JMS
>> interface for those that want to use JMS as their queuing mechanism for
>> jobs.
>
> +1
>
> Jacques
>
>> Brett
>>
>> On Sun, Aug 5, 2012 at 6:21 AM, Adrian Crum
>> <adrian.crum@sandglass-software.com> wrote:
>>
>>> On 8/5/2012 11:02 AM, Adrian Crum wrote:
>>>
>>>> I just committed a bunch of changes to the Job Manager group of classes.
>>>> The changes help simplify the code and hopefully make the Job Manager
>>>> more robust. On the other hand, I might have broken something. ;) I will
>>>> monitor the mailing list for problems.
>>>>
>>>> I believe the JobPoller settings in serviceengine.xml (the <thread-pool>
>>>> element) should be changed. I think min-threads should be set to "2" and
>>>> max-threads should be set to "5". Creating lots of threads can hurt
>>>> throughput because the JVM spends more time managing them. I would be
>>>> interested in hearing what others think.
>>>
>>> Thinking about this more, there are some other things that need to be
>>> fixed:
>>>
>>> 1. The JobPoller uses an unbounded queue. In a busy server, there is the
>>> potential the queue will grow in size until it causes an out-of-memory
>>> condition.
>>>
>>> 2. There is no accommodation for when a job cannot be added to the queue -
>>> it is just lost. We could add a dequeue method to the Job interface that
>>> will allow implementations to recover or reschedule the job when it can't
>>> be added to the queue.
>>>
>>> 3. There is a JobPoller instance per delegator, and each instance contains
>>> the number of threads configured in serviceengine.xml. With the current
>>> max-threads setting of 15, a multi-tenant installation with 100 tenants
>>> will create up to 1500 threads. (!!!) A smarter strategy might be to have
>>> a single JobPoller instance that services multiple JobManagers.
>>>
>>> -Adrian
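P.S. On Adrian's points 1 and 2 above (the unbounded queue, and jobs being silently lost when they cannot be queued), this is roughly the direction we would hope for: a bounded queue whose rejection handler hands the job back so it can be rescheduled. None of this is existing OFBiz code; the class and interface names are made up for illustration, and the pool sizes just echo the 2/5 numbers Adrian suggested:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.RejectedExecutionHandler;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedJobPoolSketch {

        // Stand-in for a queued job; the real Job interface would gain the kind of
        // dequeue/reschedule hook Adrian describes in point 2.
        interface QueueableJob extends Runnable {
            void rejected();
        }

        public static ThreadPoolExecutor createExecutor() {
            RejectedExecutionHandler requeueHandler = new RejectedExecutionHandler() {
                public void rejectedExecution(Runnable runnable, ThreadPoolExecutor executor) {
                    if (runnable instanceof QueueableJob) {
                        // Instead of silently dropping the job, let it reschedule itself.
                        ((QueueableJob) runnable).rejected();
                    }
                }
            };
            // A bounded queue (100 slots here, purely illustrative) keeps a busy server
            // from growing the queue until it hits an out-of-memory condition.
            return new ThreadPoolExecutor(2, 5, 60L, TimeUnit.SECONDS,
                    new ArrayBlockingQueue<Runnable>(100), requeueHandler);
        }
    }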