Thanks Brett! Your feedback and Jira issue will help a lot.
-Adrian
On 8/6/2012 10:52 PM, Brett Palmer wrote:
Jacques,
*
I had to review some of my notes to remember what we were trying to do with
the JobSandbox. Here are my replies to your questions:
1. Did you use the purge-job-days setting in serviceengine.xml and the
related purgeOldJobs? If not, was there a reason?
We were not using the purgeOldJobs service. This was probably because we
didn’t understand how the service worked. We may have thought the service
was specific to order-only jobs, which would not have worked for us. Our
jobs are custom service jobs for the particular application we are
developing.
One problem that we had with most jobs that hit the JobSandbox (including
the poller) was that it appeared they were doing full table scans instead
of an indexed scan. These would cause problems for us when the JobSandbox
grew larger and especially during heavy production days. We would often
see transaction locks on the JobSandbox and I/O bottlenecks on the server
in general due to the scans. The purgeOldJobs service may be a good
solution for that if we could keep the JobSandbox to a reasonable number of
records.
I created issue OFBIZ-3855 on this a couple of years ago when we tried to
use the JobSandbox as a batch process service for multiple application
servers. We were filling up the JobSandbox with 100k records over a
short period of time. The poller was getting transaction timeouts before
it could change the status of the next available job to process. I created
a patch to allow a user to customize the transaction timeout for the
poller. I thought I had submitted this patch, but looking at the Jira issue
it doesn’t look like it was ever submitted.
In the end we changed how we did our data warehouse processing. Increasing
the transaction timeout didn’t really solve the problem either; it just made
it possible to extend the timeout length, which can have other consequences
in the system.
If the community is still interested in the patch I can submit it to Jira
for a recent version from the trunk.
2. Configuring service engine to run with multiple job pools.
As I’m looking at my notes I believe the problem with configuring the
service engine with multiple job pools was that there wasn’t an API to run
a service (async or synchronous) in a specific job pool. You could,
however, schedule a job to run against a particular pool.
For example, in the serviceengine.xml file you can configure a job to run in
a particular job pool like the following:
<startup-service name="testScv" runtime-data-id="9900"
runtime-delay="0"
run-in-pool="pool"/>
You can also use the LocalDispatcher.schedule() method to schedule a job to
run in a particular pool.
What we needed was a way to configure our app servers to service different
service pools but allow all app servers to request the service dynamically.
This would allow us to limit the number of concurrent services that were
run in our system. The default system engine lets all the app servers
service the JobSandbox, which doesn’t scale well for us during heavy
production days.
This is one of the reasons we liked the idea of a JMS integration with the
service engine. Then we could start up processes to listen to specific
queues, and our application could write to the different queues. This would
allow us to control the number of concurrent services processed at a time.
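To illustrate the idea of capping concurrency per pool, here is a minimal
Java sketch. It is not OFBiz code and it does not use JMS: the class
NamedPoolRouterSketch and its methods definePool() and runInPool() are
hypothetical names, and each named pool is simply a fixed-size thread pool
standing in for a queue with a limited number of listeners.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not part of OFBiz: routes work to named pools,
// each with its own cap on concurrently running services.
class NamedPoolRouterSketch {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();

    // Registers a pool that runs at most maxConcurrent services at a time.
    void definePool(String name, int maxConcurrent) {
        pools.put(name, Executors.newFixedThreadPool(maxConcurrent));
    }

    // Any caller may request a service in any defined pool.
    void runInPool(String name, Runnable service) {
        ExecutorService pool = pools.get(name);
        if (pool == null) {
            throw new IllegalArgumentException("Unknown pool: " + name);
        }
        pool.execute(service);
    }

    // Stops accepting work and waits for already submitted services to finish.
    boolean shutdownAndWait(long seconds) {
        pools.values().forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService pool : pools.values()) {
                if (!pool.awaitTermination(seconds, TimeUnit.SECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Demo: submits `jobs` short tasks to one pool and returns
    // {number completed, peak concurrency observed}.
    static int[] demo(int maxConcurrent, int jobs) {
        NamedPoolRouterSketch router = new NamedPoolRouterSketch();
        router.definePool("warehouse", maxConcurrent); // pool name is made up
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < jobs; i++) {
            router.runInPool("warehouse", () -> {
                // Record how many services are running right now.
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(5); // simulated work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
                completed.incrementAndGet();
            });
        }
        router.shutdownAndWait(10);
        return new int[] {completed.get(), peak.get()};
    }

    public static void main(String[] args) {
        int[] result = demo(2, 10);
        System.out.println("completed=" + result[0] + " peak=" + result[1]);
    }
}
```

In this sketch all ten requests are accepted, but the observed peak never
exceeds the pool's limit of two. A JMS-based setup would get the same
effect by limiting the number of listeners on each queue.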
Let me know if you need any more information.
Thanks,
Brett*
On Mon, Aug 6, 2012 at 1:10 AM, Jacques Le Roux <jacques.le.r...@les7arts.com> wrote:
Hi Brett,
From: "Brett Palmer" <brettgpal...@gmail.com>
Adrian,
Thanks for the update. Here are some feedback points on your listed
items:
1. JobPoller gets out-of-memory errors. We've seen this a lot in production
servers when the JobSandbox table is not constantly pruned of old records.
It would be nice if the poller restricted its search to only the active
records it could process.
Did you use the purge-job-days setting in serviceengine.xml and the
related purgeOldJobs? If not, was there a reason?
2. Queue for capturing missing records would be good. From item 1 above, we
have had locks on the table when the poller is busy doing a scan, and new
jobs cannot be added or time out.
+1
Other wish items:
- Ability to assign different service engines to process specific job
types. We often run multiple application servers but want to limit how many
concurrent jobs are run. For example, if I had 4 app servers connected to
the same DB I may only want one app server to service particular jobs. I
thought this feature was possible but when I tried to implement it by
changing some of the configuration files it never worked correctly.
Last time I used this it was with R4.0 and it worked. Which problems did
you come across exactly (if you remember)?
Thanks
- JMS support for the service engine. It would be nice if there was a JMS
interface for those that want to use JMS as their queuing mechanism for
jobs.
+1
Jacques
Brett
On Sun, Aug 5, 2012 at 6:21 AM, Adrian Crum <adrian.c...@sandglass-software.com> wrote:
On 8/5/2012 11:02 AM, Adrian Crum wrote:
I just committed a bunch of changes to the Job Manager group of classes.
The changes help simplify the code and hopefully make the Job Manager more
robust. On the other hand, I might have broken something. ;) I will monitor
the mailing list for problems.
I believe the JobPoller settings in serviceengine.xml (the <thread-pool>
element) should be changed. I think min-threads should be set to "2" and
max-threads should be set to "5". Creating lots of threads can hurt
throughput because the JVM spends more time managing them. I would be
interested in hearing what others think.
Thinking about this more, there are some other things that need to be
fixed:
1. The JobPoller uses an unbounded queue. In a busy server, there is the
potential the queue will grow in size until it causes an out-of-memory
condition.
2. There is no accommodation for when a job cannot be added to the queue -
it is just lost. We could add a dequeue method to the Job interface that
will allow implementations to recover or reschedule the job when it can't
be added to the queue.
3. There is a JobPoller instance per delegator, and each instance contains
the number of threads configured in serviceengine.xml. With the current
max-threads setting of 15, a multi-tenant installation with 100 tenants
will create up to 1500 threads. (!!!) A smarter strategy might be to have a
single JobPoller instance that services multiple JobManagers.
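Items 1 and 2 can be sketched with the standard java.util.concurrent
classes. This is only an illustration under stated assumptions, not the
committed Job Manager code: SketchJob, its deQueue() hook, and the queue
size of 10 are all made up for the example, while the thread counts follow
the 2 and 5 suggested above. The queue is bounded, and a job that cannot be
queued is handed back through deQueue() instead of being silently lost.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the actual OFBiz JobPoller.
class BoundedJobQueueSketch {

    // Stand-in for a job; deQueue() lets the job recover or reschedule itself.
    interface SketchJob extends Runnable {
        void deQueue();
    }

    // Builds an executor with a bounded queue and a rejection hook.
    static ThreadPoolExecutor newPoller(int minThreads, int maxThreads, int queueSize) {
        RejectedExecutionHandler onReject = (task, executor) -> {
            // With execute(), the rejected task is the original Runnable.
            if (task instanceof SketchJob) {
                ((SketchJob) task).deQueue();
            }
        };
        return new ThreadPoolExecutor(minThreads, maxThreads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueSize), onReject);
    }

    // Demo: offers 20 blocking jobs to a poller with 2..5 threads and a
    // queue of 10, then returns {jobs completed, jobs handed to deQueue()}.
    static int[] demo() {
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger rescheduled = new AtomicInteger();
        ThreadPoolExecutor poller = newPoller(2, 5, 10);
        for (int i = 0; i < 20; i++) {
            poller.execute(new SketchJob() {
                @Override
                public void run() {
                    try {
                        release.await(); // hold the thread until all jobs are offered
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }

                @Override
                public void deQueue() {
                    rescheduled.incrementAndGet(); // a real job might reschedule itself
                }
            });
        }
        release.countDown();
        poller.shutdown();
        try {
            poller.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {completed.get(), rescheduled.get()};
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("completed=" + result[0] + " rescheduled=" + result[1]);
    }
}
```

Because every job blocks until all 20 have been offered, the outcome is
fixed: 2 jobs start on the core threads, 10 wait in the queue, 3 more start
on extra threads up to the maximum of 5, and the remaining 5 go to
deQueue(). Memory use stays bounded by the queue size rather than growing
with the backlog.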
-Adrian