Hi,

One of the things that the recent discussions of rule-based systems
applications didn't address is efficiency. Jess is built around the
Rete algorithm, which explicitly trades space for time - i.e., it uses
lots of memory, but under the right circumstances, is much faster than
a naive approach of nested if-then statements. The performance gains
can be astronomical - there are common cases where Jess's version of
Rete will take quadratic or better time, while the naive approach
takes O(N^6) time.
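To make the space-for-time trade concrete, here is a much-simplified Java sketch of the idea Rete builds on. It is not Jess's implementation, and the `Employee`/`ClockEvent` facts and the `NaiveMatcher`/`IncrementalJoin` classes are hypothetical. The naive matcher re-checks every employee/event pair each time it is called. The incremental one keeps every fact in an indexed memory (the "space"), so a newly asserted fact is compared only against stored facts that share its join key (the "time" saved).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical facts for a two-pattern rule: "employee X has clocking event Y".
record Employee(int id, String name) {}
record ClockEvent(int employeeId, int minute) {}
record Match(Employee employee, ClockEvent event) {}

class NaiveMatcher {
    // Re-examines every employee/event pair on every call: O(E * C) per cycle.
    static List<Match> matchAll(List<Employee> employees, List<ClockEvent> events) {
        List<Match> out = new ArrayList<>();
        for (Employee e : employees) {
            for (ClockEvent c : events) {
                if (e.id() == c.employeeId()) {
                    out.add(new Match(e, c));
                }
            }
        }
        return out;
    }
}

class IncrementalJoin {
    // Memories indexed by the join key; this stored state is the memory Rete spends.
    private final Map<Integer, Employee> employees = new HashMap<>();
    private final Map<Integer, List<ClockEvent>> eventsByEmployee = new HashMap<>();
    private final List<Match> matches = new ArrayList<>();

    // A new employee is joined only against events already stored under its id.
    void assertEmployee(Employee e) {
        employees.put(e.id(), e);
        for (ClockEvent c : eventsByEmployee.getOrDefault(e.id(), List.of())) {
            matches.add(new Match(e, c));
        }
    }

    // A new event is joined only against the (at most one) stored employee with its id.
    void assertEvent(ClockEvent c) {
        eventsByEmployee.computeIfAbsent(c.employeeId(), k -> new ArrayList<>()).add(c);
        Employee e = employees.get(c.employeeId());
        if (e != null) {
            matches.add(new Match(e, c));
        }
    }

    List<Match> matches() {
        return List.copyOf(matches);
    }
}
```

Both produce the same set of matches; the incremental version just does work proportional to what changed instead of re-scanning everything, which is why it pays off only when the fact base changes slowly.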

Now, what are the right circumstances? Well, one criterion is that the
fact-list changes slowly - i.e., less than ten percent of the facts
change with each rule firing. I'd modify this slightly to say that
in a problem which decomposes directly into N subproblems (as writing
payroll checks for N employees clearly does), less than 10% of the
facts relevant to any given employee must change.

It might be that your proposed system doesn't meet this criterion,
because more or less ALL of the facts specific to each employee (the
"clocking events") will be replaced each time the system is run on a
given employee. On the other hand, if there is a lot of data regarding
employee-specific (or general-case) deductions, rates, account
numbers, etc., then the criterion might apply anyway.
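The rule of thumb can be written down as a simple per-employee check. This is a minimal sketch assuming you can count, for each employee, the facts that stay put (rates, deductions, account data) and the clocking events that get replaced on each run; the `ChurnCheck` class and the example counts are hypothetical, not part of Jess.

```java
class ChurnCheck {
    // Fraction of one employee's relevant facts that are replaced on each run.
    static double changeFraction(int staticFacts, int replacedEvents) {
        int total = staticFacts + replacedEvents;
        return total == 0 ? 0.0 : (double) replacedEvents / total;
    }

    // True when the partition changes slowly enough to favor Rete (e.g. threshold 0.10).
    static boolean favorsRete(int staticFacts, int replacedEvents, double threshold) {
        return changeFraction(staticFacts, replacedEvents) < threshold;
    }
}
```

With six replaced clocking events, an employee with 30 static facts fails the 10% test (6 / 36, about 17%), while one with 100 static facts passes it (6 / 106, about 5.7%). If the events are essentially the only per-employee facts, the fraction is 100%.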

Memory usage in a Rete-based system is a function not only of the
number of facts (I'd figure a kilobyte or so per fact, maximum, so
20,000 employees with up to seven facts each, or 140,000 facts, is
about 140 MB worth of data) but also of the
complexity of the rules; join-network nodes can require quadratic
storage in the worst case. Now, since your fact-base will be so
strongly partitioned by employee, this is quadratic in the number of
events per employee (again, worst case) not in the number of
employees, so it's not too bad.
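The arithmetic behind these estimates, written out as a small Java sketch. The 1 KB-per-fact figure is the rough maximum quoted above, "seven facts" assumes one employee fact plus six clocking events, and the token counts model only the worst-case quadratic storage of a single join; actual Jess memory use depends on the rules. The `MemoryEstimate` class is hypothetical.

```java
class MemoryEstimate {
    static final long BYTES_PER_FACT = 1_000; // rough upper bound from the text (~1 KB per fact)

    // Raw fact storage: employees * (one employee fact + events per employee) * bytes per fact.
    static long factBytes(long employees, long eventsPerEmployee) {
        return employees * (1 + eventsPerEmployee) * BYTES_PER_FACT;
    }

    // Worst-case join storage when facts only join within one employee's partition:
    // quadratic in events per employee, linear in the number of employees.
    static long partitionedJoinTokens(long employees, long eventsPerEmployee) {
        return employees * eventsPerEmployee * eventsPerEmployee;
    }

    // Worst case if every event could join with every other event.
    static long unpartitionedJoinTokens(long employees, long eventsPerEmployee) {
        long allEvents = employees * eventsPerEmployee;
        return allEvents * allEvents;
    }
}
```

For 20,000 employees with six events each this gives 140,000,000 bytes of facts, 720,000 worst-case join tokens when partitioned by employee, and 14,400,000,000 if nothing limited the join.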

Come to think of it, this is really a wonderful parallel application;
you can break the larger problem into as many separate parallel
processes as necessary to ensure scaling. One good way to structure
the system, then, would be to use a regular RDBMS to hold the data and
do the day-to-day user interaction (to get transactions, UI,
persistence, failover) but then use Jess to do the actual weekly
processing. 
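Because each employee's facts are independent, the weekly run can be split across workers, each with its own engine. The sketch below is hypothetical: `PayrollEngine` stands in for a per-partition rule engine (in practice presumably a separate `jess.Rete` instance with the rules loaded), and `WeeklyRun` simply fans employees out over a fixed thread pool. Separate OS processes or machines would follow the same partitioning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical stand-in for a rule engine that computes pay for one employee.
interface PayrollEngine {
    long computePayCents(int employeeId, List<Integer> minutesWorkedPerDay);
}

class WeeklyRun {
    // Splits employees round-robin into `partitions` groups; each group gets its own engine.
    static Map<Integer, Long> run(Map<Integer, List<Integer>> timecards,
                                  int partitions,
                                  Supplier<PayrollEngine> engineFactory) throws Exception {
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < partitions; i++) groups.add(new ArrayList<>());
        int n = 0;
        for (Integer id : timecards.keySet()) groups.get(n++ % partitions).add(id);

        Map<Integer, Long> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<Integer> group : groups) {
                futures.add(pool.submit(() -> {
                    PayrollEngine engine = engineFactory.get(); // one engine per worker, never shared
                    for (Integer id : group) {
                        results.put(id, engine.computePayCents(id, timecards.get(id)));
                    }
                }));
            }
            for (Future<?> f : futures) f.get(); // wait for all workers and surface any failure
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

Giving each worker its own engine means the workers never contend for a single engine's locks; the cost is one copy of the rule network per worker.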

Regarding using two selects instead of a join: this isn't necessarily
a bad thing. You'd be asking the DB for two simple result sets, each
filtered in linear time, and letting Jess do the join. Remember that
efficient joins are what Rete is all about.
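Here is roughly what "two selects, then join in memory" amounts to, as a minimal sketch. The two lists stand in for the result sets of two hypothetical queries against `employee` and `clock_event` tables (the schema and the `ClientSideJoin` class are assumptions). The join shown is a plain hash join, used only to illustrate that the join work moves out of the database; Rete performs the equivalent join incrementally as facts are asserted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rows as they might come back from two separate, simple SELECTs (hypothetical schema).
record EmployeeRow(int id, int rateCents) {}
record EventRow(int employeeId, int minutes) {}
record Joined(int employeeId, int rateCents, int minutes) {}

class ClientSideJoin {
    // Index the employee rows by id, then probe once per event row: O(E + C) overall.
    static List<Joined> join(List<EmployeeRow> employees, List<EventRow> events) {
        Map<Integer, EmployeeRow> byId = new HashMap<>();
        for (EmployeeRow e : employees) byId.put(e.id(), e);

        List<Joined> out = new ArrayList<>();
        for (EventRow ev : events) {
            EmployeeRow e = byId.get(ev.employeeId());
            if (e != null) { // events for unknown employees are dropped, as in an inner join
                out.add(new Joined(e.id(), e.rateCents(), ev.minutes()));
            }
        }
        return out;
    }
}
```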

Regarding thread-safety: Jess is thread-safe. Note, however, that it
achieves this by using a small number of mutexes (two): one on rule
LHSs (pattern matching) and one on RHSs (actions). Therefore, only one
thread can be asserting a fact at a time.
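A toy model of that locking scheme, to show its consequence: with one lock guarding matching (assertions) and another guarding rule firing, concurrent assertions stay correct but are serialized. This is an illustrative sketch, not Jess's source; `TwoLockEngine` and its lock names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class TwoLockEngine {
    private final ReentrantLock lhsLock = new ReentrantLock(); // guards matching (assert)
    private final ReentrantLock rhsLock = new ReentrantLock(); // guards firing rule actions
    private final List<String> facts = new ArrayList<>();
    private int firings = 0;

    // Only one thread at a time can be inside assertFact.
    void assertFact(String fact) {
        lhsLock.lock();
        try {
            facts.add(fact); // a real engine would also propagate the fact through its join network here
        } finally {
            lhsLock.unlock();
        }
    }

    // In this model, firing is serialized by its own, separate lock.
    void fire() {
        rhsLock.lock();
        try {
            firings++;
        } finally {
            rhsLock.unlock();
        }
    }

    int factCount() {
        lhsLock.lock();
        try { return facts.size(); } finally { lhsLock.unlock(); }
    }

    int firingCount() {
        rhsLock.lock();
        try { return firings; } finally { rhsLock.unlock(); }
    }
}
```

Many threads feeding one engine therefore gain little throughput on assertions, which is another argument for the one-engine-per-partition structure suggested above.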

I think [EMAIL PROTECTED] wrote:
> 
> 
> I've seen the recommendations on when to use a rule-based system (RBS) (re.
> thread "JESS: Need of a Rule Based System!!!" started by Sachin and the thread
> "JESS: Data constraint management " started by Robert Quillen.)  I wonder,
> though, about when you have a REALLY big system.  I am now working on a project
> for a time and attendance system.  I can imagine it handling ten to twenty
> thousand employees.  (So that would be ten to twenty thousand facts.)  The rules
> to pay these individuals are very complex (on the order of hundreds of rules).
> Each employee would have two to six "clocking events" per day.  (That would
> potentially be six times twenty thousand more facts.)  There can be ten to a
> hundred different people adding/changing facts at any given time.
> 
> So I'd like to use an RBS, but can it handle it?  I can't imagine keeping all
> that data in memory.  So how efficient would the engine be at
> reading/representing this information in an SQL table?  I've seen Thomas
> Barkenow's RDBS extension.  It is basically a wrapper for creating individual
> SELECT statements for each rule to find its facts.  But I can't imagine that is
> very efficient.  It gets the job done, but it would be murder for a large
> system.   I can see a case where it would use two SELECT statements instead of
> using a JOIN.
> 
> How efficient are these RBS when there is a large number of facts being
> added/changed?  What would happen if power was suddenly lost?  Would anything
> get lost?  (That would be a bad thing, especially since we are dealing with
> people's pay!)  How would it recover?
> 
> In the case of JESS, how thread-safe is it?
> 
> -Chuck.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, send the words 'unsubscribe jess-users [EMAIL PROTECTED]'
> in the BODY of a message to [EMAIL PROTECTED], NOT to the
> list (use your own address!) List problems? Notify [EMAIL PROTECTED]
> ---------------------------------------------------------------------
> 



---------------------------------------------------------
Ernest Friedman-Hill  
Distributed Systems Research        Phone: (925) 294-2154
Sandia National Labs                FAX:   (925) 294-2234
Org. 8920, MS 9012                  [EMAIL PROTECTED]
PO Box 969                  http://herzberg.ca.sandia.gov
Livermore, CA 94550
---------------------------------------------------------------------