Re: [rules-users] fact granularity, performance, and other questions

2009-05-27 Thread Wolfgang Laun
I also think that the approach proposed by Greg, with a Message fact
type and various Segment fact types, would let you write efficient and
maintainable rules. (The extreme splitting into Field facts has several
drawbacks, as has been pointed out, and also: 'value' is always a string,
so you lose type checking.) One thing I could not see is what effect the
transaction layer of the structure has. And another one: are there several
(sub)types of Segment? If so, you can write rules against
(abstract) base types, e.g.

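abstract class Segment {
    protected long amount;   // assumed type; the rule below tests 'amount'
    public long getAmount() { return amount; }
}
class DebitSeg extends Segment { }
class CreditSeg extends Segment { }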

rule notPositive
when
    $s : Segment( amount <= 0 )
then
    weird( $s );
end

Also, Segment subtypes might implement interfaces, and it's
also possible to use those interfaces as fact types in patterns.
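A pattern on a base type matches facts of that type and of all its subtypes, and a pattern can likewise name an interface. As a rough plain-Java analogy (not Drools code), the sketch below mimics that matching with `instanceof`. The interface `Adjustable`, the helper class `TypeMatchDemo` and the `long` type of `amount` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface that some Segment subtypes implement.
interface Adjustable { }

abstract class Segment {
    private final long amount;   // assumed type of the 'amount' property

    Segment(long amount) { this.amount = amount; }

    public long getAmount() { return amount; }
}

class DebitSeg extends Segment implements Adjustable {
    DebitSeg(long amount) { super(amount); }
}

class CreditSeg extends Segment {
    CreditSeg(long amount) { super(amount); }
}

class TypeMatchDemo {
    // Mimics "Segment( amount <= 0 )": every fact whose class is Segment or
    // a subclass is considered, regardless of its concrete type.
    static List<Segment> notPositive(List<Object> facts) {
        List<Segment> hits = new ArrayList<>();
        for (Object f : facts) {
            if (f instanceof Segment && ((Segment) f).getAmount() <= 0) {
                hits.add((Segment) f);
            }
        }
        return hits;
    }

    // Mimics a pattern on the interface type, e.g. "Adjustable()".
    static int countAdjustable(List<Object> facts) {
        int n = 0;
        for (Object f : facts) {
            if (f instanceof Adjustable) {
                n++;
            }
        }
        return n;
    }
}
```

In a rule base, `Segment( amount <= 0 )` would thus catch both debit and credit segments, while a pattern on `Adjustable` would only see segments whose classes implement it.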

-W


On Thu, May 28, 2009 at 2:17 AM, David Zeigler  wrote:

> [snip]
___
rules-users mailing list
rules-users@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/rules-users


Re: [rules-users] fact granularity, performance, and other questions

2009-05-27 Thread Greg Barton

To address some points in order:

Yes, having each field in a message as a fact is the most straightforward
approach. The catch is that the conditions that rejoin the facts should be
as fast as possible. I'm thinking this structure would be best:

class Message {
  // ...message level properties...
}

class Segment {
  public Message message;
  public int position;
  // ...Segment level properties...
}

Then the rules would be like this:

when
  m: Message(...foo...);
  s0: Segment(position == 0, message == m, ...bar...)
  s1: Segment(position == 1, message == m, ...bas...)
then
...
end

That's about as fast as you'll get with this setup, I figure.  And you're 
right, it is important to list the conditions in a certain order: with the 
"position == n" condition before the "message == m" condition, MANY Segments in 
working memory are eliminated from consideration before getting to the more 
expensive cross object join. (Even though the cross object condition is about 
as cheap as you can get, a straight == check.)

I'm not sure about the answer to the fact insertion order question vis a vis 
performance in a stateless session.  My guess is no effect, but probably a 
question for the devs. (Or a bit o' experimentation.)

It would be nice if your item #2 could be accomplished with a class property
that's a multidimensional String array, but an array out of bounds exception
is thrown if you test an array position that's too high. Another
way (which I think is what you suggest below) would be to have a one-time
generated class with the following structure:

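class Fact {
  private final String[][][] facts;

  Fact(String[][][] facts) {
    this.facts = facts;
  }

  // Generated accessor for facts[0][0][0]; returns "" instead of throwing
  // when that position does not exist.
  public String geta0s0t0() {
    if (facts.length >= 1 && facts[0].length >= 1 && facts[0][0].length >= 1) {
      return facts[0][0][0];
    } else {
      return "";
    }
  }
}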

Initialize each one with a String[][][] and you're on your way.  This would 
absolutely be faster than the first method above, probably by several orders of 
magnitude.  

Another approach would be to see how hard it would be to alter MVEL so it
evaluates array[out_of_bounds_index] expressions to false instead of
throwing an exception.  Maybe for the long term...
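Instead of spelling out the bounds checks in every generated accessor, the generated methods could delegate to one bounds-checked helper. This is a minimal sketch under two assumptions: the names `SafeIndex.at`, `GeneratedFact` and `geta1s2t0` are hypothetical, and so is the index order `facts[transaction][segment][field]`, chosen only to fit the `a0s0t0` naming. It does not change how MVEL evaluates array expressions; it only keeps rules away from raw array indexing.

```java
// Bounds-checked lookup: returns "" instead of throwing
// ArrayIndexOutOfBoundsException when a position does not exist.
class SafeIndex {
    static String at(String[][][] facts, int t, int s, int a) {
        if (t < 0 || t >= facts.length) return "";
        if (s < 0 || s >= facts[t].length) return "";
        if (a < 0 || a >= facts[t][s].length) return "";
        return facts[t][s][a];
    }
}

// Hypothetical generated class; index order assumed to be
// facts[transaction][segment][field].
class GeneratedFact {
    private final String[][][] facts;

    GeneratedFact(String[][][] facts) {
        this.facts = facts;
    }

    // Field a0 of segment 0 in transaction 0 (the accessor from the example).
    public String geta0s0t0() { return SafeIndex.at(facts, 0, 0, 0); }

    // Field a1 of segment 2 in transaction 0.
    public String geta1s2t0() { return SafeIndex.at(facts, 0, 2, 1); }
}
```

Each generated accessor then shrinks to a one-line delegation, and a missing segment simply reads as "".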

--- On Wed, 5/27/09, David Zeigler  wrote:

> From: David Zeigler 
> Subject: [rules-users] fact granularity, performance, and other questions
> To: "Rules Users List" 
> Date: Wednesday, May 27, 2009, 7:17 PM
> [snip]

Re: [rules-users] fact granularity, performance, and other questions

2009-05-27 Thread Ingomar Otter

David,
>- Does the order of the conditions in a rule affect performance, the
>execution order, or the structure of the Rete network?
Yes, quite a lot has been written about this. The Drools documentation
has a few paragraphs with good heuristics.
A popular read seems to be this paper, which provides good insight
into Rete performance aspects:


"Production Matching for Large Learning Systems (Rete/UL)"
by Robert B. Doorenbos
PhD thesis, Carnegie Mellon University, January 31, 1995



>- Does the order the facts are inserted into a stateless session (as
> a list via the CommandFactory.newInsertElements) affect performance at
>all?
All I can say is that it does have an impact for stateful
sessions. What matters there is how often the agenda may change while
objects are being inserted.

I have no experience with stateless sessions and I hope others do;-)

>- Should each field in a message be a fact? (more info on my message
>below)  What fact granularity have you settled on in your usage and
>why?
I would suggest starting with the model that is most natural for your
rules (the one with the lowest WTF factor) and taking it from there.


I assume that you need to correlate information from all segments of a  
message?
I would stay well clear of your option 3 (eval()), just because I
always do ;-)
eval() simply defeats Rete optimizations, so it can be very sensitive to
both the number of facts and the number of rules using eval.


If you know the field IDs (which I assume you would, with an EDI
format), I would see whether you can avoid the field facts and turn
them into slots (attributes) instead.
But not to the extent you outline in #2; that seems too awkward.
If you have to, buy another server ;-)
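A minimal sketch of what turning known field IDs into slots could look like for the agent segments in David's example. The class `AgentSegment`, its attribute names and the mapping of A1/A2/A3 to agentNumber/firstName/lastName are assumptions for illustration, as is the helper `SlotDemo`. Because all three values sit on one object, the rule needs a single pattern and no joins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical segment class: known EDI field IDs become attributes
// ("slots") instead of separate Field facts.
class AgentSegment {
    private String agentNumber;     // field A1
    private final String firstName; // field A2
    private final String lastName;  // field A3

    AgentSegment(String agentNumber, String firstName, String lastName) {
        this.agentNumber = agentNumber;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getAgentNumber() { return agentNumber; }
    public void setAgentNumber(String agentNumber) { this.agentNumber = agentNumber; }
    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
}

class SlotDemo {
    // Plain-Java equivalent of a single-pattern rule such as
    //   $s : AgentSegment( firstName == "JAMES", lastName == "BOND", agentNumber != "007" )
    // whose consequence sets agentNumber to "007".
    static int apply(List<AgentSegment> segments) {
        int changed = 0;
        for (AgentSegment s : segments) {
            if ("JAMES".equals(s.getFirstName()) && "BOND".equals(s.getLastName())
                    && !"007".equals(s.getAgentNumber())) {
                s.setAgentNumber("007");
                changed++;
            }
        }
        return changed;
    }
}
```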



>what did you determine the best approach to be?
I am pretty sure that there is no generic "best approach".
I would start by trying to understand the nature of the rules (are they
regular, i.e. following the same scheme? can they be generated?) and of
the data, and I would create a regression performance test suite ASAP
to monitor performance during development.
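A regression performance check can start very small. This sketch times an arbitrary workload with `System.nanoTime` and fails when throughput falls below a floor. In a real suite the workload would execute the session against a representative message, and the floor would come from the requirement (3000 messages/second in David's case). `ThroughputCheck` and its methods are hypothetical names, and a single timed run like this is noisy, so treat it as a coarse tripwire rather than a benchmark.

```java
class ThroughputCheck {
    // Runs the workload 'warmup' times untimed (to let the JIT settle), then
    // 'iterations' times under the clock, and returns operations per second.
    static double opsPerSecond(Runnable workload, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            workload.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            workload.run();
        }
        long elapsed = Math.max(1L, System.nanoTime() - start); // avoid dividing by zero
        return iterations * 1_000_000_000.0 / elapsed;
    }

    // Fails the test run when throughput drops below the agreed floor.
    static void requireAtLeast(double observed, double floor) {
        if (observed < floor) {
            throw new AssertionError("throughput " + observed + " ops/s is below " + floor);
        }
    }
}
```

A test would then call something like `ThroughputCheck.requireAtLeast(ThroughputCheck.opsPerSecond(runOneMessage, 1000, 10000), 3000)`, where `runOneMessage` is a hypothetical `Runnable` that processes one message.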


In our application, for example, we analyzed bottlenecks and found
that (no surprise) accessing nested properties like facta.factb.value is
slower than a flat accessor like facta.factbValue (that's in Drools 4.0.7),
because the value is obtained through a different mechanism. From there
we added about three of these "shortcut accessors". However, we first
focused on optimizing the rules themselves to avoid excessive matching
in the first place, so there was no need for us to flatten our entire
model.
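For illustration, a "shortcut accessor" might look like this. `facta`, `factb` and `value` are the names used above; the classes `FactA` and `FactB` are hypothetical. A constraint could then read `FactA( factbValue == "x" )` instead of navigating `FactA( factb.value == "x" )`.

```java
class FactB {
    private final String value;

    FactB(String value) { this.value = value; }

    public String getValue() { return value; }
}

class FactA {
    private final FactB factb;

    FactA(FactB factb) { this.factb = factb; }

    public FactB getFactb() { return factb; }

    // Shortcut accessor: exposes factb.value directly on FactA, so rules
    // read one flat property instead of navigating into FactB.
    public String getFactbValue() {
        return factb == null ? null : factb.getValue();
    }
}
```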



my 0.02EUR
 -- Ingomar

On 28.05.2009, at 02:17, David Zeigler wrote:


> [snip]

[rules-users] fact granularity, performance, and other questions

2009-05-27 Thread David Zeigler
Hi,
I could use some experienced guidance.  I'm in the process of
evaluating Drools for use in a real-time transactional
environment to process about 3000 messages/second.  I realize a lot of
this depends on the type and quantity of rules, hardware, etc.  I'm
curious what steps others have taken to improve performance and if
there are any recommendations for my case detailed below.

A few specific questions I have are:
 - Should each field in a message be a fact? (more info on my message
below)  What fact granularity have you settled on in your usage and
why?
 - Does the order of the conditions in a rule affect performance, the
execution order, or the structure of the Rete network?
 - Does the order the facts are inserted into a stateless session (as
a list via the CommandFactory.newInsertElements) affect performance at
all?

The message is in EDI format and will typically have anywhere from 80
to 200 fields, potentially more.  The message is divided into
transactions, then segments, then fields.  We have an object model
that represents the message.  We're using a stateless session and most
of the rules will modify fields or add fields and segments based on
the values of other fields.  Currently, I flatten the object model
into a List containing the message, transactions, segments, and each
field, and then insert the list into the stateless session and fire.
I'm avoiding sequential mode for now until we have a better idea of
our requirements.

Here's a simplified example of what I'm doing now (using json-esque
syntax instead of the EDI format).
message {
  segment {
    SG:header
    A0:agents
  }
  segment {
    SG:agent
    A1:000
    A2:JAMES
    A3:BOND
  }
  segment {
    SG:agent
    A1:86
    A2:MAXWELL
    A3:SMART
  }
}

Each field has an id and a value.  For this message, I would insert 14
facts:  the message object, 3 segment objects, and 10 field objects.
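The flattening step described above can be sketched as follows, assuming simplified classes in which the message holds its segments directly (the transaction level is left out, as in the sample). The resulting list is what would be handed to `CommandFactory.newInsertElements`. Apart from `Field` with its id/value properties, the class shapes and the names `Flattener` and `add` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: a message holds segments, a segment holds fields.
class Field {
    private final String id;
    private String value;

    Field(String id, String value) {
        this.id = id;
        this.value = value;
    }

    public String getId() { return id; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }
}

class Segment {
    final List<Field> fields = new ArrayList<>();

    // Convenience builder for the example: appends a field, returns this.
    Segment add(String id, String value) {
        fields.add(new Field(id, value));
        return this;
    }
}

class Message {
    final List<Segment> segments = new ArrayList<>();
}

class Flattener {
    // Collects the message, every segment and every field into one list.
    static List<Object> flatten(Message msg) {
        List<Object> facts = new ArrayList<>();
        facts.add(msg);
        for (Segment seg : msg.segments) {
            facts.add(seg);
            facts.addAll(seg.fields);
        }
        return facts;
    }
}
```

For the sample message this yields exactly the 14 facts counted above: 1 message, 3 segments and 10 fields.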

rule "set James Bond's A1 to 007"
when
  $a1 : Field(id == "A1",  value != "007")
  Field(id == "A2",  value == "JAMES")
  Field(id == "A3",  value == "BOND")
then
  $a1.setValue("007");
  update($a1);
end
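Note that, as written, the three `Field` patterns above are not tied to one segment, so the rule joins every A1 with every JAMES/BOND pair in the message. With the sample data, Maxwell Smart's A1 ("86") also satisfies `value != "007"` and would be set to "007" as well. Keeping the patterns inside one segment needs an extra join constraint, e.g. binding `$seg : Segment()` and adding `segment == $seg` to each `Field` pattern, where `segment` is a hypothetical back-reference property. The plain-Java sketch below (not Drools code; `CrossProductDemo` and the integer `segment` index are hypothetical) contrasts the two behaviours.

```java
import java.util.ArrayList;
import java.util.List;

class Field {
    final int segment;  // hypothetical: index of the owning segment
    final String id;
    String value;

    Field(int segment, String id, String value) {
        this.segment = segment;
        this.id = id;
        this.value = value;
    }
}

class CrossProductDemo {
    // The fields of the two agent segments from the sample message.
    static List<Field> sample() {
        List<Field> fields = new ArrayList<>();
        fields.add(new Field(1, "A1", "000"));
        fields.add(new Field(1, "A2", "JAMES"));
        fields.add(new Field(1, "A3", "BOND"));
        fields.add(new Field(2, "A1", "86"));
        fields.add(new Field(2, "A2", "MAXWELL"));
        fields.add(new Field(2, "A3", "SMART"));
        return fields;
    }

    // Like the rule as written: every A1 != "007" is updated as soon as
    // some A2 == "JAMES" and some A3 == "BOND" exist anywhere.
    static int unconstrained(List<Field> fields) {
        boolean james = false;
        boolean bond = false;
        for (Field f : fields) {
            if (f.id.equals("A2") && f.value.equals("JAMES")) james = true;
            if (f.id.equals("A3") && f.value.equals("BOND")) bond = true;
        }
        int changed = 0;
        for (Field f : fields) {
            if (james && bond && f.id.equals("A1") && !f.value.equals("007")) {
                f.value = "007";
                changed++;
            }
        }
        return changed;
    }

    // Like the rule with an extra "same segment" constraint on each pattern.
    static int sameSegment(List<Field> fields) {
        int changed = 0;
        for (Field a1 : fields) {
            if (!a1.id.equals("A1") || a1.value.equals("007")) continue;
            boolean james = false;
            boolean bond = false;
            for (Field f : fields) {
                if (f.segment != a1.segment) continue;
                if (f.id.equals("A2") && f.value.equals("JAMES")) james = true;
                if (f.id.equals("A3") && f.value.equals("BOND")) bond = true;
            }
            if (james && bond) {
                a1.value = "007";
                changed++;
            }
        }
        return changed;
    }
}
```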

A relatively more complicated, but typical rule would be "set James
Bond's A1 to 007 iff he's the second agent in the message and one of
the other agents' first names does not contain AUSTIN and an agency
segment exists"

The above example assumes each segment and field is a fact and I think
it's a clean and flexible approach, but I'm concerned about the
overhead of inserting potentially 250 facts for each message.  The
only other alternatives I can think of seem to have their own set of
problems:
1. limit the fields that rules can be written against to a subset,
which may not be feasible depending on how the requirements
evolve, and only assert those as facts.  Doing this seems to double
the number of transactions per second in my nonscientific benchmark.
2. insert the Message object as a single fact and then write a slew of
accessor methods in that object to get at all possible fields in the
tree: getA1FromSecondAgentSegmentInFirstTransaction().  This seems
like it might perform well, but could be very messy.
3. provide an api in the message object model to find various
occurrences of fields in the messages, then use eval() in the rule,
like eval(msg.findSegment("agent", secondOccurrence).getField("A1")).
I've read that would be less efficient once the ruleset grows.

I'm sure many of you have dealt with this type of scenario before; what
did you determine the best approach to be?

Thanks,
David