Re: [rules-users] A few general questions on scaling StatefulKnowledgeSessions

2012-08-21 Thread lucas malson
Thread Management... Previous posts were rejected in error...

If the entirety of what you're making a decision on can be expressed in a
handful of facts then it may be reasonable to use a stateless session, as
you will need to insert them for every request.

We could only do so if we were to enable a read-through, much in the way of
Hibernate Search; however, the entire potential reference dataset per event
is moderate (on the order of 4 GB).



--
View this message in context:
http://drools.46999.n3.nabble.com/A-few-general-questions-on-scaling-StatefulKnowledgeSessions-tp4019226p4019295.html
Sent from the Drools: User forum mailing list archive at Nabble.com.
___
rules-users mailing list
rules-users@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/rules-users


Re: [rules-users] A few general questions on scaling StatefulKnowledgeSessions

2012-08-17 Thread Wolfgang Laun
See inline.

On 17/08/2012, Skiddlebop lmal...@cisco.com wrote:
 Greetings All!

 I humbly request your guidance and insights!

 Overview:
 We are currently undergoing evaluation of how to best proceed using the
 Drools Suite to best meet the current and future business needs with the
 highest system scalability and performance. We are attempting to make the
 proper system design choices, particularly with respect to which of the two
 KnowledgeSession types (stateful or stateless) to use and how to best use
 them to scale the system.

 The Context: We are using Drools for operational decision making,
 monitoring, and workforce resource management; this naturally entails some
 degree of event processing, temporal reasoning, state management, and
 inference.
 Given the nature of this context, it seems that a StatefulKnowledgeSession
 is justified and best (but may not be entirely necessary).

 The current approach:
 Currently, our rule model is not very mature or stable... Consequently, the
 approach is to use one very large long-running StatefulKnowledgeSession
 containing all relevant operational data. This single
 StatefulKnowledgeSession will be constructed and disposed of (and
 reconstructed with operational state) on a very infrequent interval, say
 every 24 hours. In this fashion, a single working memory network manages
 the entire operational state and holds all relevant facts; each fact is
 updated on a per-event basis.

The number of expected events/time unit is an important factor in deciding
about your system's architecture. What do you expect?


 The problem:
 This approach has many drawbacks in my opinion... I'll mention just a
 few...
 StatefulSessions are not thread-safe (require sequential processing)

This is not the same thing: a Stateful Session is thread-safe so that its
methods can be called by more than one thread. But it is correct that
synchronization must guarantee mutual exclusion for all core operations,
resulting in what (I think) you mean by sequential processing.
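
A minimal sketch of the distinction drawn above, in plain Java. SerializedSession is a hypothetical stand-in and not the Drools API: any thread may call insert(), but the synchronized keyword admits only one caller at a time, so the work is still processed sequentially.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a session; this is NOT the Drools API.
class SerializedSession {
    private final List<Object> workingMemory = new ArrayList<>();

    // Callable from any thread, but only one thread at a time gets in.
    synchronized void insert(Object fact) {
        workingMemory.add(fact);
    }

    synchronized int factCount() {
        return workingMemory.size();
    }

    // Inserts from several threads at once and returns the final fact count.
    static int insertConcurrently(int threads, int factsPerThread) {
        SerializedSession session = new SerializedSession();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < factsPerThread; i++) {
                    session.insert("fact-" + id + "-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("inserts did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return session.factCount();
    }

    public static void main(String[] args) {
        // No insert is lost, yet the inserts never overlap in time.
        System.out.println(insertConcurrently(4, 1000));
    }
}
```

The count comes out right because of the mutual exclusion, and that same mutual exclusion is why adding threads does not add throughput.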

 and
 consequently will not scale; it is also a single point of failure. Also, as
 the size of Working Memory grows, processing time increases and garbage
 collection becomes very messy and laggy (when performed).

Are these observations or just fears?


 The potential solution:
 To enable greater scalability, differentiation, and parallelization, it
 seems wise to partition the rules/facts into multiple specialized
 concurrently operating StatefulKnowledgeSession (KnowledgeBase) instances.
 However, if done improperly, this poses a difficult problem as with greater
 separation, the more myopic our reasoning becomes.

 Questions (some of which are intentionally dumb):

 Given the nature of the above mentioned business context, does a Stateful
 approach seem justified (or is it advised to follow KISS and remain
 stateless)?

There's no gain in using a stateless session unless you can use it in
sequential mode, which seems very unlikely from your narrative.



 What are the recommended strategies to best scale StatefulKnowledgeSessions
 (or a set of StatefulKnowledgeSessions) as they are inherently
 single-threaded?

You can run multiple sessions in parallel.


 If multiple StatefulKnowledgeSessions/KnowledgeBases are used, what are the
 recommended strategies to partition/individualize/classify them? Should we
 do this according to type, instance value, unique identifier, group of
 interrelated objects, etc? I understand this is very domain/use-case
 specific, but I'm curious how others approach this matter.


You can partition your working memory and knowledge base on anything that
permits processing all by itself. A set of rules investigating integer
numbers might be divided in two sessions: one for even numbers, one
for odd numbers.
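
The even/odd split can be sketched in plain Java as follows. This is an illustration only: Partition is a hypothetical stand-in for a session with its own working memory and its own thread, not a Drools class, and the routing key (parity) is simply the one from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParityPartitioner {
    // One partition = a private fact list plus its own single worker thread.
    static final class Partition {
        private final List<Integer> facts = new ArrayList<>();
        private final ExecutorService worker = Executors.newSingleThreadExecutor();

        // Queues the insert; only the partition's own thread touches the list.
        void insert(int n) {
            worker.execute(() -> facts.add(n));
        }

        // Runs after all queued inserts, returns the fact count, stops the worker.
        int drainAndCount() {
            Callable<Integer> count = facts::size;
            try {
                return worker.submit(count).get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            } finally {
                worker.shutdown();
            }
        }
    }

    // Routes each integer in [from, to] by parity; returns {evenCount, oddCount}.
    static int[] partitionRange(int fromInclusive, int toInclusive) {
        Partition even = new Partition();
        Partition odd = new Partition();
        for (int n = fromInclusive; n <= toInclusive; n++) {
            (Math.floorMod(n, 2) == 0 ? even : odd).insert(n);
        }
        return new int[] { even.drainAndCount(), odd.drainAndCount() };
    }

    public static void main(String[] args) {
        int[] counts = partitionRange(1, 10);
        System.out.println("even=" + counts[0] + " odd=" + counts[1]);
    }
}
```

The two partitions never share state, which is what allows them to run in parallel. It also means that a rule needing both an even and an odd number could not be evaluated in either one.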

Moreover, if processing implies stages, you might consider passing
events from one session to the next.

But see my question w.r.t. the numbers you're expecting.


 What are the recommended strategies with respect to frequency (or triggers)
 of disposal of a StatefulKnowledgeSession and/or the retraction of the
 facts therein?


This doesn't make sense in combination with the processing model
you've described.

 What would you say a healthy average working-memory size is?

Define healthy :) And if the *average* size is healthy, it may mean
that it grows into unhealthy dimensions :)

(Is this one of the intentionally dumb questions?)


 What would you say the average lifespan/duration of a
 StatefulKnowledgeSession is?

For the application my company is running, the sessions run as long as
possible, and so the average duration is O(months). - Generally
speaking: make it as long as possible.

(Another one?)



 If we have an external data store (which holds state) from which we can
 query and reconstruct working-memory state for any given object (or set of
 objects), would it be best to continue with the single large
 StatefulKnowledgeSession approach?

Pardon my tartness: only if you 

Re: [rules-users] A few general questions on scaling StatefulKnowledgeSessions

2012-08-17 Thread Stephen Masters
To be honest it sounds to me as though you need to get one of the Red Hat guys 
in to do some proper consultancy for your specific use case.

However...

Aiming to produce a system with the highest scalability and performance 
sounds like a strategy to produce an over-engineered, overpriced solution. 
Google process over 30,000 search requests per second from millions of users 
around the world. Is that the level of scalability you need to achieve?

Are you trying to achieve 'web scale'?
http://kirkwylie.blogspot.co.uk/2010/09/cartoon-characters-discuss-web-scale.html

Given a slightly silly example...
Solution A - Processes 1 request per second on one server. Scales perfectly - 
i.e. Processes 2000 requests per second on 2000 servers.
Solution B - Processes 1 request per millisecond on one server. Can't run on 
multiple servers.

… which is the better solution? It depends very much on what your expected load 
is going to be and how much money you want to spend.
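
To make the comparison concrete, here is a small Java sketch of the arithmetic. The rates are the ones from the silly example above; the class and method names are made up for illustration, and cost per server is deliberately left out.

```java
class CapacityComparison {
    // Solution A: 1 request/s per server, scales linearly with servers.
    static final long A_REQUESTS_PER_SERVER_PER_SEC = 1;
    // Solution B: 1 request/ms = 1000 requests/s on one server, cannot scale out.
    static final long B_MAX_REQUESTS_PER_SEC = 1000;

    // Servers Solution A needs for a given load (rounded up).
    static long serversNeededForA(long requestsPerSec) {
        return (requestsPerSec + A_REQUESTS_PER_SERVER_PER_SEC - 1)
                / A_REQUESTS_PER_SERVER_PER_SEC;
    }

    // Whether Solution B's single server can carry the load at all.
    static boolean bCanHandle(long requestsPerSec) {
        return requestsPerSec <= B_MAX_REQUESTS_PER_SEC;
    }

    public static void main(String[] args) {
        for (long load : new long[] { 100, 1000, 2000 }) {
            System.out.println(load + " req/s: A needs " + serversNeededForA(load)
                    + " servers, B can handle it: " + bCanHandle(load));
        }
    }
}
```

Up to 1000 requests per second, B does on one machine what A needs up to 1000 machines for. Above that, only A works at all.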

The only sensible way to make this decision is to estimate realistic load for 
your system, measure the system performance and optimise based on those 
measurements.


But here are some slightly more practical thoughts from my experience...
Inserting new facts is slow. (although still sub-millisecond)
Evaluating rules is fast.
If the full set of facts required to evaluate your rules is large then you're 
probably better off with a stateful session where you load facts into working 
memory in advance and then fire in small request facts on which you wish to 
make a decision.

If the entirety of what you're making a decision on can be expressed in a 
handful of facts then it may be reasonable to use a stateless session, as you 
will need to insert them for every request.

If you're concerned about the size of the working memory, then I would have to 
assume that you have a large volume of facts to insert and would therefore be 
better off with a stateful session with most of those facts pre-loaded.

If you're truly interested in CEP (especially streaming events), then you need 
stateful sessions.

I hope that's a little bit useful.

Steve





Re: [rules-users] A few general questions on scaling StatefulKnowledgeSessions

2012-08-17 Thread Wolfgang Laun
On 17/08/2012, Stephen Masters stephen.mast...@me.com wrote:

 But here are some slightly more practical thoughts from my experience...
 Inserting new facts is slow. (although still sub-millisecond)
 Evaluating rules is fast.

The left-hand sides of rules are evaluated while new facts are inserted,
so the above distinction does not make sense to me. Perhaps you can
explain what you mean by evaluating rules?

Executing (firing) rules depends on what's done on the right hand
side, so you can't mean that.

-W


Re: [rules-users] A few general questions on scaling StatefulKnowledgeSessions

2012-08-17 Thread Stephen Masters
Actually, I do mean that! :D

But maybe I should explain…

To be more precise, most of the time in my apps is taken in marshalling facts 
and inserting them into the session. Firing the rules then tends to take tens of 
microseconds for a decision to be made.

Obviously if the RHS is doing more than just making a decision based on facts 
already in the system (i.e. the RHS code queries databases, etc) then firing 
can get very slow. However, I tend to follow the best practices that I learned 
from various FICO (!) consultants, who recommended against doing anything heavy 
in the RHS, but rather getting back out of the rules engine ASAP and doing 
those heavy tasks in the invoking application.

This approach works nicely, because the rules engine does what it's good at 
(making decisions based on facts that are in working memory) and my Java 
(Spring) app does what it's good at (getting data and integrating with other 
systems).

The added benefit is that if I need to synchronise access to the session, it's 
not such an issue if each request is back out of the rules engine in 
microseconds.
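
A rough Java sketch of that pattern, under stated assumptions: decide() stands in for "insert request facts and fire rules", the approval threshold of 1000 is invented, and performHeavyWork() stands in for database or integration calls. None of these names come from Drools or from my actual apps.

```java
import java.util.concurrent.locks.ReentrantLock;

class LightweightConsequences {
    // Hypothetical decision record; the rules only fill this in.
    static final class Decision {
        final String requestId;
        final boolean approved;

        Decision(String requestId, boolean approved) {
            this.requestId = requestId;
            this.approved = approved;
        }
    }

    private final ReentrantLock sessionLock = new ReentrantLock();

    // Stand-in for the engine call: a cheap in-memory decision made under the lock.
    Decision decide(String requestId, int amount) {
        sessionLock.lock();
        try {
            return new Decision(requestId, amount <= 1000); // invented threshold
        } finally {
            sessionLock.unlock();
        }
    }

    // Stand-in for slow work (database, remote systems) done by the application.
    private String performHeavyWork(Decision d) {
        return d.requestId + (d.approved ? ":APPROVED" : ":REJECTED");
    }

    // The lock is released before the heavy work starts.
    String handle(String requestId, int amount) {
        Decision d = decide(requestId, amount);
        return performHeavyWork(d);
    }

    boolean isSessionLocked() {
        return sessionLock.isLocked();
    }

    public static void main(String[] args) {
        LightweightConsequences app = new LightweightConsequences();
        System.out.println(app.handle("r1", 500));
        System.out.println(app.handle("r2", 5000));
    }
}
```

Because the lock is only held for the decision itself, other requests wait for microseconds rather than for the duration of a database call.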

Steve




Re: [rules-users] A few general questions on scaling StatefulKnowledgeSessions

2012-08-17 Thread Vincent LEGENDRE
If you do nothing heavy in the RHS then, indeed, executing the rules' action 
part is faster than fact insertion, but this is a consequence of your design, 
not something that holds for all usages.

In an inference system using RETE, like Drools, most of the time is spent 
updating the RETE network. Updates of this network are done at 
insert/retract/modify, and these actions can be called from outside OR inside 
the rules' RHS.

In an inference system, you may be interested in chaining, i.e. your rules' RHS 
modify the fact base heavily, and thus RHS execution takes time.

What you describe in your post is almost sequential behaviour, i.e. rule 
execution does not modify the fact base. 
I agree that this is a very common usage, but you can't contrast fact insertion 
with RHS execution without stating your design choices, which can be 
too restrictive for other usages that require chaining.
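
To illustrate the difference, here is a toy forward-chaining loop in Java. It is not RETE and not Drools: the rules are reduced to "when fact X is present, insert fact Y", and all names are hypothetical. The point is only that every insert made from an RHS triggers another round of matching.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class ChainingToy {
    // rules maps a trigger fact to the fact its RHS inserts.
    // Returns the number of matching passes, one per fact actually inserted.
    static int insertAndFire(String initialFact, Map<String, String> rules,
                             Set<String> workingMemory) {
        Deque<String> pending = new ArrayDeque<>();
        pending.add(initialFact);
        int matchPasses = 0;
        while (!pending.isEmpty()) {
            String fact = pending.poll();
            if (!workingMemory.add(fact)) {
                continue; // fact already known, nothing to re-evaluate
            }
            matchPasses++; // each insert forces matching against the rules
            String derived = rules.get(fact);
            if (derived != null) {
                pending.add(derived); // the RHS inserts a new fact: chaining
            }
        }
        return matchPasses;
    }

    public static void main(String[] args) {
        Map<String, String> chaining = new HashMap<>();
        chaining.put("A", "B");
        chaining.put("B", "C");
        // With chaining, one external insert causes further inserts from the RHS.
        System.out.println(insertAndFire("A", chaining, new LinkedHashSet<>()));
        // Without chaining, the RHS never touches the fact base.
        System.out.println(insertAndFire("A", new HashMap<>(), new LinkedHashSet<>()));
    }
}
```

In the chaining case the time spent "in the RHS" is really time spent on the inserts the RHS performs, which is why the two cannot be measured apart without knowing the design.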

